[1/2 OT] Indexing (and mergeable Index-algorithms)
Date: 2005-11-18 (01:50)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] [1/2 OT] Indexing (and mergeable Index-algorithms)
On Thu, 2005-11-17 at 16:15 -0600, Brian Hurt wrote:

> This is the worst possible case: that each block is half full.  Which 
> means that instead of log_k(N) blocks, you're having to touch log_{k/2}(N) 
> blocks.  This means that if N=2^32 and k=256, that you need to read 5 
> blocks instead of 4 (128^5 = 2^35).  And the number of blocks you need has 
> about doubled.  Also note that the binary search per block is now cheaper 
> (by one step), and the cost of inserting elements is half.
> So the question becomes: is the performance advantage gained by 
> rebalancing worth the cost?

Yes, that's the question. And there is no single answer :)

Note, it is not 5 reads instead of 4, it is 3 reads instead of 2
(assuming the first two levels are cached).
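
To make the arithmetic above concrete, here is a minimal OCaml sketch that counts levels and uncached reads for N = 2^32 keys. The names `levels` and `disk_reads` are made up for illustration, and the code assumes 64-bit OCaml ints, a fanout of at least 2, and that "half full" simply means an effective fanout of k/2.

```ocaml
(* Smallest height h such that fanout^h >= keys, i.e. the number of
   levels a lookup has to descend. Assumes 64-bit ints and fanout >= 2. *)
let levels ~fanout ~keys =
  let rec go h capacity =
    if capacity >= keys then h else go (h + 1) (capacity * fanout)
  in
  go 0 1

(* Disk reads per lookup when the top [cached] levels stay in memory. *)
let disk_reads ~fanout ~keys ~cached =
  max 0 (levels ~fanout ~keys - cached)

let () =
  let keys = 1 lsl 32 in
  Printf.printf "full blocks:      %d levels, %d reads\n"
    (levels ~fanout:256 ~keys) (disk_reads ~fanout:256 ~keys ~cached:2);
  Printf.printf "half-full blocks: %d levels, %d reads\n"
    (levels ~fanout:128 ~keys) (disk_reads ~fanout:128 ~keys ~cached:2)
```

With two levels cached this gives 2 reads for full blocks against 3 for half-full ones, which is a 50% increase rather than the 25% that "5 instead of 4" suggests.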

A BTree system I used once was fixed at 3 levels. So it could
be kind of critical :)

> If I was worried about it, I'd be inclined to be more aggressive on merging 
> and splitting nodes.  Basically, if the node is under 5/8ths full, I'd look 
> to steal some children from siblings.  If the node is over 7/8ths full, I'd 
> look to share some children with siblings.  Note that if you have three nodes 
> each 1/2 full, you can combine the three into two nodes, each 3/4 full. 
> You want to keep nodes about 3/4 full, as that makes it cheaper to add 
> and delete elements.

Yup. There are lots of possible tweaks :)
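
One of those tweaks, the 5/8 and 7/8 thresholds and the three-into-two merge Brian describes, can be sketched roughly as follows. This is only an illustration: nodes are modelled as plain sorted lists, the names `rebalance_action`, `split_at` and `merge_three_into_two` are invented here, and a real B-tree would also have to update the separator keys in the parent.

```ocaml
type action = Steal | Share | Leave

(* Under 5/8 full: steal from siblings. Over 7/8 full: share with them.
   Integer arithmetic avoids rounding at the boundaries. *)
let rebalance_action ~capacity ~used =
  if 8 * used < 5 * capacity then Steal
  else if 8 * used > 7 * capacity then Share
  else Leave

(* Split a list into its first n elements and the rest. *)
let rec split_at n = function
  | x :: rest when n > 0 ->
    let (l, r) = split_at (n - 1) rest in
    (x :: l, r)
  | rest -> ([], rest)

(* Repack the keys of three adjacent nodes (given left to right, so the
   concatenation is already sorted) into two nodes of near-equal size. *)
let merge_three_into_two a b c =
  let all = a @ b @ c in
  split_at ((List.length all + 1) / 2) all
```

With a capacity of 8, three nodes holding 4 keys each repack into two nodes of 6 keys, which is the 3/4 fill mentioned in the quote.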

> Two problems with this: first, what happens when the sibling is full too, 
> you can get into a case where an insert is O(N) cost, and second, this is 
> assuming inserts only (I can still get to worst-case with deletes).

Depends precisely on the algorithm -- mine only looked once.
If the sibling was full, you just split as usual. It's a cheap
hack :)
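
A rough OCaml sketch of that "look once" insert, for illustration only. The details are assumptions, not a record of the original code: leaves are sorted int lists, only the right sibling is inspected, exactly one key is spilled, and the parent's separator keys are ignored. The names `outcome`, `insert_sorted`, `split_at` and `insert_look_once` are hypothetical.

```ocaml
type outcome =
  | Fits of int list                (* the node had room *)
  | Shifted of int list * int list  (* largest key spilled into the sibling *)
  | Split of int list * int list    (* node split; the sibling is untouched *)

(* Insert x into a sorted list, keeping it sorted. *)
let rec insert_sorted x = function
  | y :: rest when y < x -> y :: insert_sorted x rest
  | l -> x :: l

(* Split a list into its first n elements and the rest. *)
let rec split_at n = function
  | x :: rest when n > 0 ->
    let (l, r) = split_at (n - 1) rest in
    (x :: l, r)
  | rest -> ([], rest)

(* Insert x into [node]. On overflow, look at the right [sibling] once:
   if it has room, move the largest key over; otherwise split as usual.
   Assumes every key in [sibling] is greater than x and all keys in [node]. *)
let insert_look_once ~capacity node sibling x =
  let merged = insert_sorted x node in
  let n = List.length merged in
  if n <= capacity then Fits merged
  else if List.length sibling < capacity then
    let (keep, spill) = split_at (n - 1) merged in
    Shifted (keep, spill @ sibling)
  else
    let (left, right) = split_at ((n + 1) / 2) merged in
    Split (left, right)
```

Because the sibling is inspected only once and never asked to push keys further along, an insert touches at most two nodes at that level, so the O(N) cascade from the quoted objection cannot occur.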

John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: