[1/2 OT] Indexing (and mergeable Index-algorithms)
Date: 2005-11-18 (01:50)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] [1/2 OT] Indexing (and mergeable Index-algorithms)

On Thu, 2005-11-17 at 16:15 -0600, Brian Hurt wrote:
>
> This is the worst possible case- that each block is half full. Which
> means that instead of log_k(N) blocks, you're having to touch log_{k/2}(N)
> blocks. This means that if N=2^32 and k=256, that you need to read 5
> blocks instead of 4 (128^5 = 2^35). And the number of blocks you need has
> about doubled. Also note that the binary search per block is now cheaper
> (by one step), and the cost of inserting elements is half.
>
> So the question becomes: is the performance advantage gained by
> rebalancing worth the cost?

Yes, that's the question. And there is no single answer :)

Note, it is not 5 reads instead of 4, it is 3 reads instead of 2
(assuming the first two levels are cached). A BTree system
I used once was fixed at 3 levels. So it could be kind of critical :)

> If I was worried about it, I'd be inclined to be more aggressive on merging
> and splitting nodes. Basically, if the node is under 5/8th full, I'd look
> to steal some children from siblings. If the node is over 7/8th full, I'd
> look to share some child with siblings. Note that if you have three nodes
> each 1/2 full, you can combine the three into two nodes, each 3/4th full.
> You want to keep nodes about 3/4th full, as that makes it cheaper to add
> and delete elements.

Yup. There are lots of possible tweaks :)

> Two problems with this: first, what happens when the sibling is full too,
> you can get into a case where an insert is O(N) cost, and second, this is
> assuming inserts only (I can still get to worst-case with deletes).

Depends precisely on the algorithm -- mine only looked once.
If the sibling was full, you just split as usual. It's a cheap hack :)

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
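
The block counts discussed in the message can be checked with a short calculation. The sketch below is not code from the thread: the function names (`levels_needed`, `uncached_reads`, `merge_three_into_two`) and the `cached_levels` parameter are made up for illustration, and it assumes every block holds exactly `fanout` children, which a real B-tree only approximates.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= n_keys (integer arithmetic only)."""
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    levels, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


def uncached_reads(n_keys: int, fanout: int, cached_levels: int = 2) -> int:
    """Blocks read from disk per lookup if the top levels stay in memory.

    cached_levels is an assumption taken from the reply ("the first two
    levels are cached"); it is not a property of B-trees in general.
    """
    return max(0, levels_needed(n_keys, fanout) - cached_levels)


def merge_three_into_two(a: list, b: list, c: list) -> tuple:
    """Repack three adjacent sibling nodes (already in key order) into two."""
    merged = a + b + c
    mid = (len(merged) + 1) // 2
    return merged[:mid], merged[mid:]


N = 2 ** 32   # number of keys, as in the quoted example
K = 256       # children per full block, as in the quoted example

full_levels = levels_needed(N, K)        # blocks all full
half_levels = levels_needed(N, K // 2)   # blocks all half full
full_reads = uncached_reads(N, K)
half_reads = uncached_reads(N, K // 2)

# Three half-full siblings (128 entries each) repacked into two nodes.
left, right = merge_three_into_two(
    list(range(0, 128)), list(range(128, 256)), list(range(256, 384))
)

if __name__ == "__main__":
    print(full_levels, half_levels)   # levels touched per lookup
    print(full_reads, half_reads)     # reads once two levels are cached
    print(len(left) / K, len(right) / K)  # fill factor after repacking
```

With N = 2^32 and k = 256 this gives 4 levels for full blocks and 5 for half-full ones, which becomes 2 versus 3 disk reads once two levels are cached. Repacking three half-full nodes gives two nodes of 192 entries each, i.e. 3/4 full.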