Version française
Home     About     Download     Resources     Contact us    
Browse thread
strange timing between search trees
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Christophe Raffalli <christophe.raffalli@u...>
Subject: Re: [Caml-list] strange timing between search trees
Jon Harrop a écrit :
> On Friday 25 January 2008 01:32:49 Christophe Raffalli wrote:
>   
>> Dear list members,
>>
>> I wanted to compare 2-3-4 trees (look in wikipedia if you do not know
>> them) with the balanced trees of the standard library.
>> I expected the 2-3-4 to be much faster for search because they use much
>> less indirections. However, I thought
>> that construction should make little difference ...
>>
>> I was wrong :
>> Construction is arround 20% faster for 2-3-4 trees
>> Search is slower arround 5-10% for 2-3-4 trees (the diff gets smaller
>> when the trees are larger which is expected)
>>
>> I wonder if the difference in code size is the explanation (the search
>> function for balanced trees is really small and fits better
>> in cache) ?
>>     
>
> I doubt it. You can make the built-in Set module much faster by increasing the 
> code size.
>
>   
>> I attach my code with two files (the code and the test, compile with
>> ocamlopt unix.cmxa set234.ml test234tree.ml)
>>
>> Any remarks or comments ?
>>     
>
> I get quantitatively similar performance results. You're missing a lot of 
> information in your benchmarking though:
>
> What happens if the sets are constructed in-order (affects locality)?
>
> What happens if you iterate the benchmark over an array of preallocated sets?
>
>   
I known, but I was quite surprised enough with these cases.

I now have an explanation: for binary balanced trees, at least one son 
of every node is near to its ancestor,
because you create one branch at each insertion (this is true if 
comparison do not allocate and if the GC do not break locality to much, but
I think copying GC will mostly keep this property if there is not too 
much sharing ?) ... Unfortunately,
making comparison allocates memory changed nothing to my timing so I am 
not sure my explanation is reasonnable ...
At least, this shows that there is no room for a lot of improvement by 
building larger nodes ...
> Ultimately, I found it much more productive to simply optimize the Set module 
> that comes with OCaml. I'll describe how in a later OCaml Journal article but 
> the main ideas are to get the comparison function inlined (OCaml doesn't 
> inline across functor boundaries) and add a new node type for leaves. The 
> latter greatly reduces the stress on the GC, which is the main performance 
> bottleneck of the current implementation, and I found it to be up to 30% 
> faster.
>
>   
Are you sure the leaf node is a gain ? I just tried that and it failed 
(slow down) on my code (not a benchmark, my real code).

Do you add one node for Leaf, or Three (one node when the two son are 
empty and two when one of the son is empty) ?
Do you make more subtil changes ?

And could you make your optimized set library available ?


Best regards,
Christophe


PS: yet another remark: filter and partition seem suboptimal in the set 
standard library ?

Why aren't they written with join and merge as in:

    let rec filter p = function
        | Empty -> Empty
        | Node(l, v, r, _) ->
        if p v then join (filter p l) v (filter p r)
        else merge (filter p l) (filter p r)

    let rec partition p = function
        | Empty -> (Empty, Empty)
        | Node(l, v, r, _) ->
        let (lt, lf) = partition p l in
        let (rt, rf) = partition p r in
        if p v then join lt v rt, merge lf rf else
          merge lt rt, join lf v rf