Version franaise
Home About Download Resources Contact us

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
The Implicit Accumulator: a design pattern using optional arguments
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2007-06-27 (20:35)
From: Brian Hurt <bhurt@j...>
Subject: Re: [Caml-list] Book about functional design patterns
Quc Peyrot wrote:

> On Jun 27, 2007, at 9:48 PM, Brian Hurt wrote:
>> In Ocaml, allocations are relatively cheap- a cost similiar to that  
>> of allocating on the stack.  Which is why when you tell long-time  
>> Ocaml programmers that you want to avoid an allocation cost by  
>> allocating on the stack, they tend to go "um, why?"
>> Mutable data structures have their cost as well.  When you assign a  
>> pointer into an object old enough to be in the major heap, Ocaml  
>> kicks off a minor collection.  For small N, this can often make O 
>> (log N) purely functional structures faster than their O(1)  
>> imperative counterparts.
>> No to mention the correctness advantages, plus other advantages.
> If I have a tree/map datastructure and I add an element to it, my  
> understanding it that, when building the new tree, all the node up to  
> the root are going to be replaced. Is my understanding correct?

No.  Only those elements that change need to be reallocated- the rest 
can be shared between the old and new tree.  So, assuming the tree is a 
balanced tree, only the log N or so nodes between the root and the 
changed node will need to be reallocated, of the N nodes in the tree- so 
you're sharing N - log N nodes between the two trees.

> Now let's say I want to build a tree with millions elements, and I'm  
> only interested in the final result, i.e. I don't need to be able to  
> rollback to a previous state or fancy stuff like that (I therefore  
> never keep a reference to the root of the old tree, each time I add a  
> new element).
> Is there a way of writing the building code (with a fold-type of  
> construct or something like that) such that the compiler will  
> understand it, and will generate a code equivalent to the imperative  
> code (i.e. we don't allocate new nodes up to the root each time we  
> insert an element)?
With a million element tree, you're only reallocating 20-30 nodes 
(assuming it's balanced), and sharing the other 999,970-999,980 nodes 
between the two trees.  Allocating each new node is only going to be a 
small number of clock cycles, like 5-10 clocks.  So the total allocation 
cost is only 100-200 clocks or so.

What kills performance, and makes hash tables higher performance at this 
point, is the balancing logic and the cache misses you're taking walking 
down the tree (the younger heap stays in cache).  But even there, we're 
not talking about huge performance differences- for all except the most 
time-critical code, tree based maps are generally fast enough.