Version française
Home     About     Download     Resources     Contact us    
Browse thread
Fwd: Re: [Caml-list] The boon of static type checking
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jon Harrop <jon@j...>
Subject: Fwd: Re: [Caml-list] The boon of static type checking
On Sunday 06 February 2005 22:26, Radu Grigore wrote:
> By "fork lineages" you mean keeping "old" versions of the map around?

Exactly, "persistence".

> > I would like to think that your stack will be big enough to fit 40
> > recursive calls if you expect your computer to handle trillion-element
> > maps!
>
> In the worst case (most unbalanced) 40 corresponds to a few million
> keys, not trillions. If I haven't made a mistake the minimum number of
> keys in a (OCaml) Map with height h is given by the recurrence:
> T(h)=1+T(h-1)+T(h-3).

I think you are right. Regardless, your computer is unlikely to struggle with 
80 recursive function calls.

> > > Yet another problem is the missing cardinal function.
> >
> > An arbitrary number of functions are "missing". You can't expect INRIA to
> > implement all infinity of them for you! :-)
>
> Considering the fact that the code in set.ml and map.ml looks like it
> is copy&pasted I guess asking for some consistency in the interface is
> not unreasonable. In any case copy&pasting the cardinal function
> wouldn't provide better performance (I think).

Yes, Set's cardinal looks O(n) to me. So you're asking for an equally 
inefficient cardinal function in Map? ;-)

Personally, I'd like to see a core balanced binary tree implementation with
various different set and map (and other) data structures built from it.

> > The STL will probably take <<1ms because the STL's size() is probably
> > O(1) whereas Map's fold is O(n). In contrast, forking lineages is
> > probably O(n) with the STL (requiring a copy) and O(1) with ocaml's Map.
>
> I haven't actually tested with STL but the implementation of size() is
> indeed a simple return. Do you have an example where forking lineages
> is useful?

I often exploit persistence when writing interpreters in a functional style,
to handle bindings in an inner scope. Note that more intelligent people use
an imperative style and the seemingly-quirky Hashtbl for this...

> As I said, I don't doubt there are situations where it is
> useful; just that right now I'm having trouble coming up with a
> realistic example.

I can't think of any other places I've used persistence...

> I am exploring possibilities right now. My only "complaint" is that
> switching from Hashtbl to Map for a dictionary of about 200000 keys
> decreases the performance a LOT! I would have expected a factor of
> 20-30 but it seems to be a lot more: probably due to memory management
> (as Gerd Stolpmann notices) due to the functional nature of Map.

Now that I've had a little more time to think about it, I can think of three
possible causes of this poor performance. The most likely is the O(ln n)
(unnecessary) string comparisons it will be doing (particularly if your
strings are similar "lexicographically"), compared to a single hash. Next is
the time spent in allocate and GC, as someone else said. Third is cache
incoherence, as I said before.

You could profile your code to work out which but you might as well skip this
and just use a more appropriate data structure. Or use Hashtbl and sort, as
Gerd says. If you really want to stick with Map then you could also store a
hash along with each string, so the comparison acts on the hash before it
acts on the string. This should greatly speed up the comparisons (oh, and
write your own comparison function, using String.compare and not
Pervasives.compare) and, I think, the whole program.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.