Version française
Home     About     Download     Resources     Contact us    
Browse thread
Estimating the size of the ocaml community
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jon <jdh30@c...>
Subject: Re: [Caml-list] The boon of static type checking
On Feb 6 2005, Radu Grigore wrote:
> On Fri, 4 Feb 2005 10:26:55 +0000, Jon Harrop <jon@jdh30.plus.com> wrote:
> > The code to implement a simple tree is unwieldy in C++ (e.g. 1975 LOC
> > in stl_tree.h and stl_map.h vs 198 LOC in map.ml).
> 
> The implementation in stl_tree is not a "simple" tree. The difference
> in size is also because the STL implementation is more efficient.

This is a much smaller effect compared to the verbosity imposed by inherent 
inefficiencies of the C++ language though.

> For one thing RB trees are better balanced than HB(2) trees.

The implementation details should be considered secondarily to the 
functionality provided, IMHO. Thus, I would consider the ability of the 
(functional) ocaml Set and Map to fork lineages to be more important. There 
might be an imperative RB-tree ocaml implementation hanging around 
somewhere if you want to make a more accurate comparison...

> Another observation is that fold on maps is not tail recursive.

But the depth of recursion on the stack is O(ln_2 n), IIRC.

> True, the
> stack will probably never go above 40 calls but it is a problem if you
> want to accumulate large structures.

I would like to think that your stack will be big enough to fit 40 
recursive calls if you expect your computer to handle trillion-element 
maps!

> Yet another problem is the missing cardinal function.

An arbitrary number of functions are "missing". You can't expect INRIA to 
implement all infinity of them for you! :-)

As someone else has posted, you can easily supplement the current Map with 
code to keep track of the number of mappings with a good complexity and 
reasonable efficiency.

A reason not to implement the cardinal function is the overhead of having 
counters in the nodes of the tree or the cost of testing for membership 
with every add/remove.

> I have a StringMap with about 200000: it
> takes about 3 minutes (!!!) to get its length using a fold.

I guess this poor performance is due to cache incoherence but, of course, 
this is the wrong data structure for the job.

> That would
> take something like 1 millisecond in C++.

The STL will probably take <<1ms because the STL's size() is probably O(1) 
whereas Map's fold is O(n). In contrast, forking lineages is probably O(n) 
with the STL (requiring a copy) and O(1) with ocaml's Map.

> I'm a bit annoyed right now with OCaml's Map because I'm writting a
> tool where I need to construct big associative tables from strings to
> other data. An implementation with Hashtbl makes the program work in 1
> minute. The problem is that the next execution phase needs to fold
> over the tables in increasing order of keys. Simple, no? Switch from
> Hashtbl to Map. Right. Now the same program doesn't finish executing
> in 3 hours. I would have expected a running time of about 20-30
> minutes.
> Maybe a hand-written lexical-tree will be good enough.

You seem to be complaining about poor performance when you know you are 
using the wrong data structure. Talk about unreasonable expectations. :-)

IIRC, the exact data structure you want is already on the hump. HTH.

Cheers,
Jon.