Version française
Home     About     Download     Resources     Contact us    
Browse thread
Estimating the size of the ocaml community
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Gerd Stolpmann <gerd@g...>
Subject: Re: [Caml-list] The boon of static type checking
Am Sonntag, den 06.02.2005, 12:22 +0200 schrieb Radu Grigore:
> On Fri, 4 Feb 2005 10:26:55 +0000, Jon Harrop <jon@jdh30.plus.com> wrote:
> 
> > The code to implement a simple tree is unwieldy in C++ (e.g. 1975 LOC
> > in stl_tree.h and stl_map.h vs 198 LOC in map.ml).
> 
> The implementation in stl_tree is not a "simple" tree. The difference
> in size is also because the STL implementation is more efficient. For
> one thing RB trees are better balanced than HB(2) trees. Another
> observation is that fold on maps is not tail recursive. True, the
> stack will probably never go above 40 calls but it is a problem if you
> want to accumulate large structures. Yet another problem is the
> missing cardinal function. I have a StringMap with about 200000: it
> takes about 3 minutes (!!!) to get its length using a fold. That would
> take something like 1 millisecond in C++.
> 
> I'm a bit annoyed right now with OCaml's Map because I'm writting a
> tool where I need to construct big associative tables from strings to
> other data. An implementation with Hashtbl makes the program work in 1
> minute. The problem is that the next execution phase needs to fold
> over the tables in increasing order of keys. Simple, no? Switch from
> Hashtbl to Map. Right. Now the same program doesn't finish executing
> in 3 hours. I would have expected a running time of about 20-30
> minutes.
> 
> Maybe a hand-written lexical-tree will be good enough.

I think you are a bit unfair, and it seems you did not analyse your
problem well enough to get a good solution.

The philosophy of the ocaml standard library is to provide light-weight
data structures that can be easily extended to one's needs. If you come
to the conclusion that a Map with cardinality is your optimal structure,
simply construct it:

module MakeMapWithCard (Ord : Map.OrderedType) =
struct
  module M = Map.Make(Ord)

  type key = M.key
  type t = int * M.t   (* cardinality, map *)

  let empty = (0, M.empty)

  let add key x (card,m) =
    if M.mem x m then
      (card,m)
    else
      (card+1, M.add x m)

  ...
end

You need not to program another balanced tree, just wrap it around the
available implementation. (The best is that the resulting module is
still compatible with the Map module of the standard library.)

The other point is that the hash tables, balanced trees etc. in ocaml
are just implementations of the common algorithms. There is no optimal
container that works best for all needs. Not in ocaml, not in C++, not
in any other language.

For your problem, I would guess that a hash table (in phase 1) plus
sorting (in phase 2) is the relatively best solution (from your
description). That avoids a lot of overhead compared to Map, especially
memory management consumes much less time.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------