Browse thread
Estimating the size of the ocaml community
-
Yaron Minsky
-
Christopher A. Watford
-
Frédéric_Gava
-
skaller
-
Erik de Castro Lopo
- Olivier_Pérès
-
Thomas Fischbacher
-
Frédéric_Gava
-
Thomas Fischbacher
- Paul Snively
- josh
- Richard Jones
-
Jon Harrop
-
Michael Walter
-
Jon Harrop
- Damien Doligez
- Thomas Fischbacher
- Michael Walter
-
Radu Grigore
- Gerd Stolpmann
- Jon
-
Jon Harrop
- Thomas Fischbacher
- Richard Jones
-
Michael Walter
- Ville-Pertti Keinonen
- Oliver Bandel
- Basile STARYNKEVITCH
-
Thomas Fischbacher
- ronniec95@l...
- skaller
- chris.danx
-
Frédéric_Gava
-
Erik de Castro Lopo
- sejourne_kevin
- Stefano Zacchiroli
-
skaller
-
Frédéric_Gava
- Kenneth Knowles
- Michael Jeffrey Tucker
- Richard Jones
- Nicolas Cannasse
- Evan Martin
- Eric Stokes
- chris.danx
- Sylvain LE GALL
- sejourne_kevin
- Sven Luther
- Johann Spies
-
Christopher A. Watford
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Jon <jdh30@c...> |
| Subject: | Re: [Caml-list] The boon of static type checking |
On Feb 6 2005, Radu Grigore wrote: > On Fri, 4 Feb 2005 10:26:55 +0000, Jon Harrop <jon@jdh30.plus.com> wrote: > > The code to implement a simple tree is unwieldy in C++ (e.g. 1975 LOC > > in stl_tree.h and stl_map.h vs 198 LOC in map.ml). > > The implementation in stl_tree is not a "simple" tree. The difference > in size is also because the STL implementation is more efficient. This is a much smaller effect compared to the verbosity imposed by inherent inefficiencies of the C++ language though. > For one thing RB trees are better balanced than HB(2) trees. The implementation details should be considered secondarily to the functionality provided, IMHO. Thus, I would consider the ability of the (functional) ocaml Set and Map to fork lineages to be more important. There might be an imperative RB-tree ocaml implementation hanging around somewhere if you want to make a more accurate comparison... > Another observation is that fold on maps is not tail recursive. But the depth of recursion on the stack is O(ln_2 n), IIRC. > True, the > stack will probably never go above 40 calls but it is a problem if you > want to accumulate large structures. I would like to think that your stack will be big enough to fit 40 recursive calls if you expect your computer to handle trillion-element maps! > Yet another problem is the missing cardinal function. An arbitrary number of functions are "missing". You can't expect INRIA to implement all infinity of them for you! :-) As someone else has posted, you can easily supplement the current Map with code to keep track of the number of mappings with a good complexity and reasonable efficiency. A reason not to implement the cardinal function is the overhead of having counters in the nodes of the tree or the cost of testing for membership with every add/remove. > I have a StringMap with about 200000: it > takes about 3 minutes (!!!) to get its length using a fold. I guess this poor performance is due to cache incoherence but, of course, this is the wrong data structure for the job. > That would > take something like 1 millisecond in C++. The STL will probably take <<1ms because the STL's size() is probably O(1) whereas Map's fold is O(n). In contrast, forking lineages is probably O(n) with the STL (requiring a copy) and O(1) with ocaml's Map. > I'm a bit annoyed right now with OCaml's Map because I'm writting a > tool where I need to construct big associative tables from strings to > other data. An implementation with Hashtbl makes the program work in 1 > minute. The problem is that the next execution phase needs to fold > over the tables in increasing order of keys. Simple, no? Switch from > Hashtbl to Map. Right. Now the same program doesn't finish executing > in 3 hours. I would have expected a running time of about 20-30 > minutes. > Maybe a hand-written lexical-tree will be good enough. You seem to be complaining about poor performance when you know you are using the wrong data structure. Talk about unreasonable expectations. :-) IIRC, the exact data structure you want is already on the hump. HTH. Cheers, Jon.