English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
[Caml-list] Wanted - General Purpose "Glue Logic" Data-Structures
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2003-03-20 (14:45)
From: John Gerard Malecki <johnm@a...>
Subject: [Caml-list] Wanted - General Purpose "Glue Logic" Data-Structures
I want to broaden my arsenal of data-structures that interface data
between the specific structures that some algorithms require.  Here is
an example of a real problem that I solved too specifically:

  I had an existing reader that produced a list of objects, 'a list.

  I wrote a filter which fractured those items producing 'a list list.

  I then wanted to feed that data to an existing program which builds
  a balanced tree top-down.  It expects 'a list which it then passes
  to a median finder which prefers its input list to be in random

In the great ocaml tradition this worked just fine and the first time.
(What a great language.)  To really test things I then ran it on
MOADB, the mother of all data bases, a very large but real-world

The program broke down in 2 places.  First, List.concat was used to
convert the output of the fracturer from 'a list list to 'a list.  Not
only is List.concat not tail-recursive but @ (Pervasives.append) is
also not tail-recursive.  I modified the fracturer to directly produce
'a list.

Second, the median program has some limitations.  It not only prefers
the input to be randomized but while it runs it too constructs some 'a
list list.  (Why?  This median program returns not just the median but
the set of items lt and gt the median.)  I re-wrote the median program.

I would prefer not to keep re-writing programs and instead have a
better collection intermediate data-structures that I can use to glue
my programs together efficiently and safely.  One can argue that the
median program was already flawed and it was best to re-write it.
Still, it would be nice to have a data-structure which the fracturer
could produce that not only can deal with large data sets but also has
the randomizing property.  Of course I want all this at very little
expense as none of these manipulations are germane to the real problem
at hand.

I considered having the fracturer build a priority queue over random
numbers but all that balancing book-keeping seems expensive.  I guess
what I'm looking for are inexpensive data-structures that have
properties like

   - fast computation of cardinality
   - fast addition of single elements
   - fast addition of lists of elements
   - fast removal of single elements in random order
   - not choking on a large data size

Any suggestions?  Does anyone have other useful glue-logic
data-structures which they use?

To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners