Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] looping recursion
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Alex Baretta <alex@b...>
Subject: Re: [Caml-list] looping recursion
brogoff wrote:
> On Wed, 28 Jul 2004, Alex Baretta wrote:
>>brogoff wrote:
>>>Sometimes lists are best. If lists are OK for 100, 1_000, or 10_000 items,
>>>why not 100_000 or 1_000_000? Map and friends shouldn't blow stack.
>>>-- Brian
>>Actually, it's not that bad.
> Well, I'm still on the caml-list, so of course it isn't *that* bad. Also, as
> I said, if you need a tail recursive map over built in lists, you have at
> least two options. Unfortunately, my preference is to use Obj, which IMO
> points to a deficiency in the core language. Most, maybe all, of my issues
> with OCaml, amount to petty complaints, no showstoppers or dealbreakers.
> Hey, at least I'm not whining about the license!

Yes, I guess we simply can live with it. I can't wait to have STL 
iterators in Ocaml, but meanwhile I can live with this functional 
nonsense: lists, maps, sets... ;)

> There are quite a few cases where singly linked lists are the most efficient
> data structure, and they're just right for the problem. Streams and laziness
> are  not at all efficient in comparison. Stack overflow commonly happens whenm
> one of my users (yup, I have users of my code who aren't me ;-) decides to
> run on data much larger than I'd anticipated.

Huge lists are usually not an efficient data structure. Lists are local 
structures: at any point you can only observe a fixed number of head 
elements (usually one at a time) and an unspecified tail. Such locality 
generally makes caching the entire data structure useless. Long lists 
(lists of a priori unbounded length) arise from user input: typically 
the are obtained by reading in a file of a priori unbounded length. In 
such cases lists are very much inefficient from the point of view of 
memory complexity: lists cost O(n) where n is the size of the input; 
streams cost O(1). Both cost O(n) in terms of time complexity. Usually 
the potential speed benefit of lists is amply counterbalanced by 
reduction in memory footprint, which usually means no memory swapping in 
the kernel.

Here's an obvious example (not too obvious: I fell for it at first). I 
have a XML syntax specifying relational databases. Essentially an entire 
relational database can be represented with it. This XML lexicon is 
meant for now only for the database initial schema definition and for 
human-readable backups. I initially wrote the backup algorithm in terms 
of transformations on a list of PXP trees which were finally serialized. 
Algebraically, this is perfect. The only problem is that this algorithm 
stores the entire contents of a database in memory before serializing 
everything. Ooops, here's a SIGKILL landing in! Now I only use PXP trees 
to represent the database schema, which I immediately begin to 
serialize. All the while I pass around continuations which are invoked 
iteratively on subnodes. If a subnode is an table-data node I declare an 
SQL cursor in the database and begin walking through the table. Result, 
where my initial memory footprint bought me a SIGKILL on a 2GB server 
now I have bounded memory usage (a few megabytes). I don't swap, so the 
the backup process actually runs an order of magnitude faster. And I 
actually get my output at the end ;)

> Also, lists are builtin, and have syntactic support in OCaml, by which I mean
> nice, easy to read syntax. So the language encourages you to use them.

They are sooo cool! I actually ask all trainees to reimplement the List 
module. But then the industrial practice is to use lists only where 
there is no input or as the result of some kind of slicing or research 
applied to bigger and more efficient data structures. I actually use a 
list iteration to schedule tasks in my real-time control kernel, but of 
course, I don't expect my collegues to write more than a few tasks at a 
time for my kernel to schedule: safety, liveness and UI.

To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: