English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
[Caml-list] Strings as arrays or lists...
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2003-03-03 (21:43)
From: Alain.Frisch@e...
Subject: Re: [Caml-list] Strings as arrays or lists...
On Sun, 2 Mar 2003, Xavier Leroy wrote:

> > > in Haskell, strings are lists of chars.
> >
> > And what a horrible design decision that is!
> Agreed.  Well, it's a great way to multiply the memory requirements
> for your strings by a factor of 12 (on 32-bit platforms) or 24 (on
> 64-bit platforms), while at the same time losing constant-time
> indexing :-)

Not necessarily. Considering strings as lists of chars is a design
decision about the semantics of the language, not its implementation. For
instance in CDuce, we choosed to define the type String as [Char*], that
is, homogeneous sequences of (Unicode) characters. The idea is that we
want to have characters and elements at the same level inside XML elements
(the other solution is to have elements and strings; but when you
concatenate two sequences of elements-or-strings, you want to perform
implicit concatenation of strings at the middle to avoid two consecutive
strings). For the implementation, we introduce a compact representation of
strings at runtime, namely a pointer to a segment of a buffer (OCaml
string) + a "continuation" (which can be a representation of the same
kind, or a real list cell). When a pattern tries to deconstruct such a
representation to see it as list cell (head,tail), the runtime system
extracts the first character from the buffer and compute a pointer to the
next character (depending on the internal Unicode encoding of the buffer).
When a pattern extracts a consecutive subsequence (substrings)  - patterns
in CDuce are roughly regular expressions - it actually returns a pointer
to a subsegment of the buffer. Concatenation of sequences is computed
lazily, to avoid quadractic behavior when appending elements one by one at
the end. Basically, in many situations, we avoid space overhead and keep a
reasonable time overhead.

I'm not advocating this kind of techniques for a fully general
language such as OCaml; I just wanted to defend the "horrible" design for
specific applications...

-- Alain

To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners