Browse thread
[Caml-list] Strings as arrays or lists...
-
oliver@f...
-
brogoff@s...
-
Xavier Leroy
- Alain.Frisch@e...
- Eric C. Cooper
-
Xavier Leroy
-
brogoff@s...
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Alain.Frisch@e... |
| Subject: | Re: [Caml-list] Strings as arrays or lists... |
On Sun, 2 Mar 2003, Xavier Leroy wrote: > > > in Haskell, strings are lists of chars. > > > > And what a horrible design decision that is! > > Agreed. Well, it's a great way to multiply the memory requirements > for your strings by a factor of 12 (on 32-bit platforms) or 24 (on > 64-bit platforms), while at the same time losing constant-time > indexing :-) Not necessarily. Considering strings as lists of chars is a design decision about the semantics of the language, not its implementation. For instance in CDuce, we choosed to define the type String as [Char*], that is, homogeneous sequences of (Unicode) characters. The idea is that we want to have characters and elements at the same level inside XML elements (the other solution is to have elements and strings; but when you concatenate two sequences of elements-or-strings, you want to perform implicit concatenation of strings at the middle to avoid two consecutive strings). For the implementation, we introduce a compact representation of strings at runtime, namely a pointer to a segment of a buffer (OCaml string) + a "continuation" (which can be a representation of the same kind, or a real list cell). When a pattern tries to deconstruct such a representation to see it as list cell (head,tail), the runtime system extracts the first character from the buffer and compute a pointer to the next character (depending on the internal Unicode encoding of the buffer). When a pattern extracts a consecutive subsequence (substrings) - patterns in CDuce are roughly regular expressions - it actually returns a pointer to a subsegment of the buffer. Concatenation of sequences is computed lazily, to avoid quadractic behavior when appending elements one by one at the end. Basically, in many situations, we avoid space overhead and keep a reasonable time overhead. I'm not advocating this kind of techniques for a fully general language such as OCaml; I just wanted to defend the "horrible" design for specific applications... -- Alain ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners