Array 4 MB size limit
Date: -- (:)
From: Aleksey Nogin <nogin@c...>
Subject: Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
On 24.05.2006 22:56, Martin Jambon wrote:

>> I think it's OK to have (mutable) byte arrays, but strings should simply
>> always be immutable.
>  
> OCaml strings are compact byte arrays which serve their purpose well.

Yes, but immutable strings are also very useful, and that functionality
is simply missing from OCaml. The use I am most interested in is
essentially using strings as "printable tokens": a data type that is
easy to compare and has an obvious I/O representation.

> Having a whole different type for immutable strings is in my opinion a 
> waste of energy. The problem is that freezing or unfreezing a string 
> safely involves a copy of the whole string. And obviously it's not 
> possible to handle only immutable strings since somehow you have to 
> create them, and unlike record fields, they won't be set in one 
> operation but in n operations, n being the length of the string.

This is not true. All I want is a purely functional interface with:
- Constants (a compiler flag for turning "..." constants into immutable
strings instead of mutable ones).
- Input from a channel.
- Concatenation.
- Things like string_of_int for immutable strings.
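
The interface listed above could be sketched roughly as follows. This is only an illustration: the names IMMUTABLE_STRING, Istring, of_literal and print are hypothetical and exist in no library, and of_literal merely stands in for the proposed compiler flag. The implementation assumes a compiler in which a string value cannot be mutated once created (as in current OCaml releases); with mutable strings, of_literal would have to copy its argument.

```ocaml
(* Hypothetical interface: no operation here can modify an existing value. *)
module type IMMUTABLE_STRING = sig
  type t
  val of_literal : string -> t          (* stand-in for the compiler flag *)
  val input_line : in_channel -> t      (* input from a channel *)
  val concat : t -> t -> t              (* concatenation *)
  val of_int : int -> t                 (* analogue of string_of_int *)
  val compare : t -> t -> int           (* tokens are easy to compare *)
  val print : out_channel -> t -> unit  (* obvious I/O representation *)
end

(* Assumption: string values are immutable, so no defensive copy is made. *)
module Istring : IMMUTABLE_STRING = struct
  type t = string
  let of_literal s = s
  let input_line = Stdlib.input_line
  let concat a b = a ^ b
  let of_int = string_of_int
  let compare = String.compare
  let print = output_string
end
```

Because t is abstract, a client can only build new values from old ones, which is the property the list above asks for.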

Of course, the standard library might have to use some sort of "unsafe"
operations that "inappropriately" mutate the newly created immutable
string buffer, but this is IMHO no different from how unsafe operations
are already used in the standard library for arrays and strings.
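
A minimal sketch of that pattern, assuming the Bytes module of current OCaml releases (init_immutable is a hypothetical helper, not a library function): the buffer is filled in n steps and then handed out as an immutable value without a copy.

```ocaml
(* Hypothetical helper: build an immutable string of length n whose
   i-th character is f i. *)
let init_immutable (n : int) (f : int -> char) : string =
  let buf = Bytes.create n in
  for i = 0 to n - 1 do
    (* In bounds by construction, so the unchecked setter is safe here. *)
    Bytes.unsafe_set buf i (f i)
  done;
  (* No copy is made. This is sound only because buf never escapes this
     function, so nobody can mutate the result afterwards. *)
  Bytes.unsafe_to_string buf

let abc = init_immutable 3 (fun i -> Char.chr (Char.code 'a' + i))
```

The "inappropriate" mutation is confined to the library function; callers only ever see the finished value.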

> So I'd really love to see actual examples where using immutable strings 
> would be such an improvement over mutable strings.
> If the problem is just to ensure that string data won't be changed by 
> the user of a library, then it is trivial using module signatures and 
> String.copy for the conversions.

Such a copy operation can be prohibitively expensive in a setting that
assumes that a data structure is immutable and tries really hard to
preserve sharing (including by using functions like a sharing-preserving
version of map (*), etc.). In such a setting, these extra copies can
have a devastating effect on memory usage, cache performance, etc. This
is exactly the situation we have in our MetaPRL project: there we have
resorted to simply using strings and pretending they are immutable, but
this is clearly suboptimal.

----
(*)
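(* Sharing-preserving map: returns the original list itself (physically
   equal) whenever f leaves every element physically unchanged. *)
let rec smap f = function
  | [] -> []
  | (hd :: tl) as l ->
      let hd' = f hd in
      let tl' = smap f tl in
      if hd == hd' && tl == tl' then l else hd' :: tl'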
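
To illustrate why a defensive copy defeats this, here is a small sketch. The values tokens, same, copied and partial are made up for the example, and the explicit round trip through Bytes only plays the role of String.copy.

```ocaml
let rec smap f = function
  | [] -> []
  | (hd :: tl) as l ->
      let hd' = f hd in
      let tl' = smap f tl in
      if hd == hd' && tl == tl' then l else hd' :: tl'

let tokens = ["alpha"; "beta"; "gamma"]

(* f returns its argument untouched: the whole list is reused as is. *)
let same = smap (fun s -> s) tokens

(* f copies each string: the contents are equal, but every cons cell and
   every string is allocated again, so all sharing is lost. *)
let copied = smap (fun s -> Bytes.to_string (Bytes.of_string s)) tokens

(* f changes only the first element: the unchanged tail is still shared. *)
let partial = smap (fun s -> if s = "alpha" then "ALPHA" else s) tokens

let () =
  assert (same == tokens);
  assert (copied = tokens && not (copied == tokens));
  assert (List.tl partial == List.tl tokens)
```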

-- 
Aleksey Nogin

Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200