Version française
Home     About     Download     Resources     Contact us    
Browse thread
Array 4 MB size limit
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Martin Jambon <martin_jambon@e...>
Subject: Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
On Thu, 25 May 2006, Aleksey Nogin wrote:

> On 24.05.2006 22:56, Martin Jambon wrote:
>
>>> I think it's OK to have (mutable) byte arrays, but strings should simply
>>> always be immutable.
>>  OCaml strings are compact byte arrays which serve their purpose well.
>
> Yes, however immutable strings are also very useful and that functionality is 
> simply missing in OCaml. The usage I am very interested in is essentially 
> using strings as "printable tokens". In other words, a data type that is easy 
> to compare and has an obvious I/O representation.
>
>> Having a whole different type for immutable strings is in my opinion a 
>> waste of energy. The problem is that freezing or unfreezing a string safely 
>> involves a copy of the whole string. And obviously it's not possible to 
>> handle only immutable strings since somehow you have to create them, and 
>> unlike record fields, they won't be set in one operation but in n 
>> operations, n being the length of the string.
>
> This is not true. All I want is having a purely functional interface with:
> - Constants (a compiler flag for turning "..." constants into immutable 
> strings instead of mutable ones).
> - Inputing from a channel
> - Concatenation
> - Things like string_of_int for immutable string.

Isn't it a bit limited? What if I want other functions?

But if it satisfies you, you can do the syntax part with an unsafe_freeze 
function and a bit of camlp4. The rest is just plain old OCaml.

> Of course, it might be the case that the standard library might have to use 
> some sort of "unsafe" operations that would "inappropriately" mutate the 
> newly created immutable string buffer, but this is IMHO no different than how 
> the unsafe operations are already used in standard library for arrays and 
> strings.

I disagree: has it ever happened to you to mutate a string by accident?
I never met this situation and this is mostly why I don't see the point of 
your suggestions. This strongly constrasts with mistakes in array/string 
indices which happen all the time.


>> So I'd really love to see actual examples where using immutable strings 
>> would be such an improvement over mutable strings.
>> If the problem is just to ensure that string data won't be changed by the 
>> user of a library, then it is trivial using module signatures and 
>> String.copy for the conversions.
>
> Such a copy operation can be extremely prohibitive in a setting that assumes 
> that a data structure is immutable and tries really hard to preserve sharing 
> (including using functions like a sharing-preserving version of map (*), 
> etc). In such a setting, these extra copies can potentially have a 
> devastating effect on memory usage, cache performance, etc. And this 
> situation is exactly what we have in our MetaPRL project - there we have 
> resorted to simply using strings and pretending they are immutable, but this 
> is clearly suboptimal.

Yes, so how do you avoid copies without using the "unsafe" conversions all 
over the place?


> ----
> (*)
> let rec smap f = function
>   [] -> []
> | (hd :: tl) as l ->
>      let hd' = f hd in
>      let tl' = smap f tl in
>         if hd == hd' && tl == tl' then l else hd' :: tl'

In order to maximize sharing, I'd rather use a global weak hash table.
In your context, it seems that you could afford String.copy, as long as it 
doesn't break sharing:

let freeze s =
   let s' = make_constant s (* using a copy! *) in
   if s' is in the table then return the element from the table
   else add s' and return s'



Martin

--
Martin Jambon, PhD
http://martin.jambon.free.fr

Edit http://wikiomics.org, bioinformatics wiki