Browse thread
Array 4 MB size limit
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Martin Jambon <martin_jambon@e...> |
| Subject: | Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) |
On Thu, 25 May 2006, Aleksey Nogin wrote: > On 24.05.2006 22:56, Martin Jambon wrote: > >>> I think it's OK to have (mutable) byte arrays, but strings should simply >>> always be immutable. >> OCaml strings are compact byte arrays which serve their purpose well. > > Yes, however immutable strings are also very useful and that functionality is > simply missing in OCaml. The usage I am very interested in is essentially > using strings as "printable tokens". In other words, a data type that is easy > to compare and has an obvious I/O representation. > >> Having a whole different type for immutable strings is in my opinion a >> waste of energy. The problem is that freezing or unfreezing a string safely >> involves a copy of the whole string. And obviously it's not possible to >> handle only immutable strings since somehow you have to create them, and >> unlike record fields, they won't be set in one operation but in n >> operations, n being the length of the string. > > This is not true. All I want is having a purely functional interface with: > - Constants (a compiler flag for turning "..." constants into immutable > strings instead of mutable ones). > - Inputing from a channel > - Concatenation > - Things like string_of_int for immutable string. Isn't it a bit limited? What if I want other functions? But if it satisfies you, you can do the syntax part with an unsafe_freeze function and a bit of camlp4. The rest is just plain old OCaml. > Of course, it might be the case that the standard library might have to use > some sort of "unsafe" operations that would "inappropriately" mutate the > newly created immutable string buffer, but this is IMHO no different than how > the unsafe operations are already used in standard library for arrays and > strings. I disagree: has it ever happened to you to mutate a string by accident? I never met this situation and this is mostly why I don't see the point of your suggestions. This strongly constrasts with mistakes in array/string indices which happen all the time. >> So I'd really love to see actual examples where using immutable strings >> would be such an improvement over mutable strings. >> If the problem is just to ensure that string data won't be changed by the >> user of a library, then it is trivial using module signatures and >> String.copy for the conversions. > > Such a copy operation can be extremely prohibitive in a setting that assumes > that a data structure is immutable and tries really hard to preserve sharing > (including using functions like a sharing-preserving version of map (*), > etc). In such a setting, these extra copies can potentially have a > devastating effect on memory usage, cache performance, etc. And this > situation is exactly what we have in our MetaPRL project - there we have > resorted to simply using strings and pretending they are immutable, but this > is clearly suboptimal. Yes, so how do you avoid copies without using the "unsafe" conversions all over the place? > ---- > (*) > let rec smap f = function > [] -> [] > | (hd :: tl) as l -> > let hd' = f hd in > let tl' = smap f tl in > if hd == hd' && tl == tl' then l else hd' :: tl' In order to maximize sharing, I'd rather use a global weak hash table. In your context, it seems that you could afford String.copy, as long as it doesn't break sharing: let freeze s = let s' = make_constant s (* using a copy! *) in if s' is in the table then return the element from the table else add s' and return s' Martin -- Martin Jambon, PhD http://martin.jambon.free.fr Edit http://wikiomics.org, bioinformatics wiki