Strings
From: Goswin von Brederlow <goswin-v-b@w...>
Subject: Re: [Caml-list] Strings
Daniel Bünzli <daniel.buenzli@erratique.ch> writes:

> On 3 Apr 09 at 18:52, Martin Jambon wrote:
>
>> I love this recurrent discussion!
>
> I love your carefully argued response!
>
>> - I see absolutely no practical advantage of having an immutable
>> "character string" type.
>
> In fact I find the result of the following sequence of operations very
> disappointing for a functional programming language :
>
>         Objective Caml version 3.11.0
>
> # Sys.os_type;;
> - : string = "Unix"
> # let s = Sys.os_type;;
> val s : string = "Unix"
> # s.[0] <- 'a';;
> - : unit = ()
> # Sys.os_type;;
> - : string = "anix"
>
> I think it is a design error to conflate strings and byte arrays. You
> clearly want both, but each with its own type and strings as
> immutable. Individual character mutability is rarely needed in text
> processing and having immutable strings avoids the kind of quirks as
> seen above.

I think that is a design flaw in Sys. Strings are mutable. The os_type
is a constant. It should not hand out mutable access to a constant.

With the current String module, a better way would be to return a copy
of os_type on each invocation. The drawback is that two calls then
return physically distinct strings:

Sys.os_type () != Sys.os_type ()
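
A minimal sketch of that copy-on-access idea, written for a recent OCaml
where string is immutable, so Bytes.t stands in for the mutable string of
3.11. The module name My_sys and its contents are hypothetical, not part
of the standard library:

```ocaml
(* Hypothetical stand-in for Sys: every call hands out a fresh copy. *)
module My_sys : sig
  val os_type : unit -> Bytes.t
end = struct
  (* The private constant never leaves the module. *)
  let constant = Bytes.of_string "Unix"
  let os_type () = Bytes.copy constant
end

let a = My_sys.os_type ()
let b = My_sys.os_type ()

(* Mutating one copy does not touch the constant or the other copy. *)
let () = Bytes.set a 0 'a'

(* != is physical inequality: the two copies are distinct objects. *)
let physically_distinct = a != b
let still_intact = Bytes.to_string (My_sys.os_type ()) = "Unix"
```

The price is one allocation per call and the loss of physical equality
between results; structural equality (=) still holds for unmodified copies.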

> You'll think that's a marginal example, but that actually happens in
> practice. For example in xmlm when I return a signal for a start tag I
> do not String.copy the tag name to avoid allocating too much. Thus in
> the documentation there's the following ugly advice :
>
> "The module assumes strings are immutable, thus strings the client
> gives or receives during the input and output process must not be
> modified."
>
> And if you don't follow the advice and mutate the tag's name before
> the end tag was parsed (or output) you'll get a tag mismatch error
> even though the document (or the output) is perfectly valid.
>
> Having immutable strings would not rely on the client for correctness
> of operation and that's always an advantage. Of course you'll tell me
> just use String.copy inside xmlm et voilà, but then you traded
> correctness for performance in a case where you could have both with
> immutable strings.

This is not just a problem for strings. Any mutable data type that is
shared rather than copied can suffer from the same problem.
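
To illustrate with something other than strings, here is a small sketch
using an array. The Registry module and its function names are invented
for the example:

```ocaml
(* Hypothetical module that owns a mutable array. *)
module Registry = struct
  let names = [| "alpha"; "beta" |]

  (* Shares the internal array: callers can corrupt it. *)
  let all () = names

  (* Returns a fresh copy: callers only modify their own array. *)
  let all_copy () = Array.copy names
end

(* Writing through the shared array changes the module's own state. *)
let () = (Registry.all ()).(0) <- "oops"
let leaked = Registry.names.(0)

(* Writing through the copy leaves the module's state alone. *)
let () = (Registry.all_copy ()).(1) <- "oops"
let protected_name = Registry.names.(1)
```

The same trade-off appears: sharing is cheap but relies on the client,
copying is safe but allocates.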

>> - There is nothing to change in OCaml's string type because it is an
>> "array of bytes", with type char representing single bytes.
>
>
> Oh no, there's nothing to change at all, that's a perfect
> implementation of byte arrays. You just want another type for
> immutable strings.
>
> Best,
>
> Daniel

It wouldn't be too hard to change the string module to allow for both
mutable and immutable strings:

module S :
sig
  type const
  type mutabl
  type 'a t
  val make : string -> mutabl t
  val set : mutabl t -> int -> char -> unit
  val get : 'a t -> int -> char
  val const : 'a t -> const t
  val print : 'a t -> unit
end = struct
  type const
  type mutabl
  type 'a t = string
  let make s = s
  let set = String.set
  let get = String.get
  let const s = s
  let print = print_string
end

let str = S.make "hallo" in
  S.set str 0 'H'; S.print str
let str = S.const (S.make "hallo") in
  S.set str 0 'H'; S.print str
        ^^^
Error: This expression has type S.const S.t but is here used with type
         S.mutabl S.t
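
For readers on a recent OCaml, where string is immutable by default and
the code above no longer applies as written, the same signature can be
sketched over Bytes.t. This is an adaptation under that assumption, not
the code from the original message; to_string replaces print so the
result can be inspected:

```ocaml
module S : sig
  type const
  type mutabl
  type 'a t
  val make : string -> mutabl t
  val set : mutabl t -> int -> char -> unit
  val get : 'a t -> int -> char
  val const : 'a t -> const t
  val to_string : 'a t -> string
end = struct
  (* Phantom tags: they have no values and exist only at the type level. *)
  type const
  type mutabl
  (* The parameter 'a is unused in the representation. *)
  type 'a t = Bytes.t
  let make s = Bytes.of_string s
  let set = Bytes.set
  let get = Bytes.get
  (* No copy is made: only the phantom tag changes. *)
  let const s = s
  let to_string = Bytes.to_string
end

let str = S.make "hallo"
let () = S.set str 0 'H'
let view = S.const str
let seen = S.to_string view

(* Rejected by the type checker, since view has type S.const S.t:
   let () = S.set view 0 'h' *)
```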

By adding a phantom type, the type system can keep track of where a
string is mutable and where it is not. The only restriction is that
"const" does not mean the string will never change. It only means that
this particular reference to the string cannot be used to change it:

# let str = S.make "hallo" in
  let str2 = S.const str in
    S.set str 0 'H'; S.print str2;;
Hallo- : unit = ()

If you let a mutable reference to the string escape and then assume it
remains const, that is your problem. This is easily avoided inside a
library or module.
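
One way a module might keep the mutable reference from escaping is to do
all mutation on a private buffer and export only read access. A rough
sketch for a recent OCaml, using Bytes.t as the buffer; the Tag module and
its capitalising behaviour are made up for illustration:

```ocaml
module Tag : sig
  type t                          (* abstract: no setter is exported *)
  val of_string : string -> t
  val get : t -> int -> char
  val to_string : t -> string
end = struct
  type t = Bytes.t

  let of_string s =
    (* Bytes.of_string copies, so the buffer is private to this value. *)
    let b = Bytes.of_string s in
    (* Mutation happens here, before anyone else can see the buffer. *)
    if Bytes.length b > 0 then
      Bytes.set b 0 (Char.uppercase_ascii (Bytes.get b 0));
    b

  let get = Bytes.get
  let to_string = Bytes.to_string
end

let tag = Tag.of_string "hallo"
let shown = Tag.to_string tag
```

Since Tag.t is abstract and no function hands back the underlying buffer,
clients have no path to modify a tag after it is built.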

Regards,
        Goswin