[Caml-list] internal representation of string
From: Xavier Leroy <xavier.leroy@i...>
Subject: Re: [Caml-list] internal representation of string
> What is the internal representation of string? Is it basically a C-string 
> [with or without terminating '\0'] plus integer storing its size? Or is it 
> something more sophisticated?

Like all heap blocks, strings contain a header giving the size of
the block in machine words.  The actual block contents are:
- the characters of the string
- padding bytes to align the block on a word boundary.  
  The padding is one of
    00
    00 01
    00 00 02
    00 00 00 03
  on a 32-bit machine, and up to 00 00 .... 07 on a 64-bit machine.

Thus, the string is always zero-terminated, and its length can be
computed as follows:

    number_of_words_in_block * sizeof(word) - last_byte_of_block - 1

The null termination comes in handy when passing a string to C, but it
is not relied upon to compute the length (in Caml), which allows the
string to contain nulls.
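
To make the layout concrete, here is a small sketch that simulates it
with Bytes rather than inspecting the real heap.  The names
block_of_string and length_of_block are hypothetical, and the word size
is passed in as a parameter (4 for a 32-bit machine).  The length is
recovered as the block size in bytes, minus one, minus the value stored
in the last byte.

```ocaml
(* Simulated contents of a string block (header not included).
   This models the layout described above; it is not the runtime's code. *)
let block_of_string ~word_size (s : string) : Bytes.t =
  let len = String.length s in
  (* There is always at least one padding byte, so a string whose length
     is a multiple of the word size gets one extra full word. *)
  let words = len / word_size + 1 in
  let size = words * word_size in
  let block = Bytes.make size '\000' in
  Bytes.blit_string s 0 block 0 len;
  (* The last byte stores the number of padding bytes minus one. *)
  Bytes.set block (size - 1) (Char.chr (size - 1 - len));
  block

(* Recover the string length from the block contents alone. *)
let length_of_block (block : Bytes.t) : int =
  let size = Bytes.length block in
  size - 1 - Char.code (Bytes.get block (size - 1))

let () =
  List.iter
    (fun s ->
      let block = block_of_string ~word_size:4 s in
      (* The length is recovered even when the string contains nulls. *)
      assert (length_of_block block = String.length s);
      (* The byte right after the characters is always zero. *)
      assert (Bytes.get block (String.length s) = '\000'))
    [ ""; "a"; "ab"; "abc"; "abcd"; "a\000b" ]
```

Note that "a\000b" round-trips correctly: the embedded null plays no
part in the length computation.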

> Also, do functions like String.sub implement 
> copy-on-write mechanism or do they copy when they are called?

They copy when they are called.  Caml strings really behave like
compactly-represented character arrays.
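
A quick way to observe the copy is to mutate the original after taking
a substring.  Current OCaml versions make string immutable, so this
sketch uses Bytes, the mutable counterpart, on the assumption that
Bytes.sub copies in the same way as String.sub.  The function name
sub_then_mutate is hypothetical.

```ocaml
(* Take a sub-sequence, then modify the original buffer. *)
let sub_then_mutate () =
  let original = Bytes.of_string "hello world" in
  let piece = Bytes.sub original 0 5 in
  Bytes.set original 0 'J';
  (Bytes.to_string original, Bytes.to_string piece)

let () =
  let original, piece = sub_then_mutate () in
  assert (original = "Jello world");
  (* The earlier copy is unaffected: no storage is shared. *)
  assert (piece = "hello")
```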

Hope this answers your question,

- Xavier Leroy