From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: Unsigned integers?
> The code is below. It works for values < 2^30,
> but fails when an int goes negative.

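It is easy to fix this.  The "parse_utf8" function need not be
modified.  In "utf8_of_int", a negative int satisfies the very first
test, i < 0x80, so it is passed to Char.chr, which rejects it.  Just
replace every test of the form i < CST with i >= 0 && i < CST, e.g.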

> let utf8_of_int i =
>   let chr x = String.make 1 (Char.chr x) in
>   if i >= 0 && i < 0x80 then 
>      chr(i)
>   else if i >= 0 && i < 0x800 then 
>      chr(0xC0 lor ((i lsr 6) land 0x1F))  ^
>       chr(0x80 lor (i land 0x3F))
>   else if i >= 0 && i < 0x10000 then 
>      chr(0xE0 lor ((i lsr 12) land 0xF)) ^
>       chr(0x80 lor ((i lsr 6) land 0x3F)) ^
>       chr(0x80 lor (i land 0x3F))
>   else if i >= 0 && i < 0x200000 then 
>      chr(0xF0 lor ((i lsr 18) land 0x7)) ^
>       chr(0x80 lor ((i lsr 12) land 0x3F)) ^
>       chr(0x80 lor ((i lsr 6) land 0x3F)) ^
>       chr(0x80 lor (i land 0x3F))
>   else if i >= 0 && i < 0x4000000 then 
>      chr(0xF8 lor ((i lsr 24) land 0x3)) ^
>       chr(0x80 lor ((i lsr 18) land 0x3F)) ^
>       chr(0x80 lor ((i lsr 12) land 0x3F)) ^
>       chr(0x80 lor ((i lsr 6) land 0x3F)) ^
>       chr(0x80 lor (i land 0x3F))
>   else chr(0xFC lor ((i lsr 30) land 0x1)) ^
>     chr(0x80 lor ((i lsr 24) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 18) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 12) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
>     chr(0x80 lor (i land 0x3F))

Alternatively, test for i < 0 first and handle it exactly like the
final "else" clause.
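The second option can be sketched as follows.  This is a minimal sketch,
assuming ints hold code points of at most 31 bits, so that on a 32-bit
platform (31-bit OCaml ints) a code point of 2^30 or more shows up as a
negative int.  The name utf8_of_int_neg_first and the local helper
six_bytes are hypothetical, introduced only for this illustration.  One
up-front i < 0 check lets every later test stay a plain i < CST.

```ocaml
(* Hypothetical variant of utf8_of_int that handles negative ints first.
   Assumes code points fit in 31 bits; on 32-bit platforms values >= 2^30
   are negative OCaml ints. *)
let utf8_of_int_neg_first i =
  let chr x = String.make 1 (Char.chr x) in
  (* Six-byte form, shared by negative ints and large positive ones. *)
  let six_bytes i =
    chr (0xFC lor ((i lsr 30) land 0x1)) ^
    chr (0x80 lor ((i lsr 24) land 0x3F)) ^
    chr (0x80 lor ((i lsr 18) land 0x3F)) ^
    chr (0x80 lor ((i lsr 12) land 0x3F)) ^
    chr (0x80 lor ((i lsr 6) land 0x3F)) ^
    chr (0x80 lor (i land 0x3F))
  in
  if i < 0 then six_bytes i
  else if i < 0x80 then chr i
  else if i < 0x800 then
    chr (0xC0 lor ((i lsr 6) land 0x1F)) ^
    chr (0x80 lor (i land 0x3F))
  else if i < 0x10000 then
    chr (0xE0 lor ((i lsr 12) land 0xF)) ^
    chr (0x80 lor ((i lsr 6) land 0x3F)) ^
    chr (0x80 lor (i land 0x3F))
  else if i < 0x200000 then
    chr (0xF0 lor ((i lsr 18) land 0x7)) ^
    chr (0x80 lor ((i lsr 12) land 0x3F)) ^
    chr (0x80 lor ((i lsr 6) land 0x3F)) ^
    chr (0x80 lor (i land 0x3F))
  else if i < 0x4000000 then
    chr (0xF8 lor ((i lsr 24) land 0x3)) ^
    chr (0x80 lor ((i lsr 18) land 0x3F)) ^
    chr (0x80 lor ((i lsr 12) land 0x3F)) ^
    chr (0x80 lor ((i lsr 6) land 0x3F)) ^
    chr (0x80 lor (i land 0x3F))
  else six_bytes i
```

For instance, utf8_of_int_neg_first 0xE9 yields the two bytes "\xC3\xA9".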

> Note these operations MUST be extremely fast,
> and in particular, compact storage of ISO-10646
> code points in arrays of integers is OK,
> while arrays of boxed values are out of the question.
> (So I can't use int32).

If these operations MUST be extremely fast, you should avoid the
repeated "^" concatenations, each of which allocates an intermediate
string, and instead allocate the result string once and fill it in
directly, e.g.

>   else if i >= 0 && i < 0x800 then begin
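        (* Each slot holds a char, so use Char.chr rather than chr (which
           builds a string); current OCaml uses Bytes for mutable buffers. *)
        let res = Bytes.create 2 in
        Bytes.set res 0 (Char.chr (0xC0 lor ((i lsr 6) land 0x1F)));
        Bytes.set res 1 (Char.chr (0x80 lor (i land 0x3F)));
        Bytes.unsafe_to_string res
    end else ...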
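A complete function built on that idea might look like the sketch
below.  It assumes OCaml 4.06 or later, where strings are immutable, so
the result is filled in a Bytes buffer and then converted without a
copy.  The name utf8_of_int_fast is hypothetical, and computing the
length first and filling the continuation bytes in a loop is this
sketch's own generalization of the two-byte branch, not the only way
to do it.  Each call allocates exactly one buffer of the final size.

```ocaml
(* Hypothetical name: utf8_of_int_fast.  Encodes a code point of up to
   31 bits as UTF-8, allocating a single buffer per call. *)
let utf8_of_int_fast i =
  (* Encoded length.  Negative ints (code points >= 2^30 on 32-bit
     platforms) take the six-byte form, like the last "else" clause. *)
  let len =
    if i >= 0 && i < 0x80 then 1
    else if i >= 0 && i < 0x800 then 2
    else if i >= 0 && i < 0x10000 then 3
    else if i >= 0 && i < 0x200000 then 4
    else if i >= 0 && i < 0x4000000 then 5
    else 6
  in
  let res = Bytes.create len in
  if len = 1 then Bytes.set res 0 (Char.chr i)
  else begin
    (* Leading-byte marker and payload mask for a len-byte sequence. *)
    let lead, mask =
      match len with
      | 2 -> 0xC0, 0x1F
      | 3 -> 0xE0, 0x0F
      | 4 -> 0xF0, 0x07
      | 5 -> 0xF8, 0x03
      | _ -> 0xFC, 0x01
    in
    Bytes.set res 0 (Char.chr (lead lor ((i lsr (6 * (len - 1))) land mask)));
    (* Continuation bytes carry 6 bits each, most significant first. *)
    for k = 1 to len - 1 do
      let shift = 6 * (len - 1 - k) in
      Bytes.set res k (Char.chr (0x80 lor ((i lsr shift) land 0x3F)))
    done
  end;
  (* Safe: res is never modified after this conversion. *)
  Bytes.unsafe_to_string res
```

It produces the same bytes as the concatenating version, e.g.
utf8_of_int_fast 0x20AC gives "\xE2\x82\xAC".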

Hope this helps,

- Xavier Leroy