> The code is below. The code works for values <2^30,
> but fails when an int goes negative.
It is easy to fix this. The "parse_utf8" function need not be
modified. For "utf8_of_int", just replace every test i < CST with
i >= 0 && i < CST, so that negative values fall through to the last
"else" clause, e.g.
> let utf8_of_int i =
>   let chr x = String.make 1 (Char.chr x) in
>   if i >= 0 && i < 0x80 then
>     chr(i)
>   else if i >= 0 && i < 0x800 then
>     chr(0xC0 lor ((i lsr 6) land 0x1F)) ^
>     chr(0x80 lor (i land 0x3F))
>   else if i >= 0 && i < 0x10000 then
>     chr(0xE0 lor ((i lsr 12) land 0xF)) ^
>     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
>     chr(0x80 lor (i land 0x3F))
>   else if i >= 0 && i < 0x200000 then
>     chr(0xF0 lor ((i lsr 18) land 0x7)) ^
>     chr(0x80 lor ((i lsr 12) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
>     chr(0x80 lor (i land 0x3F))
>   else if i >= 0 && i < 0x4000000 then
>     chr(0xF8 lor ((i lsr 24) land 0x3)) ^
>     chr(0x80 lor ((i lsr 18) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 12) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
>     chr(0x80 lor (i land 0x3F))
>   else
>     chr(0xFC lor ((i lsr 30) land 0x1)) ^
>     chr(0x80 lor ((i lsr 24) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 18) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 12) land 0x3F)) ^
>     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
>     chr(0x80 lor (i land 0x3F))
Alternatively, test i < 0 first and handle it as in the last "else"
clause.
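
For illustration, the second option might look like the sketch below. The
name "utf8_of_int_neg_first" is hypothetical and not from the original
code. The sketch assumes a negative int stands for a code point with its
top bit set, as happens with 31-bit ints, so it is sent straight to the
six-byte form; the remaining tests then no longer need the i >= 0 guard.

```ocaml
(* Sketch only: hypothetical name, same byte layout as the code above.
   Negative values are handled first and share the six-byte branch. *)
let utf8_of_int_neg_first i =
  let chr x = String.make 1 (Char.chr x) in
  if i < 0 || i >= 0x4000000 then
    (* six-byte form: also taken by negative ints *)
    chr (0xFC lor ((i lsr 30) land 0x1)) ^
    chr (0x80 lor ((i lsr 24) land 0x3F)) ^
    chr (0x80 lor ((i lsr 18) land 0x3F)) ^
    chr (0x80 lor ((i lsr 12) land 0x3F)) ^
    chr (0x80 lor ((i lsr 6) land 0x3F)) ^
    chr (0x80 lor (i land 0x3F))
  else if i < 0x80 then
    chr i
  else if i < 0x800 then
    chr (0xC0 lor ((i lsr 6) land 0x1F)) ^
    chr (0x80 lor (i land 0x3F))
  else if i < 0x10000 then
    chr (0xE0 lor ((i lsr 12) land 0xF)) ^
    chr (0x80 lor ((i lsr 6) land 0x3F)) ^
    chr (0x80 lor (i land 0x3F))
  else if i < 0x200000 then
    chr (0xF0 lor ((i lsr 18) land 0x7)) ^
    chr (0x80 lor ((i lsr 12) land 0x3F)) ^
    chr (0x80 lor ((i lsr 6) land 0x3F)) ^
    chr (0x80 lor (i land 0x3F))
  else
    (* five-byte form: 0x200000 <= i < 0x4000000 *)
    chr (0xF8 lor ((i lsr 24) land 0x3)) ^
    chr (0x80 lor ((i lsr 18) land 0x3F)) ^
    chr (0x80 lor ((i lsr 12) land 0x3F)) ^
    chr (0x80 lor ((i lsr 6) land 0x3F)) ^
    chr (0x80 lor (i land 0x3F))
```

Since "lsr" is a logical shift, only the low 31 bits of a negative int end
up in the output, whatever the native int width.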
> Note these operations MUST be extremely fast,
> and in particular, compact storage of ISO-10646
> code points in arrays of integers is OK,
> while arrays of boxed values is out of the question.
> (So I can't use int32).
If they MUST be extremely fast, you should avoid the repeated "^"
operations, each of which allocates an intermediate string, and instead
allocate and fill the resulting string directly, e.g.
    else if i >= 0 && i < 0x800 then begin
      let res = String.create 2 in
      res.[0] <- Char.chr(0xC0 lor ((i lsr 6) land 0x1F));
      res.[1] <- Char.chr(0x80 lor (i land 0x3F));
      res
    end else ...
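
The fragment above relies on String.create and in-place assignment to a
string, which current OCaml releases no longer accept because strings are
now immutable. Under that assumption, a complete single-allocation encoder
could be sketched with the Bytes module as follows. The name
"utf8_of_int_direct" is hypothetical, and the loop is a compaction for
brevity, not the unrolled per-length branches suggested above, which may
well be faster.

```ocaml
(* Sketch only: hypothetical name. Allocates the result once and fills it
   in place, with no "^" concatenation. Negative ints take the six-byte form. *)
let utf8_of_int_direct i =
  if i >= 0 && i < 0x80 then String.make 1 (Char.chr i)
  else begin
    (* number of bytes in the encoding *)
    let n =
      if i < 0 then 6
      else if i < 0x800 then 2
      else if i < 0x10000 then 3
      else if i < 0x200000 then 4
      else if i < 0x4000000 then 5
      else 6 in
    let res = Bytes.create n in
    (* continuation bytes, last to first: 10xxxxxx, six payload bits each *)
    let v = ref i in
    for k = n - 1 downto 1 do
      Bytes.set res k (Char.chr (0x80 lor (!v land 0x3F)));
      v := !v lsr 6
    done;
    (* lead byte: n high bits set, a zero bit, then the remaining payload *)
    let lead_mark = (0xFF lsl (8 - n)) land 0xFF in
    let payload_mask = (1 lsl (7 - n)) - 1 in
    Bytes.set res 0 (Char.chr (lead_mark lor (!v land payload_mask)));
    (* safe here: res is never modified after this point *)
    Bytes.unsafe_to_string res
  end
```

For n = 2 this computes the same two bytes as the fragment above: the lead
mark is 0xC0 and the payload mask is 0x1F.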
Hope this helps,
- Xavier Leroy
This archive was generated by hypermail 2b29 : Fri Mar 24 2000 - 17:08:19 MET