Re: Unsigned integers?

From: Xavier Leroy (Xavier.Leroy@inria.fr)
Date: Fri Mar 24 2000 - 15:50:12 MET

    > The code is below. The code works for values <2^30,
    > but fails when an int goes negative.

    It is easy to fix this. The "parse_utf8" function need not be
    modified. The problem is in "utf8_of_int": a negative int passes
    the first test i < 0x80 and is then rejected by Char.chr. Just
    replace all tests i < CST by i >= 0 && i < CST, e.g.

    > let utf8_of_int i =
    >   let chr x = String.make 1 (Char.chr x) in
    >   if i >= 0 && i < 0x80 then
    >     chr(i)
    >   else if i >= 0 && i < 0x800 then
    >     chr(0xC0 lor ((i lsr 6) land 0x1F)) ^
    >     chr(0x80 lor (i land 0x3F))
    >   else if i >= 0 && i < 0x10000 then
    >     chr(0xE0 lor ((i lsr 12) land 0xF)) ^
    >     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
    >     chr(0x80 lor (i land 0x3F))
    >   else if i >= 0 && i < 0x200000 then
    >     chr(0xF0 lor ((i lsr 18) land 0x7)) ^
    >     chr(0x80 lor ((i lsr 12) land 0x3F)) ^
    >     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
    >     chr(0x80 lor (i land 0x3F))
    >   else if i >= 0 && i < 0x4000000 then
    >     chr(0xF8 lor ((i lsr 24) land 0x3)) ^
    >     chr(0x80 lor ((i lsr 18) land 0x3F)) ^
    >     chr(0x80 lor ((i lsr 12) land 0x3F)) ^
    >     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
    >     chr(0x80 lor (i land 0x3F))
    >   else
    >     chr(0xFC lor ((i lsr 30) land 0x1)) ^
    >     chr(0x80 lor ((i lsr 24) land 0x3F)) ^
    >     chr(0x80 lor ((i lsr 18) land 0x3F)) ^
    >     chr(0x80 lor ((i lsr 12) land 0x3F)) ^
    >     chr(0x80 lor ((i lsr 6) land 0x3F)) ^
    >     chr(0x80 lor (i land 0x3F))

    or special-case i < 0 immediately and treat it as in the last "else"
    clause.
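
    The second option could look roughly like the sketch below. The
    helpers "cont" and "six_bytes" are names made up for this
    illustration and are not part of the original code; the bit masks
    are the ones from the function above.

```ocaml
(* One-character string holding byte x, as in the original code. *)
let chr x = String.make 1 (Char.chr x)

(* Hypothetical helper: continuation byte 10xxxxxx carrying the six
   bits of i that start at bit position shift. *)
let cont i shift = chr (0x80 lor ((i lsr shift) land 0x3F))

(* Hypothetical helper: the six-byte form, i.e. the last "else" clause. *)
let six_bytes i =
  chr (0xFC lor ((i lsr 30) land 0x1))
  ^ cont i 24 ^ cont i 18 ^ cont i 12 ^ cont i 6 ^ cont i 0

let utf8_of_int i =
  if i < 0 then six_bytes i   (* negative ints are handled once, up front *)
  else if i < 0x80 then chr i
  else if i < 0x800 then
    chr (0xC0 lor ((i lsr 6) land 0x1F)) ^ cont i 0
  else if i < 0x10000 then
    chr (0xE0 lor ((i lsr 12) land 0xF)) ^ cont i 6 ^ cont i 0
  else if i < 0x200000 then
    chr (0xF0 lor ((i lsr 18) land 0x7)) ^ cont i 12 ^ cont i 6 ^ cont i 0
  else if i < 0x4000000 then
    chr (0xF8 lor ((i lsr 24) land 0x3))
    ^ cont i 18 ^ cont i 12 ^ cont i 6 ^ cont i 0
  else six_bytes i
```

    With the negative case dispatched first, the remaining tests can
    stay as plain i < CST comparisons.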

    > Note these operations MUST be extremely fast,
    > and in particular, compact storage of ISO-10646
    > code points in arrays of integers is OK,
    > while arrays of boxed values is out of the question.
    > (So I can't use int32).

    If they MUST be extremely fast, you should avoid the repeated "^"
    operations, each of which allocates a new string, and instead
    allocate and fill the resulting string directly, e.g.

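    > else if i >= 0 && i < 0x800 then begin
            (* Use Char.chr here: the "chr" helper above returns a
               string, but res.[n] <- ... expects a char. *)
            let res = String.create 2 in
            res.[0] <- Char.chr (0xC0 lor ((i lsr 6) land 0x1F));
            res.[1] <- Char.chr (0x80 lor (i land 0x3F));
            res
        end else ...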
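
    On recent OCaml releases, where strings are immutable, the same
    idea can be expressed with the Bytes module. The following is only
    a sketch under that assumption: the name "utf8_of_int_fast" and the
    table-style dispatch are illustrative choices, and its speed has
    not been measured.

```ocaml
let utf8_of_int_fast i =
  (* n:    total number of bytes in the encoding
     lead: marker bits of the first byte
     mask: payload bits kept in the first byte *)
  let n, lead, mask =
    if i < 0 then (6, 0xFC, 0x1)
    else if i < 0x80 then (1, 0x00, 0x7F)
    else if i < 0x800 then (2, 0xC0, 0x1F)
    else if i < 0x10000 then (3, 0xE0, 0xF)
    else if i < 0x200000 then (4, 0xF0, 0x7)
    else if i < 0x4000000 then (5, 0xF8, 0x3)
    else (6, 0xFC, 0x1)
  in
  (* Allocate the result once and fill it in place. *)
  let res = Bytes.create n in
  Bytes.set res 0 (Char.chr (lead lor ((i lsr (6 * (n - 1))) land mask)));
  for k = 1 to n - 1 do
    (* Continuation byte k carries the next six bits, high to low. *)
    Bytes.set res k
      (Char.chr (0x80 lor ((i lsr (6 * (n - 1 - k))) land 0x3F)))
  done;
  (* res is never touched again, so the conversion without a copy is safe. *)
  Bytes.unsafe_to_string res
```

    This performs a single allocation per call instead of one per "^".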

    Hope this helps,

    - Xavier Leroy


