Version franēaise
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] implementing bit vectors in OCaml
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Xavier Leroy <xavier.leroy@i...>
Subject: Re: [Caml-list] implementing bit vectors in OCaml
> We have a program that is spending a lot of time in set operations,
> and we're thinking of trying an imperative implementation based on
> bit vectors.
> I would hope that the basis of such an implementation would be an array
> of native integers, but on scrutinizing the manual (espeically the chapter
> on interfacing to C), I have concluded that such a thing is not possible.
> Our choices appear to be
> 
>   * An array of native integers, which will be implemented as an array
>     of pointers to native integers, because integers are boxed.
> 
>   * An array of tagged integers, which will be less efficient as
>     dividing by 31 is more expensive than shifting.
> 
> How would the gurus recommend that we proceed?  Is there a better,
> still efficient data structure for a set of small integers?

As others mentioned, you can use strings, which are really arrays of
bytes.  For instance:

type t = string

let create nbits = String.make ((nbits + 7) / 8) '\000'

let is_set s n =
  (Char.code s.[n lsr 3]) land (1 lsl (n land 7)) <> 0

let set s n =
  let i = n lsr 3 in
  s.[i] <- Char.unsafe_chr (Char.code s.[i] lor (1 lsl (n land 7)))

let clear s n =
  let i = n lsr 3 in
  s.[i] <- Char.unsafe_chr (Char.code s.[i] land
                                         (0xFF lxor (1 lsl (n land 7))))

Operations on whole sets such as union and intersection will not be as
fast as they could be with an array of 32- or 64-bit integers
(you're processing the set 8 bits at a time instead of 32 or 64 bits
at a time), though.

Another option is to use 1-dimensional bigarrays of kind int32 or
nativeint.  In bytecode, bigarray accesses are much slower than string
accesses.  However, with sufficient type constraints, ocamlopt can
generate good inline code for bigarray accesses.

For very sparse sets, binary trees (as in the Set module) or Patricia
trees (as in J.-C. Filliātre's library) can be more efficient, though.

> Could the compiler gods be persuaded to provide unboxed
> representations for arrays of untagged integers, as is already done
> for `float array'?

The special case for float arrays is already a bit of a hack and it
isn't clear this was a really good idea -- although it sure helps with
benchmarks :-)  

The problem with this special case is that every polymorphic array
operation (when the array type is not known statically) is turned into
a run-time test

        if this_is_a_float_array
        then box_float(load_float_from_array)
        else load_word_from_array

If additional special cases were added, these generic array operations
would become even more costly and generate even bigger code...

Hope this helps,

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners