Version française
Home     About     Download     Resources     Contact us    
Browse thread
Efficency of varient types
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Michael D. Adams <mdmkolbe@g...>
Subject: Re: [Caml-list] Efficency of varient types
On 11/27/05, Brian Hurt <bhurt@spnz.org> wrote:
> On Sun, 27 Nov 2005, Michael D. Adams wrote:
> > I agree, which is why it was my hope that OCaml might do some of that
> > for me.  Consider a home brew bool type, "type mybool = Mytrue |
> > Myfalse".  If the compiler were smart enough, it could represent that
> > as an unboxed type.
>
> The compiler is smart enough to do that already- variants without
> associated data are represented as ints.

I am glad to hear that.

> > From there it might be a small step to
> > semi-unboxed types such as the one I started this discussion with,
> > "type value = Int of int | Bool of bool | String of string".
>
> The problem here is that the variants take two words- one word which says
> which variant it is, and the second word which is the unboxed int, unboxed
> bool, or pointer to the boxed string.

I should clarify what I am proposing because it also only requires one
word.  The bottom one or two bits of that word would say which variant
it is.  The remaining bits representing any data with that variant. 
(Obviously this would only work if the associated data is small
enough, but fortunately most of the primitives are small enough (e.g.
int (31 bits), bool (1 bit), char (8 bits).))

(Please forgive me if the following is verbose, I want to make sure my
idea is clear.  In the scheme world I think it's called "pointer
tagging" or "tagged pointers".)

Consider "type value = Int of int | Bool of bool | None | String of
string".  For the moment I will assume that the GC only assumes 4 byte
aligned values are pointers.

Since int is 31 bits that leaves one bit to say what type it is.  We
don't want the GC grabbing it, so we will represent it with a single
'1' bit appended.  A 6 would be represented by 1101 (110 for the 6 and
1 for the "tag").  This means that "Int 6" only takes one word.

Let's look at "Bool of bool".  We only need one bit of representation
for that, leaving 31 bits free for tagging.  We can represent it by
appending a three bit tag of '010' to the regular bool value.  So true
is '1010' and false is '0010'.  (NB: this depends on the GC ignoring
non-4-byte-aligned values.) Again "Bool true" and "Bool false" would
only be one word.

The "None" case can use any free bit pattern.  We can give it the
three bit tag '110'.  Since there is no data, "None" would be '0110',
only one word.

A string can not fit in one word, so it will have to be boxed. We
would store a pointer to the block structure storing the data that
Nicolas mentioned (tag, GC bits, size).  Since all pointers allocated
by OCaml are four byte aligned, it would be something like '10001100'.
 Because the last two bits are '00' that forms as a sort of 'tag' that
signals "Hey, this thing is a pointer to find out what variant it is
you will have to actually dereference it and find the tag in the block
structure".  Again this pointer would be one word long.

As you mentioned functions like List.rev don't care what their
arguments are so long as they are a single word.  The above
representation satisfies that.

The only other interesting part is pattern matching.  Matching would
require examination of the final bits of a value.  The final bits
would determine the constructor, unless they were '00' (i.e. word
aligned) in which case the value is a pointer to a block that contains
the tag that determines the constructor.  In portable assembly (a.k.a.
C) a match operation would look something like the following.

if (x & 0x03) {
  /* It's an unboxed constructor */
  if (x & 0x01) { /* It's an Int */ }
  else if (x & 0x07 == 0x02) { /* It's a Bool */ }
  else if (x & 0x07 == 0x06) { /* It's a None */ }
  else { /* Malformed value */ }
} else {
  /* It's word aligned so it's a boxed constructor */
  switch (Tag_val(x)) {
    case STRING_TAG: /* It's a String */ break;
    default: /* Malformed value */
  }
}


Michael D. Adams
mdmkolbe@gmail.com