[Caml-list] Primitive sizes
From: Brian Hurt <bhurt@s...>
Subject: Re: [Caml-list] Primitive sizes
On Wed, 29 Sep 2004, Jonathan Bryant wrote:

> I would like to know the sizes of the "primitive" types in OCaml (I
> assume that they vary per platform, but one can hope that they are
> standard...)  

ints are either 31 or 63 bits, depending on whether you're on a 32-bit or
64-bit machine (one bit is stolen for the tag bit).  Int32 and Int64 have
the obvious bit sizes, but they are boxed integers (as opposed to ints,
which are unboxed).  Chars are 8 bits and unboxed, but can't be used as
short integers directly; you have to convert with Char.code and Char.chr.

This should be a FAQ, if it isn't already.  We just recently had a 
discussion on this very mailing list about why ints are one bit short.  I'd 
search the archives and read it before bringing the topic up again.

> If they do vary , is there any way to define new types
> (similar to C typedef macro)?  I would like to create 8-, 16-, 32-, and
> 64-bit integers, 32- and 64-bit floats, and 16-bit characters.  I know i
> could just create Int32s and Int64s and manipulate the bits ignoring the
> ones I don't need, but is there a way to allocate just the necessary
> memory without interfacing to C?  If not, can anyone point me in a good
> direction to learn how to interface with C (by "good" I mean that a
> tutorial is better/more preferable than a language specification...)?

The OCaml manual has a good section on interfacing to C.  But I have to 
ask the question: why bother?  Especially with the integers?

First off, OCaml holds every value in a single word, where a word is the
size of a pointer on the current machine.  If you have a char list, every
single char in that list takes up three words: one word for the block
header of the list cell, one word for the next pointer, and one word for
the char itself.  Likewise, if you have a char array, every element in the
array takes up one whole word (this is why strings are not char arrays).
This allows OCaml to share code: a function that handles an 'a array can
handle an array of chars, ints, floats, booleans, or foos.  If the type is
boxed (that is, anything other than an immediate type such as int, char,
or bool), the array or list holds a pointer to the value, which is still
just a word.

The humorous thing is that C doesn't save as much as most people think it 
does by using smaller types, because pretty much all C compilers these 
days pad the data.  Accessing data that is aligned is significantly 
faster than accessing data that isn't (and on many CPUs, you can't 
access misaligned data at all), so the C compiler inserts padding (unused 
bytes) to keep the data aligned.  For example, how large is the following 
structure on a 32-bit platform (ints are 4 bytes)?

struct foo {
    char c;
    int i;
};

You might say five bytes: four for the int and one for the char.  You'd be
wrong.  The compiler will almost certainly add three bytes of padding
between c and i to keep i aligned, meaning the size of the structure is
actually 8 bytes.  The char takes up a full four bytes all by its
lonesome.

Changing the order doesn't help.  Consider the following structure:

struct foo2 {
    int i;
    char c;
};

Now the int doesn't follow the char.  The char can't be misaligned, so 
you don't need padding, do you?  Well, yes, you still do.  C requires 
every element of an array to be properly aligned, so the size of a 
structure is padded out to a multiple of its alignment.  Effectively, 
given a pointer p to an array of struct foo2, the access:
    (p + 1)->i
to i must still be aligned.  So again, the size of the structure is still 
8 bytes, and the char is still taking up a full four bytes.

Padding also shows up on local variables and function arguments in C.  
Consider the function:

void bar (char c) {
    char t;
    ...
}

How much memory do the argument c and the local variable t take up?  
Again, the compiler needs to keep the stack aligned, so variables and 
arguments get padded, and both typically take up a full word.

If you have several variables of the same short type next to each other, 
the shorter types do save some memory.  For example, this structure also 
takes up only two words of memory:

struct foo3 {
    int i;
    char c;
    char d;
};

But this requires you to sort your variables by size, and happens less
often than people think.  This is why OCaml isn't the memory hog a naive
analysis might make you think it is.

In nine years of professional C programming and 15 years of hobbyist 
programming, I have come to the conclusion that the main use of the 
various C int types (which, by the way, include not only char, short, 
int, and long in both signed and unsigned varieties, but also size_t, 
ssize_t, off_t, ptrdiff_t, pid_t, etc.) is to introduce bugs by allowing 
you to pick the wrong int type.

So the question becomes: why do you need the other integer types?



-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian
