Version française
Home     About     Download     Resources     Contact us    
Browse thread
How to handle endianness and binary string conversion for 32 bits integers (Int32)?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Nicolas George <nicolas.george@e...>
Subject: Re: [Caml-list] How to handle endianness and binary string conversion for 32 bits integers (Int32)?
L'octidi 28 prairial, an CCXIII, David MENTRE a écrit :
>  1. convert between big and little endian 32 bits integers;

Don't do that.

>  2. convert between 32 bits integers and string binary representation
>     (to store integers in Buffer and string data structures);

What you mean to do is represent an integer in a bounded interval as a
fixed-length sequence of finite-valued objects. Said that way, children
learn how to do it in school: it's writing the number in some base. Since
bytes in a string can take 256 values, one will obviously use base 256.

The first (rightmost) "digit" will be (n mod 256).
The second "digit" will be ((n / 256) mod 256).
The third "digit" will be ((n / (256 * 256)) mod 256)
The fourth (leftmost) "digit" will be ((n / (256 * 256 * 256)) mod 256).

And so on, but since your numbers are less than 256*256*256*256, all
remaining "digits" are 0. So all you have to do is store these four bytes in
your string, in any order you may prefer.

"Big endian" is when you store the fourth, the third, the second and the
first; it is the nearest to the way we humans write numbers; and the lexical
order is the same as the numeric order. "Small endian" is when you store the
first, the second, the third and the fourth.

But, and that is important, this does not depend on the hardware it runs on:
it is purely arithmetic.

The reverse operation is simply

n = d1 + d2 * 256 + d3 * 256 * 256 + d4 * 256 * 256 * 256

>  3. detect machine endianness at runtime.

Don't do that. I develop: there are no guarantees that numbers are either in
big or little endian. I have heard that some architectures exist where
8-bits bytes in 16-bits words are in little endian, but 16-bits words in
32-bit words are in big endian, which gives 3412 as a global order.

Using the internal representation of integers can so never be reliable. On
the contrary, compilers ensure that arithmetic in reasonable interval is the
real Peano arithmetic, for all architectures.

Using the internal representation of numbers may allow to gain some cycles
on the packing-unpacking, but it is probably nothing in regard to anything
that will be done with the data (disc access or network for example).
Furthermore, if you have to worry about inverting the order of the bytes in
the number, the gain will be even smaller.