How to read different ints from a Bigarray?
| Date: | -- (:) |
| From: | Goswin von Brederlow <goswin-v-b@w...> |
| Subject: | Re: [Caml-list] Re: How to read different ints from a Bigarray? |
Sylvain Le Gall <sylvain@le-gall.net> writes:
> Hello,
>
> On 28-10-2009, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Hi,
>>
>> I'm working on bindings for the Linux libaio library (asynchronous I/O)
>> with a sharp eye on efficiency. That means no copying must be done on
>> the data, which in turn means I cannot use string as the buffer type.
>>
>> The best type for this seems to be a (int, int8_unsigned_elt,
>> c_layout) Bigarray.Array1.t. So far so good.
>>
>> Now I define helper functions:
>>
>> let get_uint8 buf off = buf.{off}
>> let set_uint8 buf off x = buf.{off} <- x
>>
>> But I want more:
>>
>> get/set_int8 - do I use Obj.magic to "convert" to int8_signed_elt?
>>
>> And endian correcting access for larger ints:
>>
>> get/set_big_uint16
>> get/set_big_int16
>> get/set_little_uint16
>> get/set_little_int16
>> get/set_big_uint24
>> ...
>> get/set_little_int56
>> get/set_big_int64
>> get/set_little_int64
>>
>> What is the best way there? For uintXX I can get_uint8 each byte and
>> shift and add them together. But that feels inefficient as each access
>> will range check and the shifting generates a lot of code while CPUs
>> can usually endian correct an int more elegantly.
>>
>> Is it worth the overhead of calling a C function to write optimized
>> stubs for this?
>>
>> And last:
>>
>> get/set_string, blit_from/to_string
>>
>> Do I create a string where needed and then loop over every char
>> calling s.[i] <- char_of_int buf.{off+i}? Or better a C function using
>> memcpy?
>>
>> What do you think?
>>
>
> Well, we talked about this a little bit, but here is my opinion:
> - calling a C function to add a single int will generate a big overhead
> - OCaml strings are quite fast to modify
>
> So to my mind the best option is to have a buffer string (say 16/32
> char) where you put data inside and flush it in a single C call to
> Bigarray.
>
> E.g.:
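> let append_char t c =
>   (* flush the staging string into the Bigarray once it is full *)
>   if t.idx >= 64 then
>     (
>       flush t.bigarray t.buffer;
>       t.idx <- 0
>     );
>   t.buffer.[t.idx] <- c;
>   t.idx <- t.idx + 1
>
> let append_little_uint16 t i =
>   (* little endian: least significant byte first *)
>   append_char t (Char.chr ((i lsr 0) land 0xFF));
>   append_char t (Char.chr ((i lsr 8) land 0xFF))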
>
>
> I have used this kind of technique and it seems as fast as C, and a lot
> less C coding.
>
> Regards,
> Sylvain Le Gall
This won't work so nicely:
- Writes are not always in sequence. I want to do stream access
  too, where this could be very effective. But the plain buffer is
  more for random / known offset access. At a minimum you would have
  holes for alignment.
- It makes read/write buffers complicated as you need to flush or peek
  the string in case of uncommitted changes. I can't do write-only
  buffers as I want to be able to write a buffer and then add a
  checksum to it in my application. The lib should not block that.
- The data is passed to libaio and needs to be kept alive and unmoved
  as long as libaio knows it. I was hoping I could use the pointer to
  the data to register/unregister GC roots without having to add
  another custom header and indirections.
I also still wonder how bad a C function call really is. Consider the
case of writing an int64.
Directly: You get one C call that does range check, endian convert and
write in one go.
Buffered: With your code you have 7 Int64 shifts, 8 Int64 lands, 8
conversions to int, at least one index check (more likely 8 to avoid
handling unaligned access) and 1/8 C call to blit the 64-byte buffer
string into the Bigarray.
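To make the operation count concrete, here is a minimal sketch of the
byte-at-a-time variant written directly against the Bigarray, with no
staging string. The names buf, set_big_int64 and get_big_int64 are
only illustrative and not part of any existing library. Each loop
iteration costs one Int64 shift, one mask, one conversion to int and
one bounds-checked store.

```ocaml
open Bigarray

(* Hypothetical buffer type, matching the one discussed in the thread. *)
type buf = (int, int8_unsigned_elt, c_layout) Array1.t

(* Store x at offset off, most significant byte first. *)
let set_big_int64 (buf : buf) off (x : int64) =
  for i = 0 to 7 do
    let shift = 8 * (7 - i) in
    let byte =
      Int64.to_int (Int64.logand (Int64.shift_right_logical x shift) 0xFFL)
    in
    buf.{off + i} <- byte
  done

(* Read 8 bytes starting at off and reassemble them big-endian. *)
let get_big_int64 (buf : buf) off =
  let acc = ref 0L in
  for i = 0 to 7 do
    acc := Int64.logor (Int64.shift_left !acc 8) (Int64.of_int buf.{off + i})
  done;
  !acc
```

A single C stub would replace the whole loop with one range check and
one byte-swapped store, which is the comparison I am after.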
MfG
Goswin
PS: Is a.{i} <- x a C call?
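
On the PS, as far as I understand it: ocamlopt compiles a.{i} <- x to
inline code when the element kind and layout of the array are known at
compile time, and only falls back to a generic C primitive when the
array type is left polymorphic. Under that assumption, a sketch of the
helpers with an explicit type annotation would look like this. The
type name buf and the signed accessors are my own illustration; they
also show one way to get int8 values without Obj.magic.

```ocaml
open Bigarray

(* Monomorphic buffer type: kind and layout are fixed, which is what
   should let the native compiler specialise the accesses below. *)
type buf = (int, int8_unsigned_elt, c_layout) Array1.t

let get_uint8 (buf : buf) off = buf.{off}
let set_uint8 (buf : buf) off x = buf.{off} <- x

(* Signed view of the same byte: sign-extend by hand. *)
let get_int8 (buf : buf) off =
  let v = buf.{off} in
  if v land 0x80 <> 0 then v - 0x100 else v

(* Keep only the low 8 bits so negative values wrap to 128..255. *)
let set_int8 (buf : buf) off x = buf.{off} <- x land 0xFF
```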