How to read different ints from a Bigarray?
| Date: | -- (:) |
| From: | Goswin von Brederlow <goswin-v-b@w...> |
| Subject: | Re: [Caml-list] Re: How to read different ints from a Bigarray? |
Mauricio Fernandez <mfp@acm.org> writes:
> On Mon, Nov 02, 2009 at 05:11:27PM +0100, Goswin von Brederlow wrote:
>> Richard Jones <rich@annexia.org> writes:
>>
>> > On Sun, Nov 01, 2009 at 04:11:52PM +0100, Goswin von Brederlow wrote:
>> >> But C calls are still 33% slower than direct access in ocaml (if one
>> >> doesn't use the polymorphic functions).
>> >
>> > Are you using noalloc calls?
>> >
>> > http://camltastic.blogspot.com/2008/08/tip-calling-c-functions-directly-with.html
>>
>> Yes. And I looked at the bigarray module and couldn't figure out how
>> they differ from my own external function. Only difference I see is
>> the leading "%" on the external name. What does that do?
>
> That means that it is using a hardcoded OCaml primitive, whose code can be
> generated by the compiler via C--. See asmcomp/cmmgen.ml.
>
>> > I would love to see inline assembler supported by the compiler.
>
> It might be possible to hack support for C-- expressions in external
> declarations. That'd be a sort of portable assembler.
This brings me a lot closer to a fast buffer structure. I now have
this code:
(* buffer.ml: Buffer module for libaio-ocaml
* Copyright (C) 2009 Goswin von Brederlow
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as
* published by the Free Software Foundation, either version 3 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
* Under Debian a copy can be found in /usr/share/common-licenses/LGPL-3.
*)
open Bigarray

type buffer = (int, int8_unsigned_elt, c_layout) Array1.t

exception Unaligned

let create size = (Array1.create int8_unsigned c_layout size : buffer)

let unsafe_get_uint8 (buf : buffer) off = Array1.unsafe_get buf off

let unsafe_get_uint16 (buf : buffer) off =
  let off = off asr 1 in
  let buf = ((Obj.magic buf) : (int, int16_unsigned_elt, c_layout) Array1.t)
  in
  Array1.unsafe_get buf off

let unsafe_get_int31 (buf : buffer) off =
  let off = off asr 2 in
  let buf = ((Obj.magic buf) : (int32, int32, c_layout) Array1.t) in
  let x = Array1.unsafe_get buf off
  in
  Int32.to_int x

let unsafe_get_int63 (buf : buffer) off =
  let off = off asr 3 in
  let buf = ((Obj.magic buf) : (int, int, c_layout) Array1.t)
  in
  Array1.unsafe_get buf off
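
As a cross-check for the accessors above, a slow reference version can
assemble the value byte by byte with plain Array1.get. This is only a
sketch: ref_get_le and ref_get_uint16 are hypothetical names, and it
assumes a little-endian host, since the fast versions read in host byte
order.

```ocaml
open Bigarray

type buffer = (int, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical helper: assemble [n] bytes starting at byte offset [off],
   least significant byte first (little-endian assumption). *)
let ref_get_le (buf : buffer) off n =
  let rec go i acc =
    if i < 0 then acc
    else go (i - 1) ((acc lsl 8) lor Array1.get buf (off + i))
  in
  go (n - 1) 0

(* [off] is a byte offset. The fast version computes "off asr 1", which
   rounds an odd offset down to the enclosing 16-bit word; mirror that. *)
let ref_get_uint16 buf off = ref_get_le buf ((off asr 1) lsl 1) 2

let () =
  let buf : buffer = Array1.create int8_unsigned c_layout 8 in
  Array1.fill buf 0;
  Array1.set buf 2 0x34;
  Array1.set buf 3 0x12;
  assert (ref_get_uint16 buf 2 = 0x1234);
  (* An odd offset is truncated to the same word, not rejected. *)
  assert (ref_get_uint16 buf 3 = 0x1234)
```

Comparing such a reference against the unsafe accessors on a few offsets
would also show what happens to unaligned offsets, which are silently
rounded down here rather than raising Unaligned.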
Looking at the generated code I see that this works nicely for 8 and
16 bits:
0000000000404a50 <camlBuffer__unsafe_get_uint8_131>:
404a50: 48 d1 fb sar %rbx
404a53: 48 8b 40 08 mov 0x8(%rax),%rax
404a57: 48 0f b6 04 18 movzbq (%rax,%rbx,1),%rax
404a5c: 48 8d 44 00 01 lea 0x1(%rax,%rax,1),%rax
404a61: c3 retq
0000000000404a90 <camlBuffer__unsafe_get_uint16_137>:
404a90: 48 d1 fb sar %rbx
404a93: 48 83 cb 01 or $0x1,%rbx
404a97: 48 d1 fb sar %rbx
404a9a: 48 8b 40 08 mov 0x8(%rax),%rax
404a9e: 48 0f b7 04 58 movzwq (%rax,%rbx,2),%rax
404aa3: 48 8d 44 00 01 lea 0x1(%rax,%rax,1),%rax
404aa8: c3 retq
But for 31/63 bits I get:
0000000000404b90 <camlBuffer__unsafe_get_int31_145>:
404b90: 48 83 ec 08 sub $0x8,%rsp
404b94: 48 c1 fb 02 sar $0x2,%rbx
404b98: 48 83 cb 01 or $0x1,%rbx
404b9c: 48 89 c7 mov %rax,%rdi
404b9f: 48 89 de mov %rbx,%rsi
404ba2: 48 8b 05 5f bc 21 00 mov 0x21bc5f(%rip),%rax # 620808 <_DYNAMIC+0x7e0>
404ba9: e8 92 2a 01 00 callq 417640 <caml_c_call>
404bae: 48 63 40 08 movslq 0x8(%rax),%rax
404bb2: 48 d1 e0 shl %rax
404bb5: 48 83 c8 01 or $0x1,%rax
404bb9: 48 83 c4 08 add $0x8,%rsp
404bbd: c3 retq
0000000000404ca0 <camlBuffer__unsafe_get_int63_154>:
404ca0: 48 83 ec 08 sub $0x8,%rsp
404ca4: 48 c1 fb 03 sar $0x3,%rbx
404ca8: 48 83 cb 01 or $0x1,%rbx
404cac: 48 89 c7 mov %rax,%rdi
404caf: 48 89 de mov %rbx,%rsi
404cb2: 48 8b 05 4f bb 21 00 mov 0x21bb4f(%rip),%rax # 620808 <_DYNAMIC+0x7e0>
404cb9: e8 82 29 01 00 callq 417640 <caml_c_call>
404cbe: 48 83 c4 08 add $0x8,%rsp
404cc2: c3 retq
At least in the int63 case I would have thought the compiler would
emit asm code to read the int instead of a function call. In the 31-bit
case I would have hoped it would optimize the intermediate int32 away.
Is there something I can do to improve get_int31? I was hoping for code
like this:
0000000000404a90 <camlBuffer__unsafe_get_uint31_137>:
404a90: 48 c1 fb 02 sar $0x2,%rbx
404a94: 48 83 cb 01 or $0x1,%rbx
404a98: 48 d1 fb sar %rbx
404a9b: 48 8b 40 08 mov 0x8(%rax),%rax
404a9f: xx xx xx xx xx movzwq (%rax,%rbx,4),%rax
404aa4: 48 8d 44 00 01 lea 0x1(%rax,%rax,1),%rax
404aa9: c3 retq
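
One point that may be worth checking, offered as an assumption rather
than a confirmed diagnosis: the casts in unsafe_get_int31 and
unsafe_get_int63 give int32 and int as the second type parameter of
Array1.t, while the kinds Bigarray.int32 and Bigarray.int carry the
element types int32_elt and int_elt. If the compiler does not recognize
the element kind in the static type, it may fall back to the generic
access, which would match the caml_c_call seen above. The sketch below
only shows the kinded types on arrays created with those kinds; buf32,
buf63, get_int31 and get_int63 are hypothetical names.

```ocaml
open Bigarray

(* Element kinds as produced by Array1.create int32 / Array1.create int. *)
type buf32 = (int32, int32_elt, c_layout) Array1.t
type buf63 = (int, int_elt, c_layout) Array1.t

(* Index-based accessors on properly kinded arrays (no Obj.magic here). *)
let get_int31 (buf : buf32) idx = Int32.to_int (Array1.unsafe_get buf idx)
let get_int63 (buf : buf63) idx = Array1.unsafe_get buf idx

let () =
  let a : buf32 = Array1.create int32 c_layout 4 in
  let b : buf63 = Array1.create int c_layout 4 in
  Array1.set a 1 (-7l);
  Array1.set b 3 123456789;
  assert (get_int31 a 1 = -7);
  assert (get_int63 b 3 = 123456789)
```

Whether annotating the Obj.magic casts with int32_elt and int_elt
changes the generated code for the byte buffer is untested here and
would need to be confirmed by looking at the disassembly again.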
MfG
Goswin