Browse thread
[Caml-list] c is 4 times faster than ocaml?
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | John Prevost <j.prevost@g...> |
| Subject: | Re: [Caml-list] c is 4 times faster than ocaml? |
Just from a first look, I'd say that there are two likely reasons that this artificial (and incredibly hard to read) benchmark program performs poorly: First, even with -unsafe, bounds checking is performed on BigArray types. Second, working with single byte values is likely painful here--O'Caml always works with word-aligned values, so it's going to lose bigtime. gcc, on the other hand, knows that the crazy intel instruction set can handle non-word-aligned values. Here's the main setting loop from gcc: .L84: movb %al, (%eax,%ecx) incl %eax cmpl %edx, %eax jl .L84 And here it is from O'Caml: .L109: movl %esi, %ecx ;; grab our index into %ecx sarl $1, %ecx ;; shift off the tag bit movl 20(%eax), %edx ;; get the array's length into %edx cmpl %ecx, %edx ;; compare the two jbe .L111 ;; if the index is too high, punt movl 4(%eax), %edi ;; ? probably figure which byte in word movl %esi, %edx ;; load the loop value into %edx sarl $1, %edx ;; shift off the low bit movb %dl, (%edi, %ecx) ;; shove %edx's byte into the word movl %esi, %ecx ;; store back into array addl $2, %esi ;; add 1 to index cmpl %ebx, %ecx ;; compare to target jne .L109 ;; not equal? loop That jbe .L111 is what happens if a bounds check fails, by the way! Anyway, you can see that the bounds check takes a bunch of instructions. THe main loop is also a bit more expensive. One thing going on is those "sarl" instructions, which are shifting out the tag bit on the right end of O'Caml integers. If you were working on integers instead, I think it might be less painful. Especially if you could use int32s held in registers to index into things. Anyway, the main two things slowing stuff down here are the bounds check and the fact that O'Caml needs to do so much work turning caml integers into c integers. (Just as a note, I accidentally tweaked your file to make the loops not know the type of their arguments at one point while looking for this loop--you *always* want exact types known at a deep level for this kind of thing, as that made ocamlopt use C calls to access the array.) Oh--and ignore my old ramblings on mmap stuff. That code was bad then, and is worse now. :) As for your project, I suspect we could provide better suggestions on how to optimize if we were looking at real code. My suspicion is that you might want to write one or two low-level routines in C, rather than using Bigarrays for this task. (Just assuming, though--from the sound of it you're going to have larger structured data in the mmap'd areas.) Good luck! ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners