Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
[Caml-list] Interesting optimization
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: John Prevost <j.prevost@g...>
Subject: Re: [Caml-list] Interesting optimization
update''' is a bit slower than update'' is on my system--midway
between the two reference versions:

using update''':
real    0m12.940s
user    0m12.930s
sys     0m0.010s

Just for reference, here's the generated intel assembly for the loop
part of each version:

        andl    $511, %esi
        movl    %ecx, %edx
        addl    $2, %ecx
        movl    $2147483647, %ebx                    (1)
        cmpl    %ebx, %edx
        jne     .L106

        movl    (%eax), %ebx
        andl    $511, %ebx
        movl    %ebx, (%eax)
        movl    %ecx, %edx
        addl    $2, %ecx
        movl    $2147483647, %ebx
        cmpl    %ebx, %edx
        jne     .L109

        cmpl    $1, %ebx
        jl      .L100
        addl    $2, %ebx
        andl    $511, %eax
        jmp     .L101

        andl    $511, %eax
        movl    $2147483647, %ecx
        cmpl    %ecx, %ebx
        jge     .L102
        addl    $2, %ebx
        jmp     .L103

Looking at the above, and comparing to two more versions:

let update'4 c =
  let rec loop c' a =
    if a = max_int
      then c'
      else loop (c' land 0xff) (succ a)
  c := (loop !c 0)

        movl    $2147483647, %ecx
        cmpl    %ecx, %ebx
        jne     .L104
        .align  16
        addl    $2, %ebx
        andl    $511, %eax
        jmp     .L105

let update'5 c =
  let c' = ref !c in
  for n = max_int downto 0 do
    c' := !c' land 0xff
  c := !c'

        andl    $511, %edx
        movl    %ebx, %ecx         (3)
        subl    $2, %ebx
        cmpl    $1, %ecx           (2)
        jne     .L108

which has about the same performance as loop''', I feel justified in
saying the following:

The main interest here is what Daniel Bünzli originally noted: if a
reference is defined and then used in the local scope of a function,
you avoid ever actually working with the reference itself, which
removes that level of overhead.

Everything else is really all about squeezing the last ounce of
performance out, by as little as a single instruction.  There's really
not much profit in this, unless the loop in question is very important
and is known to be a bottleneck.  In fact, don't read anything more I
say unless you're really interested in that--unless you plan to look
at your own assembly code by hand, it's better to trust the compiler
than to second-guess it.

Specific notes from the above and from a couple of other exploratory functions:

a) if ... then ... else ... expressions bounding a tail loop should
have the result case in the "else" clause.  Putting the result case in
the "then" clause results in a little skip.  In essence, the compiler
assumes that the true case is the expected case.

b) match expressions seem to be man-handled more by the compiler.  You
can still predict what's going to happen, however.  As a general rule,
pattern matching works well when at least one case has an actual value
to match against.  If every pattern is a don't care and some have
guards, expect it to behave somewhat like the "if" bit above: the
first case is assumed to be the expected case.  When it matters and
you're not sure, look at the assembly.

c) as a special warning:

    match e with true -> e1 | false -> e2

is much less efficient than

    if e then e1 else e2

In short, if you know which branch is more expected, put it in the
"then" clause.  And trust that pattern matching is pretty smart unless
you do something really silly (like case c, or like using only guarded
don't-care patterns.)


To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: