ocaml limitations
Date: -- (:)
From: John Carol Langford <jcl@g...>
Subject: ocaml limitations
I have been encountering some fundamental limitations of the ocaml
language and compiler that are killing my performance - to the tune of
a factor of 10 off equivalent C code.  This is a serious problem
because the program I'm working on is both RAM and cpu intensive.

The performance problems stem from two limitations.  The first is in the
compiler and runtime - it's the limitation on the array size on 32 bit
machines - I only have linux PCs available to work on.  It appears
that the garbage collector needs some type information to work with
arrays and enough bits are set aside for type information that not
enough bits are allowed to specify a large array size.  You can get
around this large array size problem by simulating a large array with
an array of arrays, but there is a significant performance penalty.
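
For illustration, here is a minimal sketch of the array-of-arrays workaround.
The names (log_chunk, chunk_size, make, get, set) and the chunk size are
hypothetical, and the sketch takes the chunk number from the high bits of the
index, which is not the layout used by get_split further down.  Every access
costs a shift, a mask and two bounds-checked dereferences instead of one.

```ocaml
(* Hypothetical chunked array: a logical index is split into a chunk
   number (high bits) and an offset within that chunk (low bits). *)
let log_chunk = 16                     (* assumed chunk size: 2^16 elements *)
let chunk_size = 1 lsl log_chunk

type 'a big = 'a array array

(* Allocate enough chunks for n elements; the last chunk may be shorter. *)
let make (n : int) (init : 'a) : 'a big =
  let chunks = (n + chunk_size - 1) / chunk_size in
  Array.init chunks (fun c ->
    let len = min chunk_size (n - c * chunk_size) in
    Array.make len init)

(* Read element i. *)
let get (a : 'a big) (i : int) : 'a =
  a.(i lsr log_chunk).(i land (chunk_size - 1))

(* Write element i. *)
let set (a : 'a big) (i : int) (v : 'a) : unit =
  a.(i lsr log_chunk).(i land (chunk_size - 1)) <- v
```

Each chunk stays well below Sys.max_array_length, so the total size is bounded
by the number of chunks rather than by the size field of a single block.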

The second problem is a language failure - there is no 'short int'
type in ocaml.  Due to the combinatorics of my problem it would be
very convenient to use 16 bit integers.  Using 32 bit integers instead
doubles the footprint of the program - which is unacceptable in this
case.  Consequently, I simulate 16 bit integers using masking games -
which again incurs a performance penalty.  
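
A minimal sketch of the masking involved, with hypothetical helper names
(pack, low, high).  It follows the layout of get_split further down: one
value in the low 16 bits and the other in the bits above them.  On a 32 bit
machine OCaml's int is 31 bits wide, so the upper half has room for only
15 bits there.

```ocaml
(* Hypothetical helpers: two small unsigned values packed into one int.
   Caveat: with a 31-bit int (32 bit machines), hi must be below 2^15. *)
let pack ~hi ~lo = ((hi land 0xFFFF) lsl 16) lor (lo land 0xFFFF)

(* Extract the value stored in the low 16 bits. *)
let low w = w land 0xFFFF

(* Extract the value stored above the low 16 bits. *)
let high w = (w lsr 16) land 0xFFFF
```

Reading a value therefore costs an extra shift and mask, and writing one costs
a read-modify-write of the whole word.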

These two problems together add up to using a function considerably
more complicated than an array dereference in the inner loop:

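(* num_array and log_num_array are defined elsewhere; the masking assumes
   num_array is a power of two and log_num_array is its base-2 logarithm. *)

(* Low bits of i: which sub-array to use. *)
let get_first i = i land (num_array - 1)
(* High bits of i: position within the sub-array. *)
let get_second i = i lsr (log_num_array + 1)
(* The bit in between: which 16 bit half of the stored word. *)
let get_third i = (i lsr log_num_array) land 1

let get_split split i =
  let first_index = get_first i
  and second_index = get_second i
  and third_index = get_third i in
  let ret = split.(first_index).(second_index) in
  (* Move the selected half into the low 16 bits, then mask off the rest. *)
  let real_ret =
    if third_index = 1 then ret
    else ret lsr 16 in
  real_ret land 65535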

Naturally, the performance (relative to what the hardware is capable
of) is terrible.  Consequently, I'm wondering if there are plans to
remove either (or both) of these limitations in the near future and,
failing that, whether there are better workarounds.  Any suggestions?

-John