Browse thread
How can I treat bits?
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Mattias Waldau <mattias.waldau@a...> |
| Subject: | [Caml-list] Beware of compare (and Ocaml beaten by Java) |
Why isn't compare compiled as '-'? According to the definition
of compare this should be okay.
The reason why I am asking this can be found below.
/mattias
Suffix arrays are very efficient for fast string searching
in big texts. I implemented a first nice version, which
took 40 s to run on my P4, when applied to the bible (4.5 MB)
The core of the slow program is
(* compare two substrings of the SAME text
[compare x y] returns [0] if [x=y], a negative integer if
[x<y], and a positive integer if [x>y] *)
let rec same_substr_compare str idx1 idx2 : int =
let length = String.length str in
(* shortest string is smaller *)
if idx1 = length then -1 else
if idx2 = length then 1 else
(* compare one char *)
let res = compare str.[idx1] str.[idx2] in
(* char was equal, recurse *)
if res = 0 then same_substr_compare str (idx1+1) (idx2+1)
(* char was different, finished *)
else res ;;
(* create and sort index array for string *)
let create_index str : int array =
let idx = Array.init (String.length str) (fun ii -> ii) in
Array.stable_sort (same_substr_compare str) idx;
idx ;;
However, a colleague implemented it using Java. His code
only takes 15 s to run.
Thus, I had to further tune the code by replacing 'compare'
with '-', inlining better and removing bound-checking
on strings and arrays. The new version runs in 10 s.
During this lesson I learned that
1. compare is incredible slow! By replacing
let res = compare str.[idx1] str.[idx2] in
with
let res = (-) (int_of_char str.[idx1]) (int_of_char str.[idx2]) in
the run-time of the complete program went down
from 20 s to 10 s.
2. Tail-recursive comparation is faster than using a
loop and exit (runtime down from 80s to 60s). See the file
other_compares.ml in the nice version for these other attempts,
one example is
exception Result of int ;;
let substr_compare str idx1 idx2 : int =
let res = compare str.[idx1] str.[idx2] in
if res <> 0 then res
else try
let length = String.length str in
let max_size = if idx1 > idx2 then
length - idx1
else length - idx2 in
for offset = 1 to max_size - 1 do
let res = compare str.[idx1+offset] str.[idx2+offset] in
if res<>0 then raise (Result res);
done;
0
with Result res -> res ;;
3. Mergesort (=Array.stable_sort) is faster than
heapsort (=Array.sort). (runtime down from 60s to 40s).
(I also tried quicksort (=Sort.array), but after 8 hours
it still hadn't finished.)
4. The rest is inline and removing bounds checking.
This is the first time I got beaten by Java, I will
have to reevaluate Java. Note that Java still does
bounds checking!
/mattias
The nice version is at
http://www.abc.se/~m10217/download/mans.tar.bz2
The fast version is at
http://www.abc.se/~m10217/download/mans.opt.tar.bz2
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr