Version française
Home     About     Download     Resources     Contact us    
Browse thread
Threads performance issue.
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] Threads performance issue.
On Tuesday 17 February 2009 08:59:44 Mark Shinwell wrote:
> On Tue, Feb 17, 2009 at 08:40:11AM +0100, Rémi Dewitte wrote:
> > I have made some further experiments.
> > I have a functional version of the reading algorithm. I have the original
> > imperative version of the algorithm.
> > Either it is linked to thread (T) or not (X). Either it uses extlib (E)
> > or not (X).
>
> Using:
>
> ocamlopt -o foo transf.ml
> ocamlopt -thread -o foothread transf.ml unix.cmxa threads.cmxa
>
> on local disk with a 24Mb, approx. 500,000-line CSV file, I only see a
> minor slowdown with foothread as opposed to foo.  (A minor slowdown would
> inded be expected.)  So I'm confused as to why your results are so
> different.

If you uncomment Rémi's line of code that requires threading (or factor it out 
into a different file and compile that as well when using threading) then you 
should be able to reproduce his results as I have.

> You could use ocamlopt -p and run gprof on the resulting gmon.out file.

Indeed the profiling results are most enlightening. Without threads everything 
looks great:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 38.89      0.07     0.07        3    23.33    60.00  camlTransf__loop_109
 22.22      0.11     0.04  2873940     0.00     0.00  caml_ml_input_char
 11.11      0.13     0.02  2873940     0.00     0.00  
camlTransf__read_char_106
 11.11      0.15     0.02   381696     0.00     0.00  camlTransf__code_begin
...

With threads, everything looks awful:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 22.22      0.08     0.08                             
caml_thread_leave_blocking_section
 16.67      0.14     0.06                             
caml_thread_enter_blocking_section
 13.89      0.19     0.05        3    16.67    50.00  camlTransf__loop_109
  9.72      0.23     0.04                             caml_io_mutex_unlock
  5.56      0.24     0.02  3255698     0.00     0.00  caml_c_call
...

So most of the time is spent locking and unlocking OCaml's good old giant 
global lock. Unfortunately the blocking calls are "spontaneous" so you cannot 
track them down using a profile but, given the huge rise in time spent in 
read_char, my guess is that OCaml has introduced an OS kernel lock around 
every single byte read! If so, I'm surprised it is running *this* fast...

Rémi: assuming I am correct, I'd recommend using memory mapping (and rewriting 
that code ;-).

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e