Browse thread
Threads performance issue.
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: | 2009-02-17 (09:49) |
From: | Jon Harrop <jon@f...> |
Subject: | Re: [Caml-list] Threads performance issue. |
On Tuesday 17 February 2009 08:59:44 Mark Shinwell wrote: > On Tue, Feb 17, 2009 at 08:40:11AM +0100, Rémi Dewitte wrote: > > I have made some further experiments. > > I have a functional version of the reading algorithm. I have the original > > imperative version of the algorithm. > > Either it is linked to thread (T) or not (X). Either it uses extlib (E) > > or not (X). > > Using: > > ocamlopt -o foo transf.ml > ocamlopt -thread -o foothread transf.ml unix.cmxa threads.cmxa > > on local disk with a 24Mb, approx. 500,000-line CSV file, I only see a > minor slowdown with foothread as opposed to foo. (A minor slowdown would > inded be expected.) So I'm confused as to why your results are so > different. If you uncomment Rémi's line of code that requires threading (or factor it out into a different file and compile that as well when using threading) then you should be able to reproduce his results as I have. > You could use ocamlopt -p and run gprof on the resulting gmon.out file. Indeed the profiling results are most enlightening. Without threads everything looks great: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 38.89 0.07 0.07 3 23.33 60.00 camlTransf__loop_109 22.22 0.11 0.04 2873940 0.00 0.00 caml_ml_input_char 11.11 0.13 0.02 2873940 0.00 0.00 camlTransf__read_char_106 11.11 0.15 0.02 381696 0.00 0.00 camlTransf__code_begin ... With threads, everything looks awful: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 22.22 0.08 0.08 caml_thread_leave_blocking_section 16.67 0.14 0.06 caml_thread_enter_blocking_section 13.89 0.19 0.05 3 16.67 50.00 camlTransf__loop_109 9.72 0.23 0.04 caml_io_mutex_unlock 5.56 0.24 0.02 3255698 0.00 0.00 caml_c_call ... So most of the time is spent locking and unlocking OCaml's good old giant global lock. Unfortunately the blocking calls are "spontaneous" so you cannot track them down using a profile but, given the huge rise in time spent in read_char, my guess is that OCaml has introduced an OS kernel lock around every single byte read! If so, I'm surprised it is running *this* fast... Rémi: assuming I am correct, I'd recommend using memory mapping (and rewriting that code ;-). -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e