Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
Concurrent hash tables
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2007-02-01 (15:22)
From: David Allsopp <dra-news@m...>
Subject: Concurrent hash tables
I've been experimenting with a concurrent hash table implementation under
Windows XP based on some of the ideas from the util.concurrent library in
Java. I've hit some confusing results and am wondering whether someone more
fluent with the internal workings of O'Caml threads can explain them.

My trivial hash table implementation uses an array of lists to store pairs
for the (key, value) mappings. O'Caml's own hashing primitive is used for
the hashes. The hash table is set up with a user-specified number of locks
(so, in order to write to a particular hash bucket, you must possess the
lock for n mod nLocks). In the first attempt I made, the hash table is never
resized, so the locking cases for reading are never hit and there are no
conflicts. The interesting results come with the set operations to update
values in the hash table. The threads spawned in each test cycle 500000
random operations. There are 16 of them and each thread executes the same
operations as it did in the previous test. Approximately 10% of these
operations are set, the rest are get (with the result being ignored).

My set code looks like this:

let set {buckets = buckets; locks = locks} key value =
  let hash = Hashtbl.hash key mod Array.length buckets
    let lock = locks.(hash mod Array.length locks)
      Mutex.lock lock;
      (* Very poor function to update the buckets... *)
      let rec f acc elts =
        match elts with
          [] -> (key, value)::acc
        | (key', value')::elts -> if key' = key
                                  then (key, value)::acc @ elts (* :o) *)
                                  else f ((key', value')::acc) elts
        buckets.(hash) <- f [] buckets.(hash);
        Mutex.unlock lock

For 16 threads operating on a single hash table with one lock for all
writes, this takes about 20 seconds. For 16 threads with 8 locks, it
consistently takes about 30... which seems a bit counterintuitive! The
garbage collector does not seem to be causing trouble - I do a Gc.compact()
in between each test to try and a combination of monitoring with
Gc.create_alarm and test re-ordering has shown that it doesn't seem to be
getting in the way of the results.

If I replace the Mutex.lock line with

if not(Mutex.try_lock lock)
then Mutex.lock lock;

then the two versions come down to roughly the same speed (though the 8 lock
version is still never faster). Further inspection shows that the
concurrency has definitely improved in the 8-lock case --- there are
approximately 800k - 1 million locking conflicts in the 1-lock case and
between 10k - 100k conflicts in the 8-lock case.

Which leads to two questions:

1. Why is the Mutex.try_lock lock version faster (discussion follows)
2. Why does increasing the concurrency seem to hurt the performance?

Looking in Win32.c for the implementation of try_lock and lock, the
difference seems to be the padding put around the call to
WaitForSingleObject --- it has to call caml_enter_blocking_section and so on
(I'm assuming that the Begin_root call is what would normally be done
"under-the-hood" by CAMLparam1). Would a better implementation of Mutex.lock
therefore be to call WaitForSingleObject with a 0 timeout and then try with
the INFINITE version so reducing the overhead of releasing and reacquiring
the OCaml global mutex in the successful case?

Any explanations much appreciated --- I'm aware that since SMP isn't
supported, highly concurrent implementations of structures such as this
aren't really that useful but I'm intrigued by the performance hit of
increasing concurrency on this operation! I blush as I say that I've never
tried code profiling before so don't really know where to start using that
to see if would help explain things further...