Re: camlex/camlyacc + threads problem

From: Xavier Leroy
Date: Thu Sep 25 1997 - 14:19:33 MET DST

In-Reply-To: message from Jean-Claude Laffitte of "Sep 23, 97 02:54:07 pm"
To: Jean-Claude Laffitte

> I make intensive use of threads and I have a strong need for safety. So
> the question is, what does *thread-safe* mean?

To quote Dave Butenhof, "Programming with POSIX Threads":

        ``Thread-safe'' means that the code can be called from
        multiple threads without destructive results. It does not
        require that the code run efficiently in multiple threads,
        only that it can operate safely in multiple threads.

> Is it just a problem with global values (critical sections), or must I
> use only the Pervasives library in my threads?
> For example, is this code safe?
> let crazy name =
>   let counter = 0 in
>   while ( true ) do
>     let str = String.create 30 (* safe ? *)
>     and arr = Array.create 10 a in (* safe ? *)
>     Printf.sprintf str "%s : %d" name counter (* safe ? *)
>   done
> let main =
>   begin
>     let t1 = Thread.create crazy "First"
>     and t2 = Thread.create crazy "second"
>     and t3 = Thread.create crazy "Third" in
>     Thread.join (t3);
>   end

Yes, it is. (Though it does not typecheck.)
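For comparison, here is one way the quoted program could be made to typecheck with a current OCaml. This is a sketch under several assumptions: `String.create` and `Array.create` have since been removed from the standard library, so `Bytes.create` and `Array.make` stand in for them; `Printf.sprintf` takes the format first and returns a new string; the unbound `a` is replaced by `counter`; the infinite loop is bounded at 1000 iterations so the program ends; and all three threads are joined, not only the last. The `run` function and its `results` array are illustrative additions. It must be linked with the threads library (e.g. `ocamlfind ocamlopt -package threads.posix -linkpkg`).

```ocaml
(* Each thread allocates and formats only its own values, so no mutex is
   needed. The loop bound of 1000 is an assumption so the program ends. *)
let crazy name =
  let last = ref "" in
  for counter = 0 to 999 do
    let str = Bytes.create 30 in          (* String.create no longer exists *)
    let arr = Array.make 10 counter in    (* Array.make replaces Array.create *)
    ignore (str, arr);
    (* sprintf takes the format first and returns a fresh string *)
    last := Printf.sprintf "%s : %d" name counter
  done;
  !last

(* Start the three threads, then wait for all of them. *)
let run () =
  let names = [| "First"; "second"; "Third" |] in
  let results = Array.make (Array.length names) "" in
  let threads =
    Array.mapi
      (fun i name ->
        (* each thread writes only its own slot of [results] *)
        Thread.create (fun () -> results.(i) <- crazy name) ())
      names
  in
  Array.iter Thread.join threads;
  results
```

Calling `run ()` returns the last line each thread formatted, one array slot per thread.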

> Do I need mutexes for all the Arrays, Lists, Strings, Stacks ... I use ??

There are essentially four kinds of data structures / library functions:

1- Purely functional data structures and functions (no side effects):
   these are always thread-safe. The module List is a good example.
   No need for mutexes.

2- Basic functions over mutable data structures, e.g. reading and writing
   references, or elements of arrays or strings, or Array.create,
   or String.create. These are atomic operations, meaning that if two
   threads A and B assign the same array element, then either
   A will assign it, then B, or B, then A,
   but no weird behavior will occur that might cause the array element
   to hold a value other than that stored by A or that stored by B.
   I/O functions from Pervasives also fall in this class.
   You don't have to protect these structures with a mutex, though
   there are many cases where you will want to, e.g. to guarantee
   that a sequence of assignments over a shared array
   is performed atomically.

3- More complicated functions over mutable data structures, e.g.
   Array.copy, or the functions in Hashtbl, Stack, Lexing:
   modifications on these data structures are not atomic, so if two threads
   modify the same structure concurrently, internal invariants may be broken
   and unexpected results ensue. You should always associate these data
   structures with a mutex, or make sure they are used in only one thread.

4- Functions with internal global state. The only example in the whole
   standard library is the Parsing module. Here, it's unsafe to call
   one of these functions from two threads simultaneously, even on
   different arguments (e.g. different Lexing.lexbuf arguments for
   a parser). Use a global mutex, or make sure only one thread does the parsing.
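The two uses of a mutex recommended in the list for classes 2 and 4 can be sketched as follows. This assumes a current OCaml linked with the threads library (e.g. `-package threads.posix`), with `Mutex` and `Fun.protect` available; `with_lock`, `transfer`, `locked_parse`, `run_transfers` and the `slots` array are illustrative names, not standard library functions. The `parse` argument stands in for any ocamlyacc-generated entry point.

```ocaml
(* Run [f] while holding [m], releasing [m] even if [f] raises. *)
let with_lock m f =
  Mutex.lock m;
  Fun.protect ~finally:(fun () -> Mutex.unlock m) f

(* Class 2: each array read or write is atomic on its own, but a transfer
   between two slots is several reads and writes. Holding the mutex for the
   whole transfer keeps the sum of [slots] constant. *)
let slots = Array.make 4 100
let slots_lock = Mutex.create ()

let transfer src dst amount =
  with_lock slots_lock (fun () ->
      slots.(src) <- slots.(src) - amount;
      slots.(dst) <- slots.(dst) + amount)

let total () = with_lock slots_lock (fun () -> Array.fold_left ( + ) 0 slots)

(* Class 4: Parsing keeps its state in globals, so a single lock is shared
   by every call to any ocamlyacc parser, whatever lexbuf it is given.
   [parse] stands for a generated entry point applied to its lexer. *)
let parser_lock = Mutex.create ()

let locked_parse (parse : Lexing.lexbuf -> 'a) lexbuf =
  with_lock parser_lock (fun () -> parse lexbuf)

(* Three threads performing many transfers concurrently. *)
let run_transfers () =
  let worker k () =
    for i = 1 to 10_000 do
      transfer ((i + k) mod 4) ((i + k + 1) mod 4) 1
    done
  in
  let threads = List.init 3 (fun k -> Thread.create (worker k) ()) in
  List.iter Thread.join threads;
  total ()
```

Because `with_lock` releases the mutex through `Fun.protect`, a parse that fails with an exception does not leave the global parser lock held.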

Of all these classes, class 4 is the most troublesome, and I expect
to get rid of it entirely in the next release of O'Caml (e.g. by putting
the parsing state inside the Lexing.lexbuf argument).
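The proposed fix, moving state out of module-level globals and into the argument each call works on, can be illustrated with a toy balanced-parenthesis checker. This is not the Parsing module's code; `global_depth`, `balanced_global`, `state`, `make_state` and `check` are hypothetical names chosen only to contrast the two designs.

```ocaml
(* Global-state style: one counter shared by every caller, like the
   Parsing module's internal state. Two threads checking different
   strings at the same time could corrupt each other's count. *)
let global_depth = ref 0

let balanced_global s =
  global_depth := 0;
  let ok = ref true in
  String.iter
    (fun c ->
      if c = '(' then incr global_depth
      else if c = ')' then begin
        decr global_depth;
        if !global_depth < 0 then ok := false
      end)
    s;
  !ok && !global_depth = 0

(* Per-argument style: the state lives in a record created for each input,
   analogous to keeping parser state inside the lexbuf, so calls on
   different inputs share nothing and need no lock. *)
type state = { input : string; mutable depth : int; mutable ok : bool }

let make_state input = { input; depth = 0; ok = true }

let check st =
  String.iter
    (fun c ->
      if c = '(' then st.depth <- st.depth + 1
      else if c = ')' then begin
        st.depth <- st.depth - 1;
        if st.depth < 0 then st.ok <- false
      end)
    st.input;
  st.ok && st.depth = 0
```

Both functions give the same answers in a single thread; only the second can safely run on two different inputs concurrently.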

Class 3 could be made thread-safe by adding mutexes inside the library
modules, but this is problematic. For instance, it is impractical to
associate a mutex with each array or string. It also imposes
significant overhead on the standard library, especially in the
non-threaded case. Finally, it often makes more sense to use a mutex in
the user code to protect a group of related standard library data
structures (e.g. a hashtable and an array), rather than rely on
fine-grained locking in the library itself.
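The last point, one user-level mutex guarding a group of related structures, might look like the following sketch. The `registry` type, which maps names to slots in an array of counters, is a hypothetical example; the point is that the hashtable and the array are only touched while holding the same lock, so they cannot drift out of step. It assumes the threads library and OCaml 4.08 or later (for `Fun.protect`).

```ocaml
type registry = {
  lock : Mutex.t;                     (* guards both [index] and [counts] *)
  index : (string, int) Hashtbl.t;    (* name -> slot in [counts] *)
  counts : int array;                 (* fixed capacity, for simplicity *)
  mutable next : int;                 (* first unused slot *)
}

let registry_create capacity =
  { lock = Mutex.create (); index = Hashtbl.create 16;
    counts = Array.make capacity 0; next = 0 }

(* Run [f] while holding [m], releasing [m] even if [f] raises. *)
let with_lock m f =
  Mutex.lock m;
  Fun.protect ~finally:(fun () -> Mutex.unlock m) f

(* Count one hit for [name], allocating a slot on first use.
   Raises [Failure] if every slot is taken. *)
let hit r name =
  with_lock r.lock (fun () ->
      let slot =
        match Hashtbl.find_opt r.index name with
        | Some s -> s
        | None ->
            if r.next >= Array.length r.counts then failwith "registry full";
            let s = r.next in
            Hashtbl.add r.index name s;
            r.next <- s + 1;
            s
      in
      r.counts.(slot) <- r.counts.(slot) + 1)

let count r name =
  with_lock r.lock (fun () ->
      match Hashtbl.find_opt r.index name with
      | Some s -> r.counts.(s)
      | None -> 0)

(* Four threads hitting the same two names concurrently. *)
let run () =
  let r = registry_create 8 in
  let worker () =
    for _i = 1 to 1000 do
      hit r "a";
      hit r "b"
    done
  in
  let threads = List.init 4 (fun _ -> Thread.create worker ()) in
  List.iter Thread.join threads;
  r
```

A single coarse lock like this is simpler to reason about than one lock per structure, at the cost of serializing all access to the group.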

Hope this clarifies the issues. Nobody said multithreaded programming
was easy...

- Xavier Leroy
