Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0000025OCamlOCaml generalpublic2000-01-14 23:052000-02-07 14:46
Reporteradministrator 
Assigned To 
PrioritynormalSeverityminorReproducibilityalways
StatusclosedResolutionfixed 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0000025: threads and marshalling with I/O
DescriptionFull_Name: Gerd Stolpmann
Version: 2.04 -with-pthread
OS: Linux-2.2.13 with glibc-6.1
Submission from: master.proxy.ision.net (195.180.208.40)


Marshalling, including the input/output_value functions in Pervasives, is not
reentrant at all. This very clear from the C source code in byterun/extern.c
and
intern.c:

extern.c: There is a static variable extern_block which is actually the buffer
that is written in do_write; writing happens in a "blocking section" and so
concurrent Marshal.to_channel calls can lead to situations where more than one
thread is executing the code within this section with the same extern_block
buffer.

intern.c: The same problems with the buffer intern_input which is also filled
in a blocking section.

This piece of code often segfaults with Marshal.to_channel:

let l = ref [] in
let s = String.make 10000 'x' in
for i = 1 to 50 do
  let id =
    Thread.create
      (fun () ->
     for j = 1 to 100 do
       let ch = open_out "/dev/null" in
       Marshal.to_channel ch s [];
       close_out ch;
     done;
      )
      ()
  in
  l := id :: !l
done;

List.iter
  (fun id ->
     Thread.join id)
  !l
;;

The same with Marshal.from_channel:

let s = String.make 10000 'x' in
let out = open_out "r.data" in
Marshal.to_channel out s [];
close_out out;
let l = ref [] in
for i = 1 to 50 do
  let id =
    Thread.create
      (fun () ->
     for j = 1 to 100 do
       let ch = open_in "r.data" in
       ignore(Marshal.from_channel ch);
       close_in ch;
     done;
      )
      ()
  in
  l := id :: !l
done;

List.iter
  (fun id ->
     Thread.join id)
  !l
;;


TagsNo tags attached.
Attached Files

- Relationships

-  Notes
(0000297)
administrator (administrator)
2000-01-17 13:37

> Marshalling, including the input/output_value functions in Pervasives, is not
> reentrant at all. This very clear from the C source code in byterun/extern.c
> and intern.c:
> [global variables]

Please keep in mind that the two thread libraries for Caml are set up in such a
way
that only one thread at a time is in Caml code or in most parts of the runtime
system. For native threads, this is achieved by having a master lock that
any thread must acquire before running Caml code, and releases only when it
is about to perform a blocking I/O operation. While the main motivation for
the master lock is that the allocator and GC aren't concurrent, the master lock
also protects a bunch of global variables such as those in extern.c and
intern.c.

Marshalling is a two-step process: first, the stream of bytes is built in
memory,
without releasing the master lock; then, it is written on the channel,
releasing
the master lock for the duration of the (potentially blocking) write. But the
global variables are used only during the first step (I think -- I'll have to
check the code again!).

> This piece of code often segfaults with Marshal.to_channel:

I'll look at that. One quick question: is this with bytecode threads or native
threads?

Best regards,

- Xavier Leroy
(0000298)
administrator (administrator)
2000-01-17 20:38

On Mon, 17 Jan 2000, you wrote:
>> Marshalling, including the input/output_value functions in Pervasives, is not
>> reentrant at all. This very clear from the C source code in byterun/extern.c
>> and intern.c:
>> [global variables]
>
>Please keep in mind that the two thread libraries for Caml are set up in such a
>way
>that only one thread at a time is in Caml code or in most parts of the runtime
>system. For native threads, this is achieved by having a master lock that
>any thread must acquire before running Caml code, and releases only when it
>is about to perform a blocking I/O operation. While the main motivation for
>the master lock is that the allocator and GC aren't concurrent, the master lock
>also protects a bunch of global variables such as those in extern.c and
>intern.c.

I already took the master lock into account. The buffers that are used for
the "read" and "write" system calls are not protected (enough). This is very
clear from "output_val":

void output_val(struct channel *chan, value v, value flags)
{
  long len;
  if (! channel_binary_mode(chan))
    failwith("output_value: not a binary channel");
  alloc_extern_block();
  len = extern_value(v, flags);
  really_putblock(chan, extern_block, len);
  stat_free(extern_block);
}

Although "extern_block" is copied into the channel's private buffer, it is still
possible that several threads use "extern_block" in parallel. Why that? Consider
thread A and thread B want to "output_val" almost at the same time. Now this
can happen:

- thread A runs until the "write" system call is reached (somewhere in
  really_putblock). The master lock is released at this time.
- "write" blocks for a moment
- thread B can now continue, do "alloc_extern_block" again although the previous
  buffer is still in use
- thread B does not block; it frees extern_block
- thread A wakes up and tries to free extern_block again. Crash.

A possible solution would be to free extern_block *immediately* after it has
been copied to the channel buffer, and before the master lock will be released.

The same problem applies to "input_val", because "intern_input" can be used by
several threads in parallel, too. Here, the solution is very simple: do not
use the global variable "intern_input" while "really_getblock" is in progress
but a local pointer. Assign this local pointer to "intern_input" after the input
operation is over.

There is another problem with input_val:

value input_val(struct channel *chan)
{
    ...
        intern_input = (unsigned char *) stat_alloc(block_len);
    ...
        if (really_getblock(chan, (char *)intern_input, block_len) == 0) {
        ...
    }
    ...
}

really_getblock simply does getblock multiple times with the same buffer.

int getblock(struct channel *channel, char *p, long int len)
{
    ...
      if (n <= avail) { ...
    } else if (avail > 0) { ...
    } else if (n< IO_BUFFER_SIZE) { ...
    } else {
              nread = do_read(channel->fd, p, n); /* WRONG */
              channel->offset += nread;
              return nread;
      }
}

"do_read" then passes "p" over to "read", with released master lock.

This may also be the reason for the Pervasives.input bug I reported, too,
because "caml_input" passes a Caml-allocated string to "getblock". This
string can be corrupted by a garbage collection at the same time.

>I'll look at that. One quick question: is this with bytecode threads or native
>threads?

Native threads, of course.

Gerd
--
----------------------------------------------------------------------------
Gerd Stolpmann Telefon: +49 6151 997705 (privat)
Viktoriastr. 100
64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany
----------------------------------------------------------------------------

(0000299)
administrator (administrator)
2000-02-07 14:41

> I already took the master lock into account. The buffers that are used for
> the "read" and "write" system calls are not protected (enough). This is very
> clear from "output_val":
> Although "extern_block" is copied into the channel's private buffer, it is
still
> possible that several threads use "extern_block" in parallel.

You are correct, "stat_free(extern_block)" in output_val isn't protected by
the master lock.

> A possible solution would be to free extern_block *immediately* after it has
> been copied to the channel buffer, and before the master lock will be
released.

Unfortunately, the channel buffer doesn't always have enough free room to copy
extern_block in it at once.

My fix is to save the value of extern_block (the address of the extern buffer)
in a local variable before calling really_putblock, then pass that local
variable
to stat_free at the end.

> The same problem applies to "input_val", because "intern_input" can be used
by
> several threads in parallel, too. Here, the solution is very simple: do not
> use the global variable "intern_input" while "really_getblock" is in progress
> but a local pointer. Assign this local pointer to "intern_input" after the
input
> operation is over.

Right. (My fix above is symmetrical). This will be in 3.00.

> [In getblock:]
> "do_read" then passes "p" over to "read", with released master lock.
> This may also be the reason for the Pervasives.input bug I reported, too,
> because "caml_input" passes a Caml-allocated string to "getblock". This
> string can be corrupted by a garbage collection at the same time.

Correct. There was a similar bug in Unix.read and Unix.write that was fixed a
while ago, but I didn't realize the same problem occurred in buffered I/O as
well.

I have fixed this by suppressing the optimization in getblock (that we read
directly into the Caml string, bypassing entirely the buffer, if the buffer is
empty
and the request large enough). In other terms, all data read goes first
through
the buffer (which doesn't move during GC), then is copied back to the Caml
string
while the master lock is held. It's slightly less efficient (all reads are now
decomposed into 4K chunks) but I don't see any other way to remain GC-safe.

Thanks for the bug reports and diagnostics,

- Xavier Leroy

>
>>I'll look at that. One quick question: is this with bytecode threads or
native
>>threads?
>
> Native threads, of course.
>
> Gerd
> --
> ----------------------------------------------------------------------------
> Gerd Stolpmann Telefon: +49 6151 997705 (privat)
> Viktoriastr. 100
> 64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat)
> Germany
> ----------------------------------------------------------------------------
>
>
(0000300)
administrator (administrator)
2000-02-07 14:46

Fixed in 3.00

- Issue History
Date Modified Username Field Change
2005-11-18 10:13 administrator New Issue


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker