threads and marshalling with I/O #2359

vicuna · 2000-01-14T22:05:31Z

Original bug ID: 25
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: minor
Category: ~DO NOT USE (was: OCaml general)

Bug description

Full_Name: Gerd Stolpmann
Version: 2.04 -with-pthread
OS: Linux-2.2.13 with glibc-6.1
Submission from: master.proxy.ision.net (195.180.208.40)

Marshalling, including the input/output_value functions in Pervasives, is not
reentrant at all. This very clear from the C source code in byterun/extern.c
and
intern.c:

extern.c: There is a static variable extern_block which is actually the buffer
that is written in do_write; writing happens in a "blocking section" and so
concurrent Marshal.to_channel calls can lead to situations where more than one
thread is executing the code within this section with the same extern_block
buffer.

intern.c: The same problems with the buffer intern_input which is also filled
in a blocking section.

This piece of code often segfaults with Marshal.to_channel:

let l = ref [] in
let s = String.make 10000 'x' in
for i = 1 to 50 do
let id =
Thread.create
(fun () ->
for j = 1 to 100 do
let ch = open_out "/dev/null" in
Marshal.to_channel ch s [];
close_out ch;
done;
)
()
in
l := id :: !l
done;

List.iter
(fun id ->
Thread.join id)
!l
;;

The same with Marshal.from_channel:

let s = String.make 10000 'x' in
let out = open_out "r.data" in
Marshal.to_channel out s [];
close_out out;
let l = ref [] in
for i = 1 to 50 do
let id =
Thread.create
(fun () ->
for j = 1 to 100 do
let ch = open_in "r.data" in
ignore(Marshal.from_channel ch);
close_in ch;
done;
)
()
in
l := id :: !l
done;

List.iter
(fun id ->
Thread.join id)
!l
;;

vicuna · 2000-01-17T12:37:06Z

Comment author: administrator

Marshalling, including the input/output_value functions in Pervasives, is not
reentrant at all. This very clear from the C source code in byterun/extern.c
and intern.c:
[global variables]

Please keep in mind that the two thread libraries for Caml are set up in such a
way
that only one thread at a time is in Caml code or in most parts of the runtime
system. For native threads, this is achieved by having a master lock that
any thread must acquire before running Caml code, and releases only when it
is about to perform a blocking I/O operation. While the main motivation for
the master lock is that the allocator and GC aren't concurrent, the master lock
also protects a bunch of global variables such as those in extern.c and
intern.c.

Marshalling is a two-step process: first, the stream of bytes is built in
memory,
without releasing the master lock; then, it is written on the channel,
releasing
the master lock for the duration of the (potentially blocking) write. But the
global variables are used only during the first step (I think -- I'll have to
check the code again!).

This piece of code often segfaults with Marshal.to_channel:

I'll look at that. One quick question: is this with bytecode threads or native
threads?

Best regards,

Xavier Leroy

vicuna · 2000-01-17T19:38:19Z

Comment author: administrator

On Mon, 17 Jan 2000, you wrote:

Marshalling, including the input/output_value functions in Pervasives, is not
reentrant at all. This very clear from the C source code in byterun/extern.c
and intern.c:
[global variables]

Please keep in mind that the two thread libraries for Caml are set up in such a
way
that only one thread at a time is in Caml code or in most parts of the runtime
system. For native threads, this is achieved by having a master lock that
any thread must acquire before running Caml code, and releases only when it
is about to perform a blocking I/O operation. While the main motivation for
the master lock is that the allocator and GC aren't concurrent, the master lock
also protects a bunch of global variables such as those in extern.c and
intern.c.

I already took the master lock into account. The buffers that are used for
the "read" and "write" system calls are not protected (enough). This is very
clear from "output_val":

void output_val(struct channel *chan, value v, value flags)
{
long len;
if (! channel_binary_mode(chan))
failwith("output_value: not a binary channel");
alloc_extern_block();
len = extern_value(v, flags);
really_putblock(chan, extern_block, len);
stat_free(extern_block);
}

Although "extern_block" is copied into the channel's private buffer, it is still
possible that several threads use "extern_block" in parallel. Why that? Consider
thread A and thread B want to "output_val" almost at the same time. Now this
can happen:

thread A runs until the "write" system call is reached (somewhere in
really_putblock). The master lock is released at this time.
"write" blocks for a moment
thread B can now continue, do "alloc_extern_block" again although the previous
buffer is still in use
thread B does not block; it frees extern_block
thread A wakes up and tries to free extern_block again. Crash.

A possible solution would be to free extern_block immediately after it has
been copied to the channel buffer, and before the master lock will be released.

The same problem applies to "input_val", because "intern_input" can be used by
several threads in parallel, too. Here, the solution is very simple: do not
use the global variable "intern_input" while "really_getblock" is in progress
but a local pointer. Assign this local pointer to "intern_input" after the input
operation is over.

There is another problem with input_val:

value input_val(struct channel *chan)
{
...
intern_input = (unsigned char *) stat_alloc(block_len);
...
if (really_getblock(chan, (char *)intern_input, block_len) == 0) {
...
}
...
}

really_getblock simply does getblock multiple times with the same buffer.

int getblock(struct channel *channel, char p, long int len)
{
...
if (n <= avail) { ...
} else if (avail > 0) { ...
} else if (n< IO_BUFFER_SIZE) { ...
} else {
nread = do_read(channel->fd, p, n); / WRONG */
channel->offset += nread;
return nread;
}
}

"do_read" then passes "p" over to "read", with released master lock.

This may also be the reason for the Pervasives.input bug I reported, too,
because "caml_input" passes a Caml-allocated string to "getblock". This
string can be corrupted by a garbage collection at the same time.

I'll look at that. One quick question: is this with bytecode threads or native
threads?

Native threads, of course.

Gerd

Gerd Stolpmann Telefon: +49 6151 997705 (privat)
Viktoriastr. 100
64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany

vicuna · 2000-02-07T13:41:54Z

Comment author: administrator

I already took the master lock into account. The buffers that are used for
the "read" and "write" system calls are not protected (enough). This is very
clear from "output_val":
Although "extern_block" is copied into the channel's private buffer, it is
still
possible that several threads use "extern_block" in parallel.

You are correct, "stat_free(extern_block)" in output_val isn't protected by
the master lock.

A possible solution would be to free extern_block immediately after it has
been copied to the channel buffer, and before the master lock will be
released.

Unfortunately, the channel buffer doesn't always have enough free room to copy
extern_block in it at once.

My fix is to save the value of extern_block (the address of the extern buffer)
in a local variable before calling really_putblock, then pass that local
variable
to stat_free at the end.

The same problem applies to "input_val", because "intern_input" can be used
by
several threads in parallel, too. Here, the solution is very simple: do not
use the global variable "intern_input" while "really_getblock" is in progress
but a local pointer. Assign this local pointer to "intern_input" after the
input
operation is over.

Right. (My fix above is symmetrical). This will be in 3.00.

[In getblock:]
"do_read" then passes "p" over to "read", with released master lock.
This may also be the reason for the Pervasives.input bug I reported, too,
because "caml_input" passes a Caml-allocated string to "getblock". This
string can be corrupted by a garbage collection at the same time.

Correct. There was a similar bug in Unix.read and Unix.write that was fixed a
while ago, but I didn't realize the same problem occurred in buffered I/O as
well.

I have fixed this by suppressing the optimization in getblock (that we read
directly into the Caml string, bypassing entirely the buffer, if the buffer is
empty
and the request large enough). In other terms, all data read goes first
through
the buffer (which doesn't move during GC), then is copied back to the Caml
string
while the master lock is held. It's slightly less efficient (all reads are now
decomposed into 4K chunks) but I don't see any other way to remain GC-safe.

Thanks for the bug reports and diagnostics,

Xavier Leroy

I'll look at that. One quick question: is this with bytecode threads or
native
threads?

Native threads, of course.

Gerd

Gerd Stolpmann Telefon: +49 6151 997705 (privat)
Viktoriastr. 100
64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany

vicuna · 2000-02-07T13:46:49Z

Comment author: administrator

Fixed in 3.00

vicuna closed this as completed Feb 7, 2000

vicuna added the bug label Mar 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

threads and marshalling with I/O #2359

threads and marshalling with I/O #2359

vicuna commented Jan 14, 2000

vicuna commented Jan 17, 2000

vicuna commented Jan 17, 2000

vicuna commented Feb 7, 2000

Gerd

Gerd Stolpmann Telefon: +49 6151 997705 (privat)
Viktoriastr. 100
64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany

vicuna commented Feb 7, 2000

threads and marshalling with I/O #2359

threads and marshalling with I/O #2359

Comments

vicuna commented Jan 14, 2000

Bug description

vicuna commented Jan 17, 2000

vicuna commented Jan 17, 2000

Gerd

Gerd Stolpmann Telefon: +49 6151 997705 (privat) Viktoriastr. 100 64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat) Germany

vicuna commented Feb 7, 2000

Gerd

Gerd Stolpmann Telefon: +49 6151 997705 (privat) Viktoriastr. 100 64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat) Germany

vicuna commented Feb 7, 2000

Gerd Stolpmann Telefon: +49 6151 997705 (privat)
Viktoriastr. 100
64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany

Gerd Stolpmann Telefon: +49 6151 997705 (privat)
Viktoriastr. 100
64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany