Mantis Bug Tracker

ID: 0001909
Project: OCaml
Category: OCaml otherlibs
View Status: public
Date Submitted: 2003-10-31 06:16
Last Update: 2012-06-21 20:16
Reporter: administrator
Priority: normal
Severity: feature
Reproducibility: always
Status: acknowledged
Resolution: open
Summary: 0001909: Unix.read isn't POSIX-conformant even if the OS is

Description:

Put this text into /tmp/foo.ml:

   let len: int = 32000;;
   let buf: string = String.create len;;
   let infd: Unix.file_descr = Unix.openfile "/usr/bin/tcsh" [Unix.O_RDONLY] 0;;
   let bytes: int = Unix.read infd buf 0 len;;
   let _ = Format.eprintf "bytes is %d, len is %d.\n@?" bytes len;;

(Here /usr/bin/tcsh can be replaced by any convenient readable file
that has at least 32000 bytes.)

Give these commands:

   cd /tmp
   /usr/bin/ocamlc unix.cma foo.ml -o foo
   ./foo

It prints:

   bytes is 16384, len is 32000.

I think this is wrong. bytes should have been 32000. This is because
the documentation at http://caml.inria.fr/ocaml/htmlman/manual035.html says:

   Refer to sections 2 and 3 of the Unix manual for more details on
   the behavior of these functions.

and "man 2 read" for Linux and some other operating systems says that
the read system call is POSIX conformant. To see the POSIX spec,
register at

   http://www.unix-systems.org/version3/online.html

and visit

   http://www.opengroup.org/onlinepubs/007904975/functions/read.html

It says:

   Upon successful completion, where nbyte is greater than 0, read()
   shall mark for update the st_atime field of the file, and shall return
   the number of bytes read. This number shall never be greater than
   nbyte. The value returned may be less than nbyte if the number of
   bytes left in the file is less than nbyte, if the read() request was
   interrupted by a signal, or if the file is a pipe or FIFO or special
   file and has fewer than nbyte bytes immediately available for
   reading. For example, a read() from a file associated with a terminal
   may return one typed line of data.

The common English meaning (but not the meaning used by mathematicians
and programmers and perhaps lawyers) for "if" in

   The value returned may be less than nbyte if ...blah blah...

is if-and-only-if. If they really meant a mathematical "if" there,
then that whole sentence is meaningless because it would be saying
that if these conditions hold, the value returned may be less than
nbyte, and if these conditions don't hold, then we aren't saying
anything so there is no constraint and the value returned may still be
less than nbyte. They must have meant "only if".

I submitted a bug to POSIX to change that "if" to "only if".

If you run foo under strace, you can see that the length argument
passed to the C "read" system call is 16384. This is a consequence of
the code in otherlibs/unix/read.c:

  Begin_root (buf);
    numbytes = Long_val(len);
    if (numbytes > UNIX_BUFFER_SIZE) numbytes = UNIX_BUFFER_SIZE;
    enter_blocking_section();
    ret = read(Int_val(fd), iobuf, (int) numbytes);
    leave_blocking_section();
    if (ret == -1) uerror("read", Nothing);
    memmove (&Byte(buf, Long_val(ofs)), iobuf, ret);
  End_roots();

The length limit is a consequence of reading the bytes into a buffer
before copying them, and I don't see any point in doing that. You
could do this just as well:

  Begin_root (buf);
    numbytes = Long_val(len);
    enter_blocking_section();
    ret = read(Int_val(fd), &Byte(buf, Long_val(ofs)), (int) numbytes);
    leave_blocking_section();
    if (ret == -1) uerror("read", Nothing);
  End_roots();

(If you copy this code, remember to remove the declaration of iobuf.)

I'm using the Debian unstable ocaml 3.07-7.
--
Tim Freeman tim@fungible.com
GPG public key fingerprint ECDF 46F8 3B80 BB9E 575D 7180 76DF FE00 34B1 5C78
Computers don't like it when you anthropomorphize them. -- Chris Phoenix

-  Notes
(0000189)
administrator (administrator)
2003-10-31 10:18

>That doesn't work. A blocking section is not allowed to access the
>Caml heap in any way (at least in multi-threaded programs).

Right. Now that you mention this, I found a discussion of it at:

   http://pauillac.inria.fr/~aschmitt/cwn/2003.04.22.html#3

Thus the remaining options are:

1. malloc an arbitrary-sized buffer (which might be bad if it's huge)
2. document that, unlike POSIX, Unix.read might not read all the
   requested bytes from a plain file even when the bytes are available.
3. if the number of bytes to read exceeds the fixed-size buffer,
   before doing the C read, use select to wait for some data to become
   available on the file descriptor. Once data is available, read
   should return promptly, so the read could be done after the
   leave_blocking_section. This will only cause an extra I/O
   operation when we're reading big chunks. If the chunk is big
   enough we might save a bunch of I/O operations by reading it all at
   once, so this might improve average performance.

--
Tim Freeman tim@fungible.com

(0000190)
administrator (administrator)
2003-10-31 16:10

> The length limit is a consequence of reading the bytes into a buffer
> before copying them, and I don't see any point in doing that. You
> could do this just as well:
>
> Begin_root (buf);
> numbytes = Long_val(len);
> enter_blocking_section();
> ret = read(Int_val(fd), &Byte(buf, Long_val(ofs)), (int) numbytes);
> leave_blocking_section();
> if (ret == -1) uerror("read", Nothing);
> End_roots();

That doesn't work. A blocking section is not allowed to access the
Caml heap in any way (at least in multi-threaded programs).

-- Damien

(0006829)
doligez (administrator)
2012-01-27 15:47

select() doesn't guarantee that data is available on the file descriptor when read() is called, because the data might disappear (i.e. be removed by some other process) between the call to select() and the call to read(). In that case, we are left with a thread that blocks outside of a blocking section, a major problem for multithreaded programs.

Our only realistic option is to document this behaviour.
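
If the behaviour is only documented, callers who need all the requested bytes have to loop themselves. A minimal sketch of such a loop follows. The name read_fully is hypothetical and not part of the Unix library, and the sketch assumes a current OCaml where Unix.read takes a Bytes.t buffer rather than the string used in the report.

```ocaml
(* Sketch only: read_fully is a hypothetical helper, not a Unix library
   function. Build with the unix library; assumes OCaml >= 4.02 (Bytes,
   and "exception" cases in match). *)
let read_fully (fd : Unix.file_descr) (buf : Bytes.t) (ofs : int) (len : int)
    : int =
  let rec loop got =
    if got >= len then got
    else
      match Unix.read fd buf (ofs + got) (len - got) with
      | 0 -> got                  (* end of file: return what was read *)
      | n -> loop (got + n)       (* short read: ask for the remainder *)
      | exception Unix.Unix_error (Unix.EINTR, _, _) ->
          loop got                (* interrupted by a signal: retry *)
  in
  loop 0
```

With this, read_fully infd buf 0 32000 on the file from the report would keep calling Unix.read until 32000 bytes have arrived or the file ends. This only helps for regular files and byte streams; a datagram that was truncated by a single read cannot be recovered by reading again.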
(0006835)
gerd (reporter)
2012-01-27 21:04

The POSIX text is not very exact. Traditionally, read() does not return fewer bytes than requested for regular files (independently of how you read the POSIX specs; all POSIX-based OSes do this). There are other cases which are also not exactly specified (e.g. behaviour for devices or message-based channels), and here the OCaml behaviour hurts more. E.g. you cannot receive a 64K Internet datagram: it is cut off at 16K.

In Ocamlnet the chosen workaround is to use bigarrays as primary buffers, i.e. you have something like

   val mem_read : Unix.file_descr -> buffer -> int -> int -> int

where

   type buffer = (char, int8_unsigned_elt, c_layout) Array1.t

The user now has the option to make the buffer as large as necessary. This solution has the nice property that you can even save one data copy if you can directly process the data in the bigarray buffer. If you copy the bigarray to a string (btw, this function is missing in the runtime), you have exactly the same overhead as the current solution for Unix.read. As bigarrays are malloc-ed memory, we do not run into the problem that the buffer can be moved around by the GC when triggered by another thread.
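
To make the copying step concrete, here is a rough sketch of the buffer type and of helpers that copy between a char bigarray and a string. The names string_of_buffer and buffer_of_string are hypothetical, mem_read itself is not reproduced, and the sketch assumes OCaml >= 4.07, where Bigarray is part of the standard library.

```ocaml
(* Sketch only: string_of_buffer and buffer_of_string are hypothetical
   names. Assumes OCaml >= 4.07 (Bigarray in the standard library). *)
open Bigarray

type buffer = (char, int8_unsigned_elt, c_layout) Array1.t

(* Copy len bytes starting at pos out of the bigarray into a fresh string. *)
let string_of_buffer (b : buffer) (pos : int) (len : int) : string =
  if pos < 0 || len < 0 || pos > Array1.dim b - len then
    invalid_arg "string_of_buffer";
  String.init len (fun i -> Array1.get b (pos + i))

(* Allocate a bigarray outside the OCaml heap and copy the string into it. *)
let buffer_of_string (s : string) : buffer =
  let b = Array1.create char c_layout (String.length s) in
  String.iteri (fun i c -> Array1.set b i c) s;
  b
```

These copy one element at a time in OCaml, which is simple but slower than a memcpy-style primitive in C would be; a runtime-provided function, as the note suggests, could avoid that cost.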

So, my suggestion is:
 - make something like mem_read the fundamental operation, and expose it
   for users needing exact control
 - Unix.read would remain the same, and the issue is documented
 - include functions for copying char bigarray to string and vice versa

This might imply some restructuring of the libraries, especially, the C part of Bigarrays would have to be moved to the normal stdlib.

Btw, write, recv, send have similar problems.

- Issue History
Date Modified      Username       Field                Change
2005-11-18 10:13   administrator  New Issue
2012-01-27 15:47   doligez        Note Added: 0006829
2012-01-27 21:04   gerd           Note Added: 0006835
2012-06-21 20:16   frisch         Category             OCaml general => OCaml otherlibs

