Re: [Caml-list] Unix.lseek versus Pervasives.pos
Date: 2003-03-19 (19:08)
From: Basile STARYNKEVITCH <basile@s...>
Subject: Re: [Caml-list] Unix.lseek versus Pervasives.pos
>>>>> "cashin" == cashin  <> writes:

    cashin> Sorry if this shows up as a duplicate.  Basile
    cashin> STARYNKEVITCH <> writes:

    Basile>> You apparently forgot to flush the channel.

Ok, I made a stupid mistake (flushing is only for writes!) but my
intuition was right, in the sense of taking buffering into account.

    cashin> Flushes are for writes, but even when using a test program
    cashin> that just reads, zero is returned when it appears that it
    cashin> shouldn't return zero.  Compare the short ocaml program
    cashin> below to the comparable C version.

Ok; but the problem is the same: the OCaml I/O subsystem manages its own
internal buffering. Channels are not Unix file descriptors, but buffered
wrappers around them. See the source code (in particular
ocaml/byterun/io.c and io.h) for details. In particular, a channel is
(from io.h) implemented as

  struct channel {
    int fd;                       /* Unix file descriptor */
    file_offset offset;           /* Absolute position of fd in the file */
    char * end;                   /* Physical end of the buffer */
    char * curr;                  /* Current position in the buffer */
    char * max;                   /* Logical end of the buffer (for input) */
    void * mutex;                 /* Placeholder for mutex (for systhreads) */
    struct channel * next;        /* Linear chaining of channels (flush_all) */
    int revealed;                 /* For Cash only */
    int old_revealed;             /* For Cash only */
    int refcount;                 /* For flush_all and for Cash */
    char buff[IO_BUFFER_SIZE];    /* The buffer itself */
  };

where IO_BUFFER_SIZE is usually 4096 bytes.
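
To make the buffering concrete, here is a simplified model of how a
logical input position can be derived from such fields. It is only a
sketch: mini_channel, MINI_BUFFER_SIZE and mini_pos_in are illustrative
names, not the runtime's own, and the struct keeps only the fields the
arithmetic needs.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for struct channel (illustration only). */
#define MINI_BUFFER_SIZE 4096

struct mini_channel {
  long offset;                  /* position of the fd in the file */
  char *curr;                   /* next byte the program will consume */
  char *max;                    /* end of the data currently buffered */
  char buff[MINI_BUFFER_SIZE];  /* the buffer itself */
};

/* Position as seen by the program: the fd position minus the bytes that
   were already read from the fd but not yet consumed from the buffer. */
long mini_pos_in(const struct mini_channel *c)
{
  assert(c->curr <= c->max);    /* cannot consume past the buffered data */
  return c->offset - (long)(c->max - c->curr);
}
```

With a 483-byte file fully buffered and 10 bytes consumed, offset is 483
and max - curr is 473, so mini_pos_in gives 10, whereas an lseek on the
descriptor would report 483.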

The equivalent C program would mix lseek with a <stdio.h> FILE, and would
also get a mess:

  /* file main.c */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>     /* for memset */
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  int main(void)
  {
      FILE *f = fopen("main.c", "r");   /* buffered stdio stream */
      char buf[1024];
      int fd;
      if (f == NULL) { perror("main.c"); return 1; }
      fd = fileno(f);                   /* its underlying descriptor */
      memset(buf, '\0', sizeof(buf));
      fread(buf, 1, 10, f);             /* ask stdio for 10 bytes only */
      printf("after reading \"%s\" lseek returns %d\n",
             buf, (int) lseek(fd, 0, SEEK_CUR));
      return 0;
  }

When I run the above file with tcc I get

after reading "  /* file " lseek returns 483 

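which is as messy as I was expecting: only 10 bytes were asked for, yet
the descriptor has moved 483 bytes (presumably to the end of the file),
because stdio filled its own buffer first.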
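
For comparison, here is a small sketch that reports both views of the
position side by side: ftell (what stdio thinks the program has consumed)
and lseek on the underlying descriptor (where the kernel actually is).
The helper name probe_positions and its parameters are made up for this
illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write file_size bytes to a temporary file, read
   nread of them back through stdio, then report both positions.
   Returns 0 on success, -1 on any I/O error. */
int probe_positions(size_t file_size, size_t nread,
                    long *logical, long *raw)
{
  char buf[64];
  FILE *f;
  size_t i;
  int rc = -1;

  assert(nread <= sizeof buf && nread <= file_size);
  f = tmpfile();
  if (f == NULL)
    return -1;
  for (i = 0; i < file_size; i++)
    if (fputc('x', f) == EOF)
      goto done;
  rewind(f);                      /* flushes, then seeks back to 0 */
  if (fread(buf, 1, nread, f) != nread)
    goto done;
  *logical = ftell(f);            /* stdio's view: bytes consumed */
  *raw = (long) lseek(fileno(f), 0, SEEK_CUR);  /* kernel's view */
  rc = 0;
done:
  fclose(f);
  return rc;
}
```

Calling probe_positions(200, 10, &logical, &raw) leaves logical at 10.
The value of raw depends on how much the C library chose to read ahead;
with a typical buffered stdio it is the whole 200 bytes.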

In a short sentence: never mix Unix.* I/O on the underlying file
descriptor with Pervasives.* channel operations on the same file.

As usual with advice, it is a "don't do what I did" kind of advice; shame
on me :-( I must admit that I once did open a channel and then only do
Unix-level operations on its file descriptor, but I commented this code
(opensource code in Poesia monitor) with

(** IMPORTANT NOTICE: here outputxchannel_t-s are only used for their
   Unix file descriptor; no output takes actually place on the output
   channel; all output is thru Unix.write *) 

and later

(** the reply channel from filter to monitor [don't use the; using Unix] *)

The (bad) reason for mixing channels & Unix file descriptors (besides
perhaps a design bug) is that I use nonblocking Unix I/O and want precise
control over the actual read & write system calls, so I don't want extra
buffering.
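
The descriptor-only style can be sketched in C as follows: when every
byte goes through read(2) directly, lseek reports exactly what was
consumed. The names read_exact and offset_after_raw_read are
hypothetical, and the sketch assumes a blocking descriptor; with
nonblocking I/O an EAGAIN result would have to be handled by waiting
(select or poll) rather than treated as an error.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly n bytes with read(2), retrying on
   EINTR and short reads. Returns 0 on success, -1 on error or early EOF. */
int read_exact(int fd, char *buf, size_t n)
{
  size_t got = 0;
  while (got < n) {
    ssize_t r = read(fd, buf + got, n - got);
    if (r < 0 && errno == EINTR)
      continue;
    if (r <= 0)
      return -1;
    got += (size_t) r;
  }
  return 0;
}

/* Hypothetical demo: create a scratch file of file_size bytes, read nread
   bytes with read_exact, and return the offset lseek reports (-1 on error). */
long offset_after_raw_read(size_t file_size, size_t nread)
{
  char path[] = "/tmp/rawreadXXXXXX";
  char block[256];
  long pos = -1;
  int fd;

  assert(nread <= file_size && file_size <= sizeof block);
  fd = mkstemp(path);
  if (fd < 0)
    return -1;
  unlink(path);                   /* file vanishes once fd is closed */
  memset(block, 'x', file_size);
  if (write(fd, block, file_size) == (ssize_t) file_size
      && lseek(fd, 0, SEEK_SET) == 0
      && read_exact(fd, block, nread) == 0)
    pos = (long) lseek(fd, 0, SEEK_CUR);
  close(fd);
  return pos;
}
```

Here offset_after_raw_read(200, 10) returns 10: with no user-space
buffer in between, the descriptor position and the program's own idea of
its position cannot drift apart.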


email: basile<at>starynkevitch<dot>net 
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France
