Version française
Home     About     Download     Resources     Contact us    
Browse thread
Re: [Caml-list] Unix.lseek versus Pervasives.pos
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Basile STARYNKEVITCH <basile@s...>
Subject: Re: [Caml-list] Unix.lseek versus Pervasives.pos
>>>>> "cashin" == cashin  <cashin@cs.uga.edu> writes:

    cashin> Sorry if this shows up as a duplicate.  Basile
    cashin> STARYNKEVITCH <basile@starynkevitch.net> writes:

    Basile>> You apparently forgot to flush the channel.

Ok, I made a stupid mistake (flushing is only for writes!) but my
intuition was right, in the sense of taking buffering into account.

    cashin> Flushes are for writes, but even when using a test program
    cashin> that just reads, zero is returned when it appears that it
    cashin> shouldn't return zero.  Compare the short ocaml program
    cashin> below to the comparable C version.


Ok; but the problem is the same: Ocaml I/O subsystem manage internal
buffering. Channels are not Unix filedescriptors, but a buffering of
these. See the source code (in particular ocaml/byterun/io.c and io.h) for
details. In particular, a channel is (from io.h) implemented as

  struct channel {
    int fd;                       /* Unix file descriptor */
    file_offset offset;           /* Absolute position of fd in the file */
    char * end;                   /* Physical end of the buffer */
    char * curr;                  /* Current position in the buffer */
    char * max;                   /* Logical end of the buffer (for input) */
    void * mutex;                 /* Placeholder for mutex (for systhreads) */
    struct channel * next;        /* Linear chaining of channels (flush_all) */
    int revealed;                 /* For Cash only */
    int old_revealed;             /* For Cash only */
    int refcount;                 /* For flush_all and for Cash */
    char buff[IO_BUFFER_SIZE];    /* The buffer itself */
  };
 

where IO_BUFFER_SIZE is usually 4096 bytes.

The equivalent C library would mix lseek with <stdio.h> FILE, and also
get a mess:

  /* file main.c */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <stdio.h>
  
  int main(void)
  {
      FILE *f = fopen("main.c", "r");
      char buf[1024];
      int fd = fileno(f);
  
      memset(buf, '\0', sizeof(buf));
      fread(buf, 1, 10, f);
      printf("after reading \"%s\" lseek returns %d\n",
             buf, (int) lseek(fd, 0, SEEK_CUR));
  
      return 0;
  }

When I run above file with tcc (www.tinycc.org) I get 

after reading "  /* file " lseek returns 483 

which is messy as I was expecting.

In a short sentence, never mix Unix.read (or other Unix IO) &
Pervasive.* channel operations. 


As usual with advices, it is a "don't do what I did" advice; shame on
me :-( I must admit that I once did open a channel and then only do
Unix.read operations on it, but I commented this code (opensource code
in Poesia monitor) with

(** IMPORTANT NOTICE: here outputxchannel_t-s are only used for their
   Unix file descriptor; no output takes actually place on the output
   channel; all output is thru Unix.write *) 

and later

(** the reply channel from filter to monitor [don't use the
   Pervasives.channel; using Unix] *)

The bad reasons for mixing channels & unix file descriptors (beside
perhaps a design bug) is that I use nonblocking unix IO and that I
want precise control over the actual read & write system calls -so I
don't want extra buffering-

-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners