Re: [Caml-list] Unix.lseek versus Pervasives.pos
| From: | Basile STARYNKEVITCH <basile@s...> |
| Subject: | Re: [Caml-list] Unix.lseek versus Pervasives.pos |
>>>>> "cashin" == cashin <cashin@cs.uga.edu> writes:
cashin> Sorry if this shows up as a duplicate. Basile
cashin> STARYNKEVITCH <basile@starynkevitch.net> writes:
Basile>> You apparently forgot to flush the channel.
Ok, I made a stupid mistake (flushing is only for writes!) but my
intuition was right, in the sense of taking buffering into account.
cashin> Flushes are for writes, but even when using a test program
cashin> that just reads, zero is returned when it appears that it
cashin> shouldn't return zero. Compare the short ocaml program
cashin> below to the comparable C version.
Ok; but the problem is the same: the OCaml I/O subsystem manages its own
internal buffering. Channels are not Unix file descriptors, but buffered
wrappers around them. See the source code (in particular ocaml/byterun/io.c
and io.h) for details. In particular, a channel is implemented (from io.h) as
struct channel {
  int fd;                       /* Unix file descriptor */
  file_offset offset;           /* Absolute position of fd in the file */
  char * end;                   /* Physical end of the buffer */
  char * curr;                  /* Current position in the buffer */
  char * max;                   /* Logical end of the buffer (for input) */
  void * mutex;                 /* Placeholder for mutex (for systhreads) */
  struct channel * next;        /* Linear chaining of channels (flush_all) */
  int revealed;                 /* For Cash only */
  int old_revealed;             /* For Cash only */
  int refcount;                 /* For flush_all and for Cash */
  char buff[IO_BUFFER_SIZE];    /* The buffer itself */
};
where IO_BUFFER_SIZE is usually 4096 bytes.
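
To make the role of offset, curr and max concrete, here is a minimal sketch of how a logical input position can be derived from such a structure. The names mini_channel, mini_refill, mini_consume and mini_pos_in are hypothetical and only mimic a few fields of the real struct channel; this is not the code from io.c, just an illustration of the idea that the position seen by the program is the descriptor's offset minus the bytes that are buffered but not yet consumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct channel (input side only). */
struct mini_channel {
    long offset;      /* absolute position of the fd in the file */
    char *curr;       /* current position in the buffer */
    char *max;        /* logical end of the buffer (for input) */
    char buff[4096];  /* the buffer itself (assumed size) */
};

/* Pretend that one read() system call brought nread bytes into the buffer:
   the descriptor moves forward by nread, the buffer is full up to max. */
void mini_refill(struct mini_channel *ch, size_t nread)
{
    assert(nread <= sizeof ch->buff);
    ch->offset += (long)nread;
    ch->curr = ch->buff;
    ch->max = ch->buff + nread;
}

/* The program consumes n bytes from the buffer; no system call happens. */
void mini_consume(struct mini_channel *ch, size_t n)
{
    assert((size_t)(ch->max - ch->curr) >= n);
    ch->curr += n;
}

/* Position as the program sees it: descriptor offset minus the bytes
   still sitting unread in the buffer. */
long mini_pos_in(const struct mini_channel *ch)
{
    assert(ch->curr <= ch->max);
    return ch->offset - (long)(ch->max - ch->curr);
}
```

After a refill of 483 bytes followed by consuming 10 of them, the offset field (what lseek on the descriptor would report) is 483 while mini_pos_in gives 10, which is the kind of discrepancy discussed in this thread.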
The equivalent in C would be to mix lseek with a <stdio.h> FILE, which
gives the same mess:
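/* file main.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memset */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
  FILE *f = fopen("main.c", "r");
  char buf[1024];
  int fd;

  if (f == NULL) {
    perror("main.c");
    return EXIT_FAILURE;
  }
  fd = fileno(f);
  memset(buf, '\0', sizeof(buf));
  /* read 10 bytes through the buffered stdio stream */
  fread(buf, 1, 10, f);
  /* ask the kernel where the underlying file descriptor is */
  printf("after reading \"%s\" lseek returns %d\n",
         buf, (int) lseek(fd, 0, SEEK_CUR));
  fclose(f);
  return 0;
}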
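When I compile and run the above file with tcc (www.tinycc.org) I get
after reading " /* file " lseek returns 483
which is as messy as I was expecting: only 10 bytes were requested, but
stdio read ahead into its own buffer, so the file descriptor is already
at offset 483.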
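
For comparison, here is a small sketch that reports both positions side by side: ftell gives the position of the buffered stream (the C counterpart of asking the channel itself), while lseek gives the position of the underlying descriptor. The function name read_and_compare and its parameters are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read n bytes (at most 1024) from path through stdio, then store the
   stream position (ftell) and the descriptor position (lseek).
   Returns 0 on success, -1 if the file cannot be opened or is too short. */
int read_and_compare(const char *path, size_t n, long *stdio_pos, off_t *fd_pos)
{
    char buf[1024];
    FILE *f;
    size_t got;

    assert(stdio_pos != NULL && fd_pos != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (n > sizeof buf)
        n = sizeof buf;
    got = fread(buf, 1, n, f);
    *stdio_pos = ftell(f);                     /* what the program consumed */
    *fd_pos = lseek(fileno(f), 0, SEEK_CUR);   /* what the kernel handed out */
    fclose(f);
    return got == n ? 0 : -1;
}
```

On a file longer than 10 bytes, asking for 10 bytes leaves the stream position at 10, while the descriptor position is at least 10 and typically further along, depending on how much the C library chose to read ahead.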
In short: never mix Unix.read (or other Unix I/O) with Pervasives.*
channel operations on the same file descriptor.
As usual with advice, this is "don't do what I did" advice; shame on
me :-( I must admit that I once opened a channel and then did only
Unix.read operations on it, but I commented this code (open-source code
in the Poesia monitor) with
(** IMPORTANT NOTICE: here outputxchannel_t-s are only used for their
Unix file descriptor; no output takes actually place on the output
channel; all output is thru Unix.write *)
and later
(** the reply channel from filter to monitor [don't use the
Pervasives.channel; using Unix] *)
The bad reasons for mixing channels and Unix file descriptors (besides
perhaps a design bug) are that I use nonblocking Unix I/O and that I
want precise control over the actual read and write system calls, so I
don't want extra buffering.
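
When that kind of control is what you want, the consistent choice is to stay at the descriptor level throughout. A minimal C sketch of the idea, with a made-up helper name raw_read_offset: since nothing sits between the program and the read system call, lseek reports exactly the number of bytes that were read.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path, read up to n bytes straight from the descriptor into buf,
   and return the descriptor offset afterwards, or -1 on error. */
off_t raw_read_offset(const char *path, char *buf, size_t n)
{
    int fd;
    ssize_t got;
    off_t pos;

    assert(buf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (off_t)-1;
    got = read(fd, buf, n);   /* one system call, no user-level buffer */
    pos = (got < 0) ? (off_t)-1 : lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

On a regular file holding at least 10 bytes, asking for 10 bytes leaves the offset at 10, in contrast with the buffered case above.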
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France