File Descriptors
In chapter 3 we have seen functions from the standard
module Pervasives. These functions allow us to access files via input /
output channels. There is also a lower-level way to access files,
using their descriptors.
A file descriptor is an abstract value of type Unix.file_descr,
containing information necessary to use a file: a pointer to the file,
the access rights, the access modes (read or write), the current position
in the file, etc.
Three descriptors are predefined. They correspond to standard input,
standard output, and standard error.
# (Unix.stdin, Unix.stdout, Unix.stderr) ;;
- : Unix.file_descr * Unix.file_descr * Unix.file_descr =
<abstr>, <abstr>, <abstr>
Be careful not to confuse them with the corresponding input / output
channels:
# (Pervasives.stdin, Pervasives.stdout, Pervasives.stderr) ;;
- : in_channel * out_channel * out_channel = <abstr>, <abstr>, <abstr>
The conversion functions between channels and file descriptors
are described below, under "Input / output channels".
File Access Rights.
Under Unix each file has an associated owner and group.
The rights to read, write and execute are attached
to each file according to three categories of users:
the owner of a file, the members of the file's group and all other users.
The access rights of a file are represented by 9 bits divided
into three groups of three bits each. The first group represents the
rights of the owner, the second the rights of the members of the
file's group, and the last the rights of all other users.
In each group of three bits, the first bit represents the right to
read, the second bit the right to write and the third bit the right
to execute.
It is common to abbreviate these three rights by the letters
r, w and x. The absence of the rights is
represented in each case by a dash (-).
For example, read permission for everyone combined with write permission
for the owner only is written rw-r--r--. This corresponds
to the integer 420 (which is the binary number
0b110100100). The more convenient octal notation
0o644 is frequently used. These file access rights are not used
under Windows.
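The correspondence between the integer and the letter notation can be checked with a few lines of code. The following is only a sketch: string_of_perm is a hypothetical helper introduced for illustration, not a function of the Unix module.

```ocaml
(* Render the nine permission bits of [perm] in rwxrwxrwx form.
   [string_of_perm] is a hypothetical helper, not part of the Unix module. *)
let string_of_perm perm =
  let letters = "rwxrwxrwx" in
  String.init 9 (fun i ->
    (* Bit 8 is the owner's read bit, bit 0 is the others' execute bit. *)
    if perm land (1 lsl (8 - i)) <> 0 then letters.[i] else '-')

let () =
  (* The three notations from the text denote the same integer. *)
  assert (0o644 = 420);
  assert (0o644 = 0b110100100);
  print_endline (string_of_perm 0o644)   (* prints rw-r--r-- *)
```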
File Manipulation
Opening a file.
Opening a file associates the file with a file descriptor.
Depending on the intended use of the file there are
several modes to open a file. Each mode corresponds to
a value of type open_flag described by figure
18.2.
O_RDONLY   | read only
O_WRONLY   | write only
O_RDWR     | reading and writing
O_NONBLOCK | non-blocking opening
O_APPEND   | appending at the end of the file
O_CREAT    | create a new file if it does not exist
O_TRUNC    | truncate the file to length 0 if it exists
O_EXCL     | fail if the file already exists (used with O_CREAT)
Figure 18.2: Values of type open_flag.
These modes can be combined. In consequence, the function
openfile takes as argument a list of values of type
open_flag.
# Unix.openfile ;;
- : string -> Unix.open_flag list -> Unix.file_perm -> Unix.file_descr =
<fun>
The first argument is the name of the file. The last is an
integer encoding the access rights to give the file
if it is created.
Here is an example of how to open a file for reading and writing, creating
it with the rights rw-r--r-- if it does not exist:
# let file = Unix.openfile "test.dat" [Unix.O_RDWR; Unix.O_CREAT] 0o644 ;;
val file : Unix.file_descr = <abstr>
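Combining O_CREAT with O_EXCL makes the opening fail when the file already exists, which is signalled by the exception Unix.Unix_error. The sketch below illustrates this; create_exclusive is a hypothetical helper, and the temporary file name only serves the demonstration.

```ocaml
(* Requires the unix library.
   [create_exclusive] is a hypothetical helper: it returns [None]
   instead of raising when the file is already there. *)
let create_exclusive name =
  try
    Some (Unix.openfile name [Unix.O_WRONLY; Unix.O_CREAT; Unix.O_EXCL] 0o644)
  with Unix.Unix_error (Unix.EEXIST, _, _) -> None

let () =
  (* Filename.temp_file creates the file, so exclusive creation must fail. *)
  let name = Filename.temp_file "excl" ".dat" in
  (match create_exclusive name with
   | None -> print_endline "file already exists"
   | Some fd -> Unix.close fd);
  Sys.remove name
```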
Closing a file.
The function Unix.close closes a file. It is applied
to the descriptor of the file to close.
# Unix.close ;;
- : Unix.file_descr -> unit = <fun>
# Unix.close file ;;
- : unit = ()
Redirecting file descriptors.
It is possible to attach several file descriptors to one input / output.
Given one file descriptor, a second descriptor for the same
input / output is obtained with:
# Unix.dup ;;
- : Unix.file_descr -> Unix.file_descr = <fun>
If we have two file descriptors and we want to assign to the second the
input / output of the first, we can use the function:
# Unix.dup2 ;;
- : Unix.file_descr -> Unix.file_descr -> unit = <fun>
For example, the error output can be directed to a file
in the following way:
# let error_output =
    Unix.openfile "err.log" [Unix.O_WRONLY; Unix.O_CREAT] 0o644 ;;
val error_output : Unix.file_descr = <abstr>
# Unix.dup2 error_output Unix.stderr ;;
- : unit = ()
Data written to the standard error output will now be directed to the file
err.log.
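A redirection is often only wanted temporarily. Since Unix.dup2 src dst makes dst refer to the same file as src, a copy of the original descriptor made with Unix.dup is enough to restore it afterwards. The following is a minimal sketch under these assumptions: with_stderr_to_file is a hypothetical helper, and Fun.protect requires OCaml 4.08 or later.

```ocaml
(* Requires the unix library and OCaml 4.08+ (for Fun.protect).
   [with_stderr_to_file] is a hypothetical helper. *)
let with_stderr_to_file name f =
  let log =
    Unix.openfile name [Unix.O_WRONLY; Unix.O_CREAT; Unix.O_TRUNC] 0o644 in
  (* Keep a second descriptor on the original error output. *)
  let saved = Unix.dup Unix.stderr in
  flush stderr;
  (* From now on descriptor stderr refers to the log file. *)
  Unix.dup2 log Unix.stderr;
  Unix.close log;
  Fun.protect f ~finally:(fun () ->
    flush stderr;
    (* Restore the original error output and drop the saved copy. *)
    Unix.dup2 saved Unix.stderr;
    Unix.close saved)

let () =
  with_stderr_to_file "err.log" (fun () -> prerr_endline "logged to the file")
```

The channel is flushed before each dup2 so that data buffered by the channel stderr ends up on the intended side of the redirection.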
Input / Output on Files
The functions Unix.read and Unix.write, which read from and write
to a file, use a character string as a buffer between the file
and the Objective CAML program.
# Unix.read ;;
- : Unix.file_descr -> string -> int -> int -> int = <fun>
# Unix.write ;;
- : Unix.file_descr -> string -> int -> int -> int = <fun>
In addition to the file descriptor and the string the functions take
two integers as arguments. One is the index of the first character
and the other the number of characters to read or to write.
The returned integer is the number of characters actually
read or written.
# let mode = [Unix.O_WRONLY; Unix.O_CREAT; Unix.O_TRUNC] in
  let fl = Unix.openfile "file" mode 0o644 in
  let str = "012345678901234565789" in
  let n = Unix.write fl str 4 5 in
  Printf.printf "We wrote %s to the file\n" (String.sub str 4 n) ;
  Unix.close fl ;;
We wrote 45678 to the file
- : unit = ()
Reading a file works the same way:
# let fl = Unix.openfile "file" [Unix.O_RDONLY] 0o644 in
  let str = String.make 20 '.' in
  let n = Unix.read fl str 2 10 in
  Printf.printf "We read %d characters" n ;
  Printf.printf " and got the string %s\n" str ;
  Unix.close fl ;;
We read 5 characters and got the string ..45678.............
- : unit = ()
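Because a call to Unix.read may return fewer characters than requested, reading a whole file usually takes a loop that stops when the result is 0. The sketch below assumes a more recent OCaml (4.02 or later), where the buffer passed to Unix.read has type bytes and strings are written with Unix.write_substring; read_all is a hypothetical helper.

```ocaml
(* Requires the unix library. Assumes OCaml 4.02 or later, where buffers
   are mutable [bytes] and [Unix.write_substring] writes from a string.
   [read_all] is a hypothetical helper. *)
let read_all fd =
  let chunk = Bytes.create 4096 in
  let acc = Buffer.create 4096 in
  let rec loop () =
    (* A result of 0 means end of file; a smaller result than requested
       is normal and simply means the loop has to continue. *)
    let n = Unix.read fd chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes acc chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents acc

let () =
  let name = Filename.temp_file "rw" ".dat" in
  let fd = Unix.openfile name [Unix.O_WRONLY; Unix.O_TRUNC] 0o644 in
  (* Write 5 characters of the string, starting at index 4. *)
  let n = Unix.write_substring fd "0123456789" 4 5 in
  Unix.close fd;
  let fd = Unix.openfile name [Unix.O_RDONLY] 0 in
  let contents = read_all fd in
  Unix.close fd;
  Sys.remove name;
  Printf.printf "wrote %d characters, read back %S\n" n contents
```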
Access to a file always takes place at the current position
of its descriptor. The current position can be modified by the
function:
# Unix.lseek ;;
- : Unix.file_descr -> int -> Unix.seek_command -> int = <fun>
The first argument is the file descriptor. The second specifies the
displacement as number of characters. The third argument is of
type Unix.seek_command and indicates the origin of the
displacement. The third argument may take one of three possible
values:
- SEEK_SET: relative to the beginning of the file,
- SEEK_CUR: relative to the current position,
- SEEK_END: relative to the end of the file.
A function call with an erroneous position will either raise
an exception or return a value equal to 0.
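As an illustration, a negative displacement relative to SEEK_END positions the descriptor a given number of characters before the end of the file. The sketch below assumes OCaml 4.02 or later (bytes buffers, Unix.write_substring); last_chars is a hypothetical helper, and it assumes the file holds at least k characters.

```ocaml
(* Requires the unix library. [last_chars] is a hypothetical helper that
   returns up to the last [k] characters of the file behind [fd]. *)
let last_chars fd k =
  (* The result of lseek is the new position from the start of the file. *)
  let _pos = Unix.lseek fd (-k) Unix.SEEK_END in
  let buf = Bytes.create k in
  let n = Unix.read fd buf 0 k in
  Bytes.sub_string buf 0 n

let () =
  let name = Filename.temp_file "seek" ".dat" in
  let fd = Unix.openfile name [Unix.O_RDWR; Unix.O_TRUNC] 0o644 in
  ignore (Unix.write_substring fd "0123456789" 0 10);
  let tail = last_chars fd 3 in
  Unix.close fd;
  Sys.remove name;
  Printf.printf "last characters: %S\n" tail
```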
Input / output channels.
The Unix module provides conversion functions between
file descriptors and the input / output channels of module
Pervasives:
# Unix.in_channel_of_descr ;;
- : Unix.file_descr -> in_channel = <fun>
# Unix.out_channel_of_descr ;;
- : Unix.file_descr -> out_channel = <fun>
# Unix.descr_of_in_channel ;;
- : in_channel -> Unix.file_descr = <fun>
# Unix.descr_of_out_channel ;;
- : out_channel -> Unix.file_descr = <fun>
It is necessary to indicate whether the input / output channels
obtained by the conversion transfer binary data or character data.
# set_binary_mode_in ;;
- : in_channel -> bool -> unit = <fun>
# set_binary_mode_out ;;
- : out_channel -> bool -> unit = <fun>
In the following example we create a file by using the functions
of module Unix. We read using the opening function of
module Unix and the higher-level input function
input_line.
# let mode = [Unix.O_WRONLY; Unix.O_CREAT; Unix.O_TRUNC] in
  let f = Unix.openfile "file" mode 0o666 in
  let s = "0123456789\n0123456789\n" in
  let n = Unix.write f s 0 (String.length s) in
  Unix.close f ;;
- : unit = ()
# let f = Unix.openfile "file" [Unix.O_RDONLY; Unix.O_NONBLOCK] 0 in
  let c = Unix.in_channel_of_descr f in
  let s = input_line c in
  print_string s ;
  close_in c ;;
0123456789- : unit = ()
Availability.
A program may have to work with multiple inputs and outputs.
Data may not always be available on a given channel, and the program
cannot afford to wait for one channel to be available while ignoring
the others. The following function lets you determine which of a given list
of inputs/outputs is available for use at a given time:
# Unix.select ;;
- : Unix.file_descr list ->
Unix.file_descr list ->
Unix.file_descr list ->
float ->
Unix.file_descr list * Unix.file_descr list * Unix.file_descr list
= <fun>
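The first three arguments are lists of descriptors to check,
respectively, for reading, for writing and for exceptional conditions.
The last argument is a timeout in seconds. A negative value means
no timeout: the call waits until some descriptor is ready. The result is
the triple of lists of descriptors that are ready for reading, ready for
writing, and in an exceptional condition.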
Warning
select is not implemented under Windows
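On systems where select is available, a pipe is a convenient way to observe its behaviour: the reading end only becomes available once something has been written to the other end. The following is a sketch, assuming OCaml 4.02 or later for Unix.write_substring; is_readable is a hypothetical helper.

```ocaml
(* Requires the unix library. [is_readable] is a hypothetical helper:
   it waits at most [timeout] seconds for [fd] to have data to read. *)
let is_readable fd timeout =
  let (ready, _, _) = Unix.select [fd] [] [] timeout in
  ready <> []

let () =
  let (rd, wr) = Unix.pipe () in
  (* Nothing written yet: with a zero timeout select returns at once. *)
  Printf.printf "before writing: %b\n" (is_readable rd 0.0);
  ignore (Unix.write_substring wr "x" 0 1);
  (* Data is now pending on the reading end of the pipe. *)
  Printf.printf "after writing: %b\n" (is_readable rd 1.0);
  Unix.close rd;
  Unix.close wr
```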