
File Descriptors

In chapter 3 we saw functions from the standard module Pervasives that access files through input / output channels. Files can also be accessed at a lower level, through their descriptors.

A file descriptor is an abstract value of type Unix.file_descr, containing information necessary to use a file: a pointer to the file, the access rights, the access modes (read or write), the current position in the file, etc.

Three descriptors are predefined. They correspond to standard input, standard output, and standard error.

# ( Unix.stdin , Unix.stdout , Unix.stderr ) ;;
- : Unix.file_descr * Unix.file_descr * Unix.file_descr =
<abstr>, <abstr>, <abstr>

Be careful not to confuse them with the corresponding input / output channels:

# ( Pervasives.stdin , Pervasives.stdout , Pervasives.stderr ) ;;
- : in_channel * out_channel * out_channel = <abstr>, <abstr>, <abstr>

The conversion functions between channels and file descriptors are described below, under "Input / output channels".

File Access Rights.
Under Unix, each file has an associated owner and group. The rights to read, write and execute are attached to each file according to three categories of users: the owner of the file, the members of the file's group, and all other users.

The access rights of a file are represented by 9 bits divided into three groups of three bits each. The first group represents the rights of the owner, the second the rights of the members of the file's group, and the last the rights of all other users. In each group of three bits, the first bit represents the right to read, the second bit the right to write and the third bit the right to execute. It is common to abbreviate these three rights by the letters r, w and x. The absence of a right is represented in each case by a dash (-). For example, the right to read for all and the right to write only for the owner is written as rw-r--r--. This corresponds to the integer 420 (which is the binary number 0b110100100). The more convenient octal notation 0o644 is frequently used instead, because each octal digit encodes exactly one group of three bits. These file access rights are not used under Windows.
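
The correspondence between the integer and the rwx notation can be checked with a few lines of code. The following is a minimal sketch; the function name perm_string is chosen here for illustration and is not part of any library.

```ocaml
(* Render the nine permission bits of [perm] as a string such as "rw-r--r--".
   [perm_string] is a hypothetical helper, not a library function. *)
let perm_string perm =
  let letters = "rwxrwxrwx" in
  String.init 9 (fun i ->
    (* Bit 8 is the owner's read bit, bit 0 the execute bit of other users. *)
    if perm land (1 lsl (8 - i)) <> 0 then letters.[i] else '-')

let () =
  (* The same value written in octal, printed as a string, in decimal and in octal. *)
  Printf.printf "%s %d 0o%o\n" (perm_string 0o644) 0o644 0o644
```

Run as a program, this prints rw-r--r-- 420 0o644.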


File Manipulation

Opening a file.
Opening a file associates the file to a file descriptor. Depending on the intended use of the file there are several modes to open a file. Each mode corresponds to a value of type open_flag described by figure 18.2.

O_RDONLY      read only
O_WRONLY      write only
O_RDWR        reading and writing
O_NONBLOCK    non-blocking opening
O_APPEND      appending at the end of the file
O_CREAT       create a new file if it does not exist
O_TRUNC       truncate the file to length 0 if it exists
O_EXCL        fail if the file already exists

Figure 18.2: Values of type open_flag.

These modes can be combined, so the function openfile takes as argument a list of values of type open_flag.

# Unix.openfile ;;
- : string -> Unix.open_flag list -> Unix.file_perm -> Unix.file_descr = <fun>

The first argument is the name of the file. The last is an integer coding the rights to attach to the file in the case of creation.

Here is an example of how to open a file for reading and writing, or to create it with the rights rw-r--r-- if it does not exist:

# let file = Unix.openfile "test.dat" [Unix.O_RDWR; Unix.O_CREAT] 0o644 ;;
val file : Unix.file_descr = <abstr>
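
The way flags combine can be illustrated with O_CREAT and O_EXCL used together: the call creates the file, or fails with the error EEXIST if the file is already there. This is a minimal sketch that requires the unix library; the function create_exclusively and the file name are assumptions made for the example.

```ocaml
(* Requires the unix library.
   [create_exclusively] is a hypothetical helper: it returns true if it
   created [name], and false if a file of that name already existed. *)
let create_exclusively name =
  match Unix.openfile name [Unix.O_WRONLY; Unix.O_CREAT; Unix.O_EXCL] 0o644 with
  | fd -> Unix.close fd; true
  | exception Unix.Unix_error (Unix.EEXIST, _, _) -> false

let () =
  (* Illustrative file name in the temporary directory. *)
  let name =
    Filename.concat (Filename.get_temp_dir_name ())
      (Printf.sprintf "excl_demo_%d.dat" (Unix.getpid ())) in
  (try Sys.remove name with Sys_error _ -> ());
  let first = create_exclusively name in
  let second = create_exclusively name in
  Printf.printf "first: %b, second: %b\n" first second;
  Sys.remove name
```

The first call creates the file and the second one fails, so the program prints first: true, second: false.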

Closing a file.
The function Unix.close closes a file. It is applied to the descriptor of the file to close.

# Unix.close ;;
- : Unix.file_descr -> unit = <fun>
# Unix.close file ;;
- : unit = ()
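
Since every opened descriptor should eventually be closed, even when an exception interrupts the work, it can help to pair the two calls in one function. The sketch below assumes OCaml 4.08 or later (for Fun.protect) and the unix library; with_descr is a hypothetical helper, not a function of module Unix.

```ocaml
(* Requires the unix library and OCaml >= 4.08 (Fun.protect).
   [with_descr] is a hypothetical helper: it opens [name], applies [f]
   to the descriptor, and closes the descriptor whether [f] returns
   normally or raises an exception. *)
let with_descr name flags perm f =
  let fd = Unix.openfile name flags perm in
  Fun.protect ~finally:(fun () -> Unix.close fd) (fun () -> f fd)

let () =
  let name = Filename.temp_file "descr" ".dat" in
  with_descr name [Unix.O_RDWR] 0 (fun _fd -> print_endline "file is open");
  print_endline "file is closed";
  Sys.remove name
```

Closing a descriptor a second time raises Unix.Unix_error with the code EBADF, which is one reason to keep the opening and closing in a single place.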

Redirecting file descriptors.
It is possible to attach several file descriptors to the same input / output. Given a single file descriptor, a second one referring to the same input / output is obtained with:

# Unix.dup ;;
- : Unix.file_descr -> Unix.file_descr = <fun>

If we have two file descriptors and we want to assign to the second the input / output of the first, we can use the function:

# Unix.dup2 ;;
- : Unix.file_descr -> Unix.file_descr -> unit = <fun>

For example, the error output can be directed to a file in the following way:

# let error_output = Unix.openfile "err.log" [Unix.O_WRONLY;Unix.O_CREAT] 0o644 ;;
val error_output : Unix.file_descr = <abstr>
# Unix.dup2 error_output Unix.stderr ;;
- : unit = ()

Data written to the standard error output will now be directed to the file err.log.
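
A redirection is often wanted only for the duration of one computation. The following sketch combines Unix.dup, to remember where the standard error output pointed, with Unix.dup2, to redirect it and later restore it. It assumes the unix library and OCaml 4.08 or later; with_stderr_to is a hypothetical helper.

```ocaml
(* Requires the unix library and OCaml >= 4.08 (Fun.protect).
   [with_stderr_to] is a hypothetical helper: it runs [f] with the
   standard error output redirected to the file [name]. *)
let with_stderr_to name f =
  let log = Unix.openfile name [Unix.O_WRONLY; Unix.O_CREAT; Unix.O_TRUNC] 0o644 in
  let saved = Unix.dup Unix.stderr in   (* second descriptor on the original output *)
  flush stderr;                         (* pending data still goes to the original *)
  Unix.dup2 log Unix.stderr;            (* Unix.stderr now refers to the file *)
  Unix.close log;                       (* the file stays open through Unix.stderr *)
  Fun.protect
    ~finally:(fun () ->
      flush stderr;                     (* push buffered data into the file *)
      Unix.dup2 saved Unix.stderr;      (* restore the original output *)
      Unix.close saved)
    f

let () =
  let name = Filename.temp_file "err" ".log" in
  with_stderr_to name (fun () -> prerr_string "captured\n");
  let ic = open_in name in
  let line = input_line ic in
  close_in ic;
  Sys.remove name;
  print_endline line
```

The message never reaches the terminal's error output; it is read back from the file and printed as captured on standard output.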

Input / Output on Files

The functions Unix.read and Unix.write, which read from and write to a file, use a character string as medium between the file and the Objective CAML program.

# Unix.read ;;
- : Unix.file_descr -> string -> int -> int -> int = <fun>
# Unix.write ;;
- : Unix.file_descr -> string -> int -> int -> int = <fun>

In addition to the file descriptor and the string, the functions take two integers as arguments. The first is the index in the string of the first character to read or write, and the second is the number of characters to read or to write. The returned integer is the number of characters actually read or written, which may be smaller than the number requested.

# let mode = [Unix.O_WRONLY;Unix.O_CREAT;Unix.O_TRUNC] in
let fl = Unix.openfile "file" mode 0o644 in
let str = "012345678901234565789" in
let n = Unix.write fl str 4 5
in Printf.printf "We wrote %s to the file\n" (String.sub str 4 n) ;
Unix.close fl ;;
We wrote 45678 to the file
- : unit = ()

Reading a file works the same way:

# let fl = Unix.openfile "file" [Unix.O_RDONLY] 0o644 in
let str = String.make 20 '.' in
let n = Unix.read fl str 2 10 in
Printf.printf "We read %d characters" n;
Printf.printf " and got the string %s\n" str;
Unix.close fl ;;
We read 5 characters and got the string ..45678.............
- : unit = ()
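
Because a single call may return fewer characters than requested, reading a whole file takes a loop that stops when Unix.read returns 0, which signals the end of the file. The sketch below is written for current OCaml releases, where Unix.read fills a buffer of type bytes instead of a string and Unix.write_substring is the variant of Unix.write that accepts a string. The function read_all and the buffer size are assumptions made for the example.

```ocaml
(* Requires the unix library.
   [read_all] is a hypothetical helper: it reads from [fd] until the
   end of the file and returns everything as one string. *)
let read_all fd =
  let chunk = Bytes.create 4096 in
  let buf = Buffer.create 4096 in
  let rec loop () =
    match Unix.read fd chunk 0 (Bytes.length chunk) with
    | 0 -> Buffer.contents buf                        (* 0 means end of file *)
    | n -> Buffer.add_subbytes buf chunk 0 n; loop () (* keep only the n bytes read *)
  in
  loop ()

let () =
  let name = Filename.temp_file "rw" ".dat" in
  let fd = Unix.openfile name [Unix.O_WRONLY; Unix.O_TRUNC] 0o644 in
  (* Write 5 characters of the string, starting at index 4. *)
  let n = Unix.write_substring fd "0123456789" 4 5 in
  Unix.close fd;
  let fd = Unix.openfile name [Unix.O_RDONLY] 0 in
  let contents = read_all fd in
  Unix.close fd;
  Sys.remove name;
  Printf.printf "wrote %d, read back %S\n" n contents
```

This prints wrote 5, read back "45678".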

Access to a file always takes place at the current position of its descriptor. The current position can be modified by the function:

# Unix.lseek ;;
- : Unix.file_descr -> int -> Unix.seek_command -> int = <fun>

The first argument is the file descriptor. The second specifies the displacement as a number of characters. The third argument is of type Unix.seek_command and indicates the origin of the displacement. It may take one of three possible values:

SEEK_SET    relative to the beginning of the file
SEEK_CUR    relative to the current position
SEEK_END    relative to the end of the file

A function call with an erroneous position will either raise an exception or return a value equal to 0.
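
Since Unix.lseek returns the new position counted from the beginning of the file, a displacement of 0 can be used to query a position without moving. The sketch below uses this to compute the size of a file and to read its last three characters. It assumes the unix library and a current OCaml release (bytes buffers, Unix.write_substring); file_size is a hypothetical helper.

```ocaml
(* Requires the unix library.
   [file_size] is a hypothetical helper: it returns the size of the
   file behind [fd] and leaves the current position unchanged. *)
let file_size fd =
  let here = Unix.lseek fd 0 Unix.SEEK_CUR in   (* where are we now? *)
  let size = Unix.lseek fd 0 Unix.SEEK_END in   (* the end position is the size *)
  ignore (Unix.lseek fd here Unix.SEEK_SET);    (* go back *)
  size

let () =
  let name = Filename.temp_file "seek" ".dat" in
  let fd = Unix.openfile name [Unix.O_RDWR; Unix.O_TRUNC] 0o644 in
  ignore (Unix.write_substring fd "0123456789" 0 10);
  let size = file_size fd in
  (* Move to three characters before the end and read them. *)
  let pos = Unix.lseek fd (-3) Unix.SEEK_END in
  let buf = Bytes.create 3 in
  let n = Unix.read fd buf 0 3 in
  Unix.close fd;
  Sys.remove name;
  Printf.printf "size %d, position %d, read %S\n" size pos (Bytes.sub_string buf 0 n)
```

This prints size 10, position 7, read "789".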

Input / output channels.
The Unix module provides conversion functions between file descriptors and the input / output channels of module Pervasives:

# Unix.in_channel_of_descr ;;
- : Unix.file_descr -> in_channel = <fun>
# Unix.out_channel_of_descr ;;
- : Unix.file_descr -> out_channel = <fun>
# Unix.descr_of_in_channel ;;
- : in_channel -> Unix.file_descr = <fun>
# Unix.descr_of_out_channel ;;
- : out_channel -> Unix.file_descr = <fun>

It is necessary to indicate whether the input / output channels obtained by the conversion transfer binary data or character data.

# set_binary_mode_in ;;
- : in_channel -> bool -> unit = <fun>
# set_binary_mode_out ;;
- : out_channel -> bool -> unit = <fun>

In the following example we create a file using only the functions of module Unix. We then open it again with Unix.openfile, convert the descriptor to an input channel, and read the first line with the higher-level input function input_line.

# let mode = [Unix.O_WRONLY;Unix.O_CREAT;Unix.O_TRUNC] in
let f = Unix.openfile "file" mode 0o666 in
let s = "0123456789\n0123456789\n" in
let n = Unix.write f s 0 (String.length s)
in Unix.close f ;;
- : unit = ()
# let f = Unix.openfile "file" [Unix.O_RDONLY;Unix.O_NONBLOCK] 0 in
let c = Unix.in_channel_of_descr f in
let s = input_line c
in print_string s ;
close_in c ;;
0123456789- : unit = ()
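
The conversion works in the output direction too. In the sketch below, a descriptor is wrapped in an output channel for writing, and another one in an input channel for reading the file line by line. Closing a channel also closes the descriptor underneath it, so Unix.close is not called separately. The unix library is assumed, and lines_of_descr is a hypothetical helper.

```ocaml
(* Requires the unix library.
   [lines_of_descr] is a hypothetical helper: it reads all lines
   available on [fd] through an input channel. *)
let lines_of_descr fd =
  let ic = Unix.in_channel_of_descr fd in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = loop [] in
  close_in ic;                          (* also closes the descriptor fd *)
  lines

let () =
  let name = Filename.temp_file "chan" ".txt" in
  let fd = Unix.openfile name [Unix.O_WRONLY; Unix.O_TRUNC] 0o644 in
  let oc = Unix.out_channel_of_descr fd in
  output_string oc "first\nsecond\n";   (* stays in the channel's buffer for now *)
  close_out oc;                         (* flushes the buffer, then closes fd *)
  let fd = Unix.openfile name [Unix.O_RDONLY] 0 in
  List.iter print_endline (lines_of_descr fd);
  Sys.remove name
```

The program prints the two lines first and second. Data written to a channel is buffered, so it should be flushed before the underlying descriptor is used directly.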

A program may have to work with multiple inputs and outputs. Data may not always be available on a given channel, and the program cannot afford to wait for one channel to be available while ignoring the others. The following function lets you determine which of a given list of inputs/outputs is available for use at a given time:

# Unix.select ;;
- : Unix.file_descr list ->
Unix.file_descr list ->
Unix.file_descr list ->
float ->
Unix.file_descr list * Unix.file_descr list * Unix.file_descr list
= <fun>
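
The first three arguments are lists of descriptors to watch: respectively for reading, for writing, and for exceptional conditions. The last argument is the maximum delay in seconds; a negative value means no time limit, so the call waits until a descriptor becomes available. The result is the three lists of descriptors that are ready for reading, ready for writing, and in an exceptional condition.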


select is not implemented under Windows
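
The behaviour of Unix.select can be observed with two pipes, of which only one has data waiting. This is a minimal sketch for a Unix system with the unix library and a current OCaml release; the function readable is a hypothetical helper, and the comparison of descriptors with List.mem relies on descriptors being plain integers under Unix.

```ocaml
(* Requires the unix library; intended for Unix systems.
   [readable] is a hypothetical helper: it returns the descriptors of
   [fds] that can be read without blocking, waiting at most [timeout]
   seconds for one of them to become ready. *)
let readable fds timeout =
  let ready, _, _ = Unix.select fds [] [] timeout in
  ready

let () =
  let r1, w1 = Unix.pipe () in
  let r2, w2 = Unix.pipe () in
  (* Only the second pipe receives data. *)
  ignore (Unix.write_substring w2 "x" 0 1);
  let ready = readable [r1; r2] 0.5 in
  Printf.printf "r1 ready: %b, r2 ready: %b\n"
    (List.mem r1 ready) (List.mem r2 ready);
  List.iter Unix.close [r1; w1; r2; w2]
```

Since one descriptor is ready at once, the call returns without waiting for the delay to expire, and the program prints r1 ready: false, r2 ready: true.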
