Version française
Home     About     Download     Resources     Contact us    
Browse thread
Persistent twice-delimited continuations and CGI programming
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: oleg@o...
Subject: Persistent twice-delimited continuations and CGI programming

The new version of the delimcc library of first-class delimited
continuations for byte-code OCaml now supports serializing and storing
captured continuations. The stored continuation can be invoked in the
same or a different process running the same executable. The
persistence feature supports process checkpointing, migration -- and
specifically CGI programming. We can write CGI scripts using the
*unmodified* OCaml system as if they were interactive ordinary
_console_ applications. We can then execute these scripts on *any*,
*unmodified* web server (e.g., Apache). The serialized continuations
are twice delimited -- both in `control' and in `data'; the latter
makes them compact and possible.

The delimcc library is a pure library and makes _no_ changes to the
OCaml system -- neither to the compiler nor to the bytecode
VM. Therefore the library is perfectly compatible with any OCaml
program and any (compiled) OCaml library. The delimcc library has no
effect on the code that does not capture delimited continuations. The
library has been tested on OCaml systems 3.09.x and 3.10.2 on Linux
and FreeBSD.


One may think that making captured continuations persistent is
trivial: after all, OCaml already supports marshaling values
including closures. If one actually tries to marshal a captured
delimited continuation, one quickly discovers that the naive
marshaling fails with the error on attempting to serialize an
abstract data type. One may even discover that the troublesome
abstract data type is _chan. The captured delimited continuation (a
piece of stack along with administrative data) refers to heap data
structures created by delimcc and other OCaml libraries; some of these
data structures are closures, which contain module's environment and
may refer to standard OCaml functions like prerr. This function is a
closure over stderr, which is non-serializable. This points out the
first problem: if we serialize all the data reachable from the
captured continuation, we may end up serializing the (large) part of
the heap and the global environment. Aside from inefficiency, that may
preclude serialization as we are liable to encounter channels and
other non-serializable data structures.

There is a more serious problem however. If we serialize all data
reachable from the captured delimited continuation, we also serialize
two pieces of global state used by the delimcc library itself. When
the stored continuation is deserialized, a fresh copy of these global
data is created, referenced from within the restored
continuation. Thus the whole program will have two copies of delimcc
global data: one for use in the main program and one for use by the
deserialized continuation. Although such an isolation may be desirable
in many cases, it is precisely wrong in our case: the captured and the
host continuations do not have the common view of the system and
cannot work together. It may be instructive to contemplate process
checkpointing offered by some operating systems (see also `undump'
typically used by Emacs and TeX). When checkpointing a process, we
wish to save the continuation of the process only (rather than the
continuation of the scheduler that created the process, and the rest of
the OS continuation). We also wish to save data associated with the
process, for example, the process control block and the description of
allocated memory and other resources. Control blocks of all processes
are typically linked in; when saving the control block of one process,
we definitely do not wish to save everything that is reachable from
it.  When saving the state of a process in a checkpoint, we do not
usually save the state of the file system -- or even of all files used
by the process. First of all, that is impractical. Mainly, it is
sometimes wrong. For example, a process might write records to a log
file, e.g., syslog. We specifically do not wish to save the contents
of the syslog along with the process image. We want the restored
process append to the system log rather than replace it!

Of course restoring a suspended process after modifying its input
files may also be wrong. It is a hard question of what should be saved
by value and what should be saved by reference only. It is clear
however that both mechanisms are needed. The serialization code of the
delimcc library does offer both mechanisms. The inspiration comes
from the fact that OCaml's own serialization function serializes OCaml
code pointed to from within closures by reference. The delimcc library
extends this approach to data. The library supports registration of
data (which currently must be closures in the old heap) in a global
array. When serializing the continuation, the library traverses it and
replaces all references to registered closures with indices in the
global array; we then invoke OCaml's own serialization routine to
marshal the result. After that, we undo the replacement of closures
with indices. Such value mangling is not without precedent: to detect
sharing, OCaml's own marshaling routine too mangles the value being
serialized. The use of the global array is akin to the implementation
of cross-staged persistence in MetaOCaml.

The new version of the delimcc library is available at
	http://okmij.org/ftp/packages/caml-shift.tar.gz

The salient application is letting us write CGI scripts as if they
were interactive console applications using read and printf. We write
the scripts in the natural question-answer, storytelling style, with
the full use of lexical scope, exceptions, mutable data and other
imperative features (if necessary). The scripts can even be compiled
and run as interactive console applications. With a different
implementation of basic primitives for reading and writing, the
console programs become CGI scripts. All that has been 
demonstrated at the Continuation Fest 2008 two weeks ago:
	http://okmij.org/ftp/Computation/Fest2008-talk.pdf 
	http://okmij.org/ftp/Computation/Fest2008-talk-notes.pdf
(the second file has notes of what I might say during the
presentation). The corresponding code is available at
	http://okmij.org/ftp/ML/caml-web.tar.gz

The code contains the minimal library for writing CGI applications with
form validation. The code supports nested transactions. The captured
continuations are relatively compact: the essentially empty captured
continuation takes 491 bytes when serialized. The serialized
continuations of the unoptimized blog application have the typical
size of 10K or so (depending on the size of the posts); bzip can
compress them to one third of the original size.

The new version of the delimcc library implements abort as a
primitive, which makes it essentially as efficient as raise. The
library offers shift and control for convenience, and a useful
debugging primitive show_val, to display the structure of any OCaml
value.