Streams, parsers, and printers

Caml Light comprises a built-in type for streams (possibly infinite sequences of elements, that are evaluated on demand), and associated stream expressions, to build streams, and stream patterns, to destructure streams. Streams and stream patterns provide a natural approach to the writing of recursive-descent parsers.

Streams are presented by the following extensions to the syntactic classes of expressions:

   |  [< >]
   |  [< stream-component {; stream-component} >]
   |  function stream-matching
   |  match expr with stream-matching

      ' expr
   |  expr

      stream-pattern -> expr {| stream-pattern -> expr}

      [< >]
   |  [< stream-comp-pat {; stream-comp-pat} >]

      ' pattern
   |  expr pattern
   |  ident

Stream expressions are bracketed by [< and >]. They represent the concatenation of their components. The component ' expr represents the one-element stream whose element is the value of expr. The component expr represents a sub-stream. For instance, if both s and t are streams of integers, then [<'1; s; t; '2>] is a stream of integers containing the element 1, then the elements of s, then those of t, and finally 2. The empty stream is denoted by [< >].

Unlike any other kind of expressions in the language, stream expressions are submitted to lazy evaluation: the components are not evaluated when the stream is built, but only when they are accessed during stream matching. The components are evaluated once, the first time they are accessed; the following accesses reuse the value computed the first time.

Stream patterns, also bracketed by [< and >], describe initial segments of streams. In particular, the stream pattern [< >] matches all streams. Stream pattern components are matched against the corresponding elements of a stream. The component ' pattern matches the corresponding stream element against the pattern. The component expr pattern applies the function denoted by expr to the current stream, then matches the result of the function against pattern. Finally, the component ident simply binds the identifier to the stream being matched. (The current implementation limits ident to appear last in a stream pattern.)

Stream matching proceeds destructively: once a component has been matched, it is discarded from the stream (by in-place modification).

Stream matching proceeds in two steps: first, a pattern is selected by matching the stream against the first components of the stream patterns; then, the following components of the selected pattern are checked against the stream. If the following components do not match, the exception Parse_error is raised. There is no backtracking here: stream matching commits to the pattern selected according to the first element. If none of the first components of the stream patterns match, the exception Parse_failure is raised. The Parse_failure exception causes the next alternative to be tried, if it occurs during the matching of the first element of a stream, before matching has committed to one pattern.

See Functional programming using Caml Light for a more gentle introductions to streams, and for some examples of their use in writing parsers. A more formal presentation of streams, and a discussion of alternate semantics, can be found in Parsers in ML by Michel Mauny and Daniel de Rauglaudre, in the proceedings of the 1992 ACM conference on Lisp and Functional Programming.