new library modules

Xavier Leroy (xavier@Theory.Stanford.EDU)
Mon, 3 May 1993 11:45:43 -0700 (PDT)

From: Xavier Leroy <xavier@Theory.Stanford.EDU>
Message-Id: <9305031845.AA00973@Tamuz.Stanford.EDU>
Subject: new library modules
To: caml-list@margaux
Date: Mon, 3 May 1993 11:45:43 -0700 (PDT)

Hello, mailing list,

I'm considering to add the following modules to the standard library
in a future release of Caml Light:

genlex generic lexical analyzer
set applicative sets over totally ordered types
map applicative maps over totally ordered types

The interfaces are given at the end of this message. I'm interested in
any comments you might have on these modules, such as missing
operations, poor names, or vague documentation.

Also, it's the right time to call for extensions or corrections in the
library (``Gee, I really miss a frobnicate_list function''), even
though what can go in the standard library is severely constrainted
both for ideological reasons (I believe no combinator should be part
of the standard library, including function composition) and practical
reasons (we're about to hit various size limitations in the PC 8086
version).

- Xavier Leroy

---------------------------- Module genlex ----------------------------

(* A generic lexical analyzer *)

(* This module implements a simple ``standard'' lexical analyzer, presented
as a function from character streams to token streams. It implements
roughly the lexical conventions of Caml, but is parameterized by the
set of keywords of your language. *)

#open "stream";;

type token =
Kwd of string
| Ident of string
| Int of int
| Float of float
| String of string
| Char of char;;
(* The type of tokens. The lexical classes are: [Int] and [Float]
for integer and floating-point numbers; [String] for
string literals, enclosed in double quotes; [Char] for
character literals, enclosed in backquotes; [Ident] for
identifiers (either sequences of letters, digits, underscores
and quotes, or sequences of ``operator characters'' such as
[+], [*], etc); and [Kwd] for keywords (either identifiers or
single ``special characters'' such as [(], [}], etc). *)

value make_lexer: string list -> (char stream -> token stream);;
(* Construct the lexer function. The first argument is the list of
keywords. An identifier [s] is returned as [Kwd s] if [s]
belongs to this list, and as [Ident s] otherwise.
A special character [s] is returned as [Kwd s] if [s]
belongs to this list, and cause a lexical error (exception
[Parse_error]) otherwise. Blanks and newlines are skipped.
Comments delimited by [(*] and [*)] are skipped as well,
and can be nested.

Example: a lexer suitable for a desk calculator is obtained by
- [ let lexer = make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"]
]
The associated parser would be a function from [token stream]
to, for instance, [int], and would have rules such as:
[ let parse_expr = function
[< 'Int n >] -> n
| [< parse_expr n1; (parse_end n1) n2 >] -> n2
| [< 'Kwd "("; parse_expr n; 'Kwd ")" >] -> n
and parse_end n1 = function
[< 'Kwd "+"; parse_expr n2 >] -> n1+n2
| ...
] *)

---------------------------- Module set ----------------------------

(* Sets over ordered types *)

(* This module implements the set data structure, given a total ordering
function over the set elements. All operations over sets
are purely applicative (no side-effects).
The implementation uses balanced binary trees, and is therefore
reasonably efficient: membership takes time logarithmic in the size
of the set, for instance. *)

type 'a t;;
(* The type of sets containing elements of type ['a]. *)

type 'a ordering == 'a -> 'a -> int;;
(* Most operations over sets take as argument a
total ordering function over the set elements.
This is a two-argument function [f] such that
[f e1 e2] is zero if the elements [e1] and [e2] are equal,
[f e1 e2] is strictly negative if [e1] is smaller than [e2],
and [f e1 e2] is strictly positive if [e1] is greater than [e2].

For instance, a suitable ordering function for type [int]
is [fun i j -> i-j]. For type [string], you could use
[compare_string].

The user is responsible for always using the same ordering
function to operate on a given set. (This is what we mean
by ``the set [s] is ordered according to the ordering [ord]''
in the following.) Otherwise, the functions
below could produce incorrect results. *)

value empty: 'a t
(* The empty set. *)
and is_empty: 'a t -> bool
(* Test whether a set is empty or not. *)
and mem: 'a ordering -> 'a -> 'a t -> bool
(* [mem ord x s] tests whether [x] is equal to an element
of [s], in the sense that [ord x y = 0] for some element [y]
of [s]. *)
and add: 'a ordering -> 'a -> 'a t -> 'a t
(* [add ord x s] returns a set containing all elements of [s],
plus [x]. If [x] was already in [s], [s] is returned unchanged.
[s] and the returned set are ordered according to [ord]. *)
and remove: 'a ordering -> 'a -> 'a t -> 'a t
(* [add ord x s] returns a set containing all elements of [s],
except [x]. If [x] was not in [s], [s] is returned unchanged.
[s] and the returned set are ordered according to [ord]. *)
and union: 'a ordering -> 'a t -> 'a t -> 'a t
and inter: 'a ordering -> 'a t -> 'a t -> 'a t
and diff: 'a ordering -> 'a t -> 'a t -> 'a t
(* Union, intersection and set difference. The two set arguments
and the returned set are ordered according to
the first argument. *)
and equal: 'a ordering -> 'a t -> 'a t -> bool
(* [equal ord s1 s2] tests whether the sets [s1] and [s2] are
equal, that is, contain equal elements according to [ord]. *)
and compare: 'a ordering -> 'a t -> 'a t -> int
(* Total ordering between sets. Can be used as the ordering function
for doing sets of sets. *)
and elements: 'a t -> 'a list
(* Return the list of all elements of the given set.
The elements appear in the list in some non-specified order. *)
and iter: ('a -> 'b) -> 'a t -> unit
(* [iter f s] applies [f] in turn to all elements of [s], and
discards the results. The elements of [s] are presented to [f]
in a non-specified order. *)
and fold: ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b
(* [fold f s a] computes [(f xN ... (f x2 (f x1 a))...)],
where [x1 ... xN] are the elements of [s].
The order in which elements of [s] are presented to [f] is
not specified. *)
;;

---------------------------- Module map ----------------------------

(* Association tables over ordered types *)

(* This module implements applicative association tables, also known as
finite maps, given a total ordering function over the keys.
All operations over maps are purely applicative (no side-effects).
The implementation uses balanced binary trees, and therefore searching
and insertion take time logarithmic in the size of the map. *)

type ('a, 'b) t;;
(* The type of maps from type ['a] to type ['b]. *)

type 'a ordering == 'a -> 'a -> int;;
(* Most operations over maps take as argument a
total ordering function over the keys of the map.
This is a two-argument function [f] such that
[f k1 k2] is zero if the keys [k1] and [k2] are equal,
[f k1 k2] is strictly negative if [k1] is smaller than [k2],
and [f k1 k2] is strictly positive if [k1] is greater than [k2].

For instance, a suitable ordering function for type [int]
is [fun i j -> i-j]. For type [string], you could use
[compare_string].

The user is responsible for always using the same ordering
function to operate on a given map. (This is what we mean
by ``the map [m] is ordered according to the ordering [ord]''
in the following.) Otherwise, the functions
below could produce incorrect results. *)

value empty: ('a, 'b) t
(* The empty map. *)
and add: 'a ordering -> ('a, 'b) t -> 'a -> 'b -> ('a, 'b) t
(* [add ord m x y] returns a map containing the same bindings as
[m], plus a binding of [x] to [y]. Previous bindings for [x]
in [m] are not removed, but simply hidden: they reappear
after performing [remove ord map x].
(This is the semantics of association lists.)
[m] and the returned map are ordered according to [ord]. *)
and find: 'a ordering -> ('a, 'b) t -> 'a -> 'b
(* [find ord m x] returns the current binding of [x] in [m],
or raises [Not_found] if no such binding exists.
[m] is ordered according to [ord]. *)
and remove: 'a ordering -> ('a, 'b) t -> 'a -> ('a, 'b) t
(* [remove ord m x] returns a map containing the same bindings
as [m], minus the current binding for [x]. The previous
binding for [x] is restored if it exists. [m] is returned
unchanged if [x] is not bound in [m].
[m] and the returned map are ordered according to [ord]. *)
and iter: ('a -> 'b -> 'c) -> ('a, 'b) t -> unit
(* [iter f m] applies [f] to all bindings in map [m],
discarding all the results.
[f] receives the key as first argument, and the associated value
as second argument. The order in which the bindings are passed to
[f] is unspecified.
A given binding will be presented to [f] only once. *)
and fold: ('a -> 'b -> 'c -> 'c) -> ('a, 'b) t -> 'c -> 'c
(* [fold f m a] computes [(f kN dN ... (f k2 d2 (f k1 d1 a))...)],
where [(k1,d1) ... (kN, dN)] are the bindings of map [m].
The order in which bindings are presented to [f] is not
specified. *)
;;