Module Genlex: a generic lexical analyzer.
This module implements a simple ``standard'' lexical analyzer, presented as a function from character streams to token streams. It implements roughly the lexical conventions of OCaml, but is parameterized by the set of keywords of your language.
Example: a lexer suitable for a desk calculator is obtained by
let lexer = make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"]
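For concreteness, the lexer above can be applied to a character stream built from a string. The following self-contained sketch (using the Genlex and Stream modules of the OCaml 4 standard library; both are deprecated, and Stream was removed in OCaml 5) prints each token produced for a small input:

```ocaml
(* Lex a sample expression and print the resulting tokens. *)
let lexer = Genlex.make_lexer ["+"; "-"; "*"; "/"; "let"; "="; "("; ")"]

let () =
  let tokens = lexer (Stream.of_string "let x = (1 + 2)") in
  Stream.iter
    (function
      | Genlex.Kwd s -> Printf.printf "Kwd %S\n" s
      | Genlex.Ident s -> Printf.printf "Ident %S\n" s
      | Genlex.Int n -> Printf.printf "Int %d\n" n
      | Genlex.Float f -> Printf.printf "Float %f\n" f
      | Genlex.String s -> Printf.printf "String %S\n" s
      | Genlex.Char c -> Printf.printf "Char %C\n" c)
    tokens
```

For this input the stream yields Kwd "let", Ident "x", Kwd "=", Kwd "(", Int 1, Kwd "+", Int 2, Kwd ")".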
The associated parser would be a function from token Stream.t to, for instance, int, and would have rules such as:
let rec parse_expr = parser
    [< 'Int n >] -> n
  | [< 'Kwd "("; n = parse_expr; 'Kwd ")" >] -> n
  | [< n1 = parse_expr; n2 = parse_remainder n1 >] -> n2
and parse_remainder n1 = parser
    [< 'Kwd "+"; n2 = parse_expr >] -> n1 + n2
One should notice that the parser keyword and the associated notation for streams are only available through camlp4 extensions. This means that one has to preprocess one's sources, e.g. by using the "-pp" command-line switch of the compilers.
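If preprocessing is undesirable, the same calculator parser can be written by hand with plain Stream functions. This is a sketch only, and it avoids the left recursion of the rules above by parsing an atom first and then any number of "+ atom" continuations:

```ocaml
(* A hand-written equivalent of the calculator parser: no camlp4
   'parser' notation, so no preprocessing is required. *)
let lexer = Genlex.make_lexer ["+"; "-"; "*"; "/"; "let"; "="; "("; ")"]

let rec parse_expr s =
  (* expr ::= atom { "+" atom } *)
  let n1 = parse_atom s in
  parse_remainder n1 s

and parse_atom s =
  match Stream.next s with
  | Genlex.Int n -> n
  | Genlex.Kwd "(" ->
      let n = parse_expr s in
      (match Stream.next s with
       | Genlex.Kwd ")" -> n
       | _ -> raise Stream.Failure)
  | _ -> raise Stream.Failure

and parse_remainder n1 s =
  match Stream.peek s with
  | Some (Genlex.Kwd "+") ->
      Stream.junk s;                       (* consume the "+" *)
      parse_remainder (n1 + parse_atom s) s
  | _ -> n1
```

For example, parse_expr (lexer (Stream.of_string "(1 + 2) + 3")) evaluates to 6.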
type token =
  | Kwd of string
  | Ident of string
  | Int of int
  | Float of float
  | String of string
  | Char of char

The type of tokens. The lexical classes are: Int and Float for integer and floating-point numbers; String for string literals, enclosed in double quotes; Char for character literals, enclosed in single quotes; Ident for identifiers (either sequences of letters, digits, underscores and quotes, or sequences of ``operator characters'' such as +, *, etc); and Kwd for keywords (either identifiers or single ``special characters'' such as (, }, etc).
val make_lexer :
string list -> char Stream.t -> token Stream.t
Construct the lexer function. The first argument is the list of keywords. An identifier s is returned as Kwd s if s belongs to this list, and as Ident s otherwise. A special character s is returned as Kwd s if s belongs to this list, and causes a lexical error (exception Parse_error) otherwise. Blanks and newlines are skipped. Comments delimited by (* and *) are skipped as well, and can be nested.
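The keyword and comment rules above can be checked directly. A minimal sketch, again assuming the deprecated Genlex and Stream modules of OCaml 4:

```ocaml
let lexer = Genlex.make_lexer ["let"; "="]

(* Return the first token produced for a source string. *)
let first_token src = Stream.next (lexer (Stream.of_string src))

let () =
  (* "let" is in the keyword list, so it lexes as a keyword. *)
  assert (first_token "let" = Genlex.Kwd "let");
  (* "foo" is not in the list, so it lexes as an identifier. *)
  assert (first_token "foo" = Genlex.Ident "foo");
  (* "=" is a special character in the list, so it is a keyword too. *)
  assert (first_token "= 1" = Genlex.Kwd "=");
  (* Comments are skipped, including nested ones. *)
  assert (first_token "(* outer (* inner *) *) 42" = Genlex.Int 42)
```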