[Caml-list] lexer disambiguation?
Date: 2001-09-10 (15:31)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] lexer disambiguation?
> since the lexer looks like an ordinary ocaml function (more or less), does
> the disambiguation boil down to:
> 1. the longest series of bytes that matches a single rule
> 2. match the first rule in the function that matches #1

I'm not sure which lexer you're talking about.

Lexers generated by ocamllex do indeed implement the behavior you
describe: longest match + first rule if several rules match the same
maximal-length substring.  (But they sure don't look like ordinary
OCaml functions: they just call an underlying table-driven DFA engine
that does all the hard work!)
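
To make the rule concrete, here is a minimal sketch in plain OCaml of
longest match with first-rule tie-breaking.  It is only an illustration
of the selection policy, not of how ocamllex works internally (the
generated lexers run a table-driven DFA).  The type rule and the
functions keyword, ident and select are hypothetical names invented for
this example.

```ocaml
(* A rule is a name plus a matcher that returns the length of the prefix
   it matches at position [pos], or None.  All names are hypothetical. *)
type rule = { name : string; matcher : string -> int -> int option }

(* Rule matching one fixed keyword. *)
let keyword kw =
  { name = "KEYWORD(" ^ kw ^ ")";
    matcher = (fun s pos ->
      let n = String.length kw in
      if pos + n <= String.length s && String.sub s pos n = kw
      then Some n else None) }

let is_lower c = c >= 'a' && c <= 'z'

(* Rule matching a non-empty run of lowercase letters. *)
let ident =
  { name = "IDENT";
    matcher = (fun s pos ->
      let len = String.length s in
      let rec go i = if i < len && is_lower s.[i] then go (i + 1) else i in
      let stop = go pos in
      if stop > pos then Some (stop - pos) else None) }

(* Longest match wins.  On a tie the earlier rule wins, because the
   current best is replaced only by a strictly longer match. *)
let select rules s pos =
  List.fold_left
    (fun best r ->
      match r.matcher s pos, best with
      | Some n, None -> Some (r.name, n)
      | Some n, Some (_, m) when n > m -> Some (r.name, n)
      | _ -> best)
    None rules

(* Rule order matters only for ties: the keyword is listed first. *)
let rules = [ keyword "if"; ident ]
```

With these rules, the input "if" is matched by both rules over two
characters, so the keyword rule is selected because it comes first; the
input "iffy" is matched over four characters by ident only, so the
longer match is selected.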

Lexers written using stream parsers behave like all stream parsers:
they select the first pattern that matches the beginning of the
stream, then "commit" to this pattern, matching the remainder of the
pattern without backtracking.  This "commit" behavior is different
from regular pattern-matching on (say) lists, which backtracks as necessary.
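
The difference can be sketched without the stream parser syntax.  The
first function below is ordinary pattern-matching on a list of
characters.  The second is a hand-written imitation of the "commit"
behavior, under the assumption that a branch is chosen by looking at
the first element only; classify_list, classify_commit and the
exception Parse_error are hypothetical names, not part of any library.

```ocaml
(* Hypothetical exception standing in for a parse error after commit. *)
exception Parse_error

(* Ordinary list matching: when the first case fails on the second
   element, the second case is still tried. *)
let classify_list = function
  | 'a' :: 'b' :: _ -> "ab"
  | 'a' :: 'c' :: _ -> "ac"
  | _ -> "other"

(* Imitation of commit: the branch is selected on the first element
   alone, and the rest of that branch must then match. *)
let classify_commit input =
  match input with
  | 'a' :: rest ->
      (* Committed to the "ab" branch: no second chance for "ac". *)
      (match rest with
       | 'b' :: _ -> "ab"
       | _ -> raise Parse_error)
  | _ -> "other"
```

On the input ['a'; 'c'], classify_list returns "ac", while
classify_commit raises Parse_error because it has already committed to
the first branch.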

The OCaml lexer (used by the compilers and the toplevel), as well as
the generic lexer in module Genlex, also implement the longest-match
rule, so that for instance abcd is one identifier, not four
identifiers a, b, c, and d.
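
As a small illustration of the identifier case, here is a sketch of a
scanner that returns each maximal run of letters as one identifier.  It
is not the code of the OCaml lexer or of Genlex; is_letter and idents
are hypothetical names, and restricting identifiers to ASCII letters is
a simplifying assumption.

```ocaml
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* Return the identifiers of [s], each one a maximal run of letters. *)
let idents s =
  let len = String.length s in
  let rec scan pos acc =
    if pos >= len then List.rev acc
    else if is_letter s.[pos] then begin
      (* Longest match: keep extending while the next character is a letter. *)
      let rec extend i =
        if i < len && is_letter s.[i] then extend (i + 1) else i in
      let stop = extend pos in
      scan stop (String.sub s pos (stop - pos) :: acc)
    end
    else scan (pos + 1) acc  (* skip any character that is not a letter *)
  in
  scan 0 []
```

Here idents "abcd" yields the single identifier "abcd", and a
separator is needed to get several identifiers, as in idents "ab cd".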

I hope this answers your question.

- Xavier Leroy