Re: The lexer hack
Date: 2009-11-11 (11:04)
From: Martin Jambon <martin.jambon@e...>
Subject: Re: [Caml-list] Re: The lexer hack
Dario Teixeira wrote:
> Hi,
>> Interesting. Have you confirmed that this works? I am slightly
>> worried by the fact that an LR parser reads one token ahead,
>> i.e. one token past BEGIN_VERB might already have been read
>> before the enter_verb semantic action is executed. If that is
>> so, then this token would be read while the lexer is still in
>> the wrong mode.
> Yes, I was just thinking about that as well... :-)
> I think I can pile another hack on top of the dummy action:
> dummy tokens to take care of the readahead issue.  Though
> this has the potential to get comically silly pretty quickly!
> I'll report later...

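If a single token (BEGIN_VERB) is enough to determine which lexer to use, I
think you can switch the state inside the lexer itself, at the moment that
token is produced, so that any token read ahead is already lexed in the right
mode: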

rule token state = parse
 ""   { match !state with
             `Normal -> normal_token state lexbuf
           | `Verbatim -> verbatim_token state lexbuf

and normal_token state = parse
| "\\begin{verbatim}"   { state := `Verbatim; BEGIN_VERB }

and verbatim_token state = parse
  ...                  { RAW (...) }
| "\\end{verbatim}"    { state := `Normal; END_VERB }
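
To make the point about lookahead concrete, here is a rough hand-written
stand-in for the rules above, in plain OCaml rather than ocamllex. It is only
a sketch: the TEXT and EOF tokens, the record type and all helper names are
made up for illustration, and normal text is lumped into one token per run.
What matters is that the mode changes before BEGIN_VERB is returned, so the
next call lexes in verbatim mode no matter when the parser makes it.

```ocaml
(* BEGIN_VERB, END_VERB and RAW come from the grammar under discussion;
   TEXT, EOF and every helper below are hypothetical. *)
type token = BEGIN_VERB | END_VERB | RAW of string | TEXT of string | EOF

type mode = Normal | Verbatim

type lexer = { src : string; mutable pos : int; mutable mode : mode }

let begin_tag = "\\begin{verbatim}"
let end_tag = "\\end{verbatim}"

(* [has_prefix_at s i p] is true when [p] occurs in [s] at offset [i]. *)
let has_prefix_at s i p =
  let n = String.length p in
  i + n <= String.length s && String.sub s i n = p

(* Offset of the first occurrence of [p] at or after [i], or the length of
   [s] when there is none. *)
let rec find s i p =
  if i >= String.length s then String.length s
  else if has_prefix_at s i p then i
  else find s (i + 1) p

let next_token lx =
  if lx.pos >= String.length lx.src then EOF
  else
    match lx.mode with
    | Normal ->
      if has_prefix_at lx.src lx.pos begin_tag then begin
        lx.pos <- lx.pos + String.length begin_tag;
        lx.mode <- Verbatim;  (* switch before the parser sees the token *)
        BEGIN_VERB
      end else begin
        let stop = find lx.src lx.pos begin_tag in
        let s = String.sub lx.src lx.pos (stop - lx.pos) in
        lx.pos <- stop;
        TEXT s
      end
    | Verbatim ->
      if has_prefix_at lx.src lx.pos end_tag then begin
        lx.pos <- lx.pos + String.length end_tag;
        lx.mode <- Normal;
        END_VERB
      end else begin
        let stop = find lx.src lx.pos end_tag in
        let s = String.sub lx.src lx.pos (stop - lx.pos) in
        lx.pos <- stop;
        RAW s
      end

(* Collect every token up to (and excluding) EOF. *)
let tokens src =
  let lx = { src; pos = 0; mode = Normal } in
  let rec loop acc =
    match next_token lx with
    | EOF -> List.rev acc
    | t -> loop (t :: acc)
  in
  loop []
```

Calling tokens on a string containing a verbatim section should yield the
text before it, BEGIN_VERB, one RAW token, END_VERB, and the text after it.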

An even simpler option, if possible in your case, is to use a single token for
the whole verbatim section:

rule token = parse
| "\\begin{verbatim}"   { finish_verbatim lexbuf }

and finish_verbatim = shortest
  (* shortest: stop at the first "\\end{verbatim}", not the last one *)
  (_* as s) "\\end{verbatim}"   { RAW s }
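
For what it's worth, the effect of "shortest" here can be imitated in plain
OCaml as follows. This is again just a sketch under my own assumptions: the
function takes the input string and an offset instead of a lexbuf, the helper
find_first is made up, and failing with an exception on an unterminated
section is my choice, not something the rule above specifies.

```ocaml
(* RAW is the token from the rule above; everything else is hypothetical. *)
type token = RAW of string

let end_tag = "\\end{verbatim}"

(* Offset of the first occurrence of [p] in [s] at or after [i], if any. *)
let find_first s i p =
  let n = String.length p and len = String.length s in
  let rec go j =
    if j + n > len then None
    else if String.sub s j n = p then Some j
    else go (j + 1)
  in
  go i

(* [pos] is the offset just past "\\begin{verbatim}". Returns the RAW token
   and the offset at which normal lexing resumes. Stopping at the first end
   tag is what corresponds to the shortest match. *)
let finish_verbatim s pos =
  match find_first s pos end_tag with
  | None -> failwith "unterminated verbatim section"
  | Some j -> (RAW (String.sub s pos (j - pos)), j + String.length end_tag)
```

With two end tags in the input, only the text up to the first one ends up in
the RAW token; a longest match would swallow the first end tag as well.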