ocamlyacc/ocamllex problems
Date: -- (:)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: ocamlyacc/ocamllex problems
> I am writing a parser that uses Parsing.rhs_start and Parsing.rhs_end in
> a rule.  The problem is the following,
> 
> 1) If I use a simple rule in the lexer that matches a token all is fine.
> ex:
>     |   "'" ['0' '1' '*' '.']+ "'"  { ... }
> 
> 2) If I use an automaton in the lexer for matching the same token, the
> results of Parsing.rhs_start and Parsing.rhs_end are wrong.
> ex:	
>     | "'"  { ... bits lexbuf ... }
> and bits = parse
>     | '\'' { ... }
>     | ['0' '1' '.' '*' ] { ... }
>     | eof  { ... }
>     | _    { ... }
>
> I am not sure I understand the reasons for my problem.

For terminal symbols (tokens), the locations returned by
Parsing.rhs_start and Parsing.rhs_end are those returned by
Lexing.lexeme_start and Lexing.lexeme_end.  However, these two
functions track the location of the *last* regular expression matched by
the ocamllex-generated automaton.  (This location is stored and
updated in place in the "lexbuf" argument.)

So, if your lexing rule recursively calls other lexing rules (as in
case 2 above), the locations reported correspond to the part of the
token that was last matched by a regular expression (i.e. the last
"bit" of the token in your example 2).
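The effect can be reproduced with a toy model. The sketch below does not use the real Lexing module: "toy_buf", "match_chars", "single_rule" and "multi_rule" are hypothetical names, made up only to mimic a buffer whose recorded location is overwritten in place at every match.

```ocaml
(* Toy stand-in for a lexer buffer. This is NOT the real Lexing.lexbuf. *)
type toy_buf = {
  text : string;
  mutable start_pos : int;  (* start of the last match *)
  mutable curr_pos : int;   (* end of the last match, next char to read *)
}

let make text = { text; start_pos = 0; curr_pos = 0 }

(* Consume n characters as one "match", overwriting the location in place. *)
let match_chars buf n =
  buf.start_pos <- buf.curr_pos;
  buf.curr_pos <- buf.curr_pos + n

(* Case 1: the whole token is matched by a single regular expression. *)
let single_rule buf =
  match_chars buf (String.length buf.text);
  (buf.start_pos, buf.curr_pos)

(* Case 2: a sub-rule matches one character at a time, up to and
   including the closing quote. *)
let rec bits buf =
  let c = buf.text.[buf.curr_pos] in
  match_chars buf 1;
  if c <> '\'' then bits buf

let multi_rule buf =
  match_chars buf 1;  (* opening quote *)
  bits buf;
  (buf.start_pos, buf.curr_pos)

let () =
  let s1, e1 = single_rule (make "'01*.'") in
  let s2, e2 = multi_rule (make "'01*.'") in
  Printf.printf "single rule: %d-%d\n" s1 e1;
  Printf.printf "sub-rule:    %d-%d\n" s2 e2
```

On the input '01*.' the single rule reports 0-6 (the whole token), while the sub-rule version reports 5-6 (only the closing quote, which was matched last).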

To get correct locations in example 2, a bit of "lexbuf" hacking is
required to restore the start location to what it was when the first
regexp was matched:

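| "'"  { (* remember where the opening quote started *)
         let start = Lexing.lexeme_start lexbuf in
         let res = ... bits lexbuf ... in
         (* restore the start location, relative to the current buffer *)
         lexbuf.Lexing.lex_start_pos <- start - lexbuf.Lexing.lex_abs_pos;
         res }
and bits = parse ...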
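In later OCaml releases the "lexbuf" record also carries a "lex_start_p" field of type Lexing.position, and Lexing.lexeme_start reads its value from that field. Assuming such a release, the same save-and-restore idea could be wrapped in a helper along these lines. This is only a sketch: "with_token_start" and "fake_bits" are hypothetical names, and "fake_bits" merely stands in for a call such as "bits lexbuf" so that the example runs without ocamllex.

```ocaml
(* Hypothetical helper: run a sub-lexer, then restore the start position
   that was recorded when the first regexp matched. Only lex_start_p is
   restored here; lex_start_pos (used by Lexing.lexeme) is left alone. *)
let with_token_start (lexbuf : Lexing.lexbuf) (sub : Lexing.lexbuf -> 'a) : 'a =
  let start_p = lexbuf.Lexing.lex_start_p in
  let res = sub lexbuf in
  lexbuf.Lexing.lex_start_p <- start_p;
  res

(* Stand-in for "bits lexbuf": pretend the last regexp matched at offset 5. *)
let fake_bits (lb : Lexing.lexbuf) : string =
  lb.Lexing.lex_start_p <- { lb.Lexing.lex_start_p with Lexing.pos_cnum = 5 };
  "01*."

let () =
  let lexbuf = Lexing.from_string "'01*.'" in
  let tok = with_token_start lexbuf fake_bits in
  Printf.printf "%s starts at %d\n" tok (Lexing.lexeme_start lexbuf)
```

In a lexer action this would read something like { with_token_start lexbuf (fun lb -> ... bits lb ...) }, with the elided parts filled in as in the rule above.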

Hope this helps,

- Xavier Leroy