| Date: | -- (:) |
| From: | Xavier Leroy <Xavier.Leroy@i...> |
| Subject: | Re: ocamlyacc/ocamllex problems |
> I am writing a parser that uses Parsing.rhs_start and Parsing.rhs_end in
> a rule. The problem is the following,
>
> 1) If I use a simple rule in the lexer that matches a token all is fine.
> ex:
> | "'" ['0' '1' '*' '.']+ "'" { ... }
>
> 2) If I use an automata in the lexer for matching the same token, the
> results of Parsing.rhs_start and Parsing.rhs_end are wrong.
> ex:
> | "'" { ... bits lexbuf ... }
> and bits = parse
> | '\'' { ... }
> | ['0' '1' '.' '*' ] { ... }
> | eof { ... }
> | _ { ... }
>
> I am not sure I understand the reason for my problem.
For terminal symbols (tokens), the locations returned by
Parsing.rhs_start and Parsing.rhs_end are those returned by
Lexing.lexeme_start and Lexing.lexeme_end. However, these two
functions track the location of the *last* regular expression matched by
the ocamllex-generated automaton. (This location is stored and
updated in place in the "lexbuf" argument.)
So, if your lexing rule recursively calls other lexing rules (as in
case 2 above), the locations reported correspond to the part of the
token that was last matched by a regular expression (i.e. the last
"bit" of the token in your example 2).
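To make the mechanism concrete without running ocamllex, here is a small self-contained sketch. It assumes a hand-written stand-in for the generated automaton: the helper next_char (hypothetical, not part of the Lexing module) performs one single-character "match" and overwrites the location fields of the lexbuf in place, roughly as the generated code does after each regexp match. The names bits and token mirror example 2. On the input '01*', the location left in the lexbuf covers only the closing quote.

```ocaml
(* Hypothetical stand-in for one single-character regexp match by an
   ocamllex-generated automaton: the match starts where the previous
   one ended, and the start/end locations are overwritten in place. *)
let next_char (lexbuf : Lexing.lexbuf) : char option =
  lexbuf.Lexing.lex_start_pos <- lexbuf.Lexing.lex_curr_pos;
  lexbuf.Lexing.lex_start_p <- lexbuf.Lexing.lex_curr_p;
  if lexbuf.Lexing.lex_curr_pos >= lexbuf.Lexing.lex_buffer_len then None
  else begin
    let c = Bytes.get lexbuf.Lexing.lex_buffer lexbuf.Lexing.lex_curr_pos in
    lexbuf.Lexing.lex_curr_pos <- lexbuf.Lexing.lex_curr_pos + 1;
    lexbuf.Lexing.lex_curr_p <-
      { lexbuf.Lexing.lex_curr_p with
        Lexing.pos_cnum = lexbuf.Lexing.lex_abs_pos + lexbuf.Lexing.lex_curr_pos };
    Some c
  end

(* Counterpart of the "bits" rule: one match per character until the
   closing quote, accumulating the bits in a buffer. *)
let rec bits lexbuf buf =
  match next_char lexbuf with
  | Some '\'' -> Buffer.contents buf
  | Some ('0' | '1' | '.' | '*' as c) -> Buffer.add_char buf c; bits lexbuf buf
  | Some c -> failwith (Printf.sprintf "unexpected character %C" c)
  | None -> failwith "unterminated bit string"

(* Counterpart of the "'" case of the main rule, with no location fix. *)
let token lexbuf =
  match next_char lexbuf with
  | Some '\'' -> bits lexbuf (Buffer.create 8)
  | _ -> failwith "expected a quote"

let () =
  let lexbuf = Lexing.from_string "'01*'" in
  let value = token lexbuf in
  (* prints "01*: start=4 end=5": only the closing quote is covered *)
  Printf.printf "%s: start=%d end=%d\n" value
    (Lexing.lexeme_start lexbuf) (Lexing.lexeme_end lexbuf)
```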
To get correct locations in example 2, a bit of "lexbuf" hacking is
required to restore the start location to what it was when the first
regexp was matched:
  | "'" { let start = Lexing.lexeme_start lexbuf in
          let res = ... bits lexbuf ... in
          lexbuf.Lexing.lex_start_pos <- start - lexbuf.Lexing.lex_abs_pos;
          res }
  and bits = parse ...
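A self-contained sketch of this fix, again using a hand-written stand-in (next_char, hypothetical) for the ocamllex automaton so that it runs on its own. One assumption worth flagging: in more recent OCaml releases, Lexing.lexeme_start reads the position record lexbuf.lex_start_p, and the parser records that field for each token, so the sketch saves and restores lex_start_p in addition to the lex_start_pos adjustment above.

```ocaml
(* Hypothetical stand-in for one single-character regexp match by an
   ocamllex-generated automaton; locations are overwritten in place. *)
let next_char (lexbuf : Lexing.lexbuf) : char option =
  lexbuf.Lexing.lex_start_pos <- lexbuf.Lexing.lex_curr_pos;
  lexbuf.Lexing.lex_start_p <- lexbuf.Lexing.lex_curr_p;
  if lexbuf.Lexing.lex_curr_pos >= lexbuf.Lexing.lex_buffer_len then None
  else begin
    let c = Bytes.get lexbuf.Lexing.lex_buffer lexbuf.Lexing.lex_curr_pos in
    lexbuf.Lexing.lex_curr_pos <- lexbuf.Lexing.lex_curr_pos + 1;
    lexbuf.Lexing.lex_curr_p <-
      { lexbuf.Lexing.lex_curr_p with
        Lexing.pos_cnum = lexbuf.Lexing.lex_abs_pos + lexbuf.Lexing.lex_curr_pos };
    Some c
  end

(* Counterpart of the "bits" rule. *)
let rec bits lexbuf buf =
  match next_char lexbuf with
  | Some '\'' -> Buffer.contents buf
  | Some ('0' | '1' | '.' | '*' as c) -> Buffer.add_char buf c; bits lexbuf buf
  | Some c -> failwith (Printf.sprintf "unexpected character %C" c)
  | None -> failwith "unterminated bit string"

(* Counterpart of the "'" case, with the start location restored. *)
let token lexbuf =
  match next_char lexbuf with
  | Some '\'' ->
    (* Save the location of the opening quote before the recursive
       call overwrites it. *)
    let start = Lexing.lexeme_start lexbuf in
    let start_p = lexbuf.Lexing.lex_start_p in
    let res = bits lexbuf (Buffer.create 8) in
    (* Restore both representations of the start location. *)
    lexbuf.Lexing.lex_start_pos <- start - lexbuf.Lexing.lex_abs_pos;
    lexbuf.Lexing.lex_start_p <- start_p;
    res
  | _ -> failwith "expected a quote"

let () =
  let lexbuf = Lexing.from_string "'01*'" in
  let value = token lexbuf in
  (* prints "01*: start=0 end=5 lexeme='01*'" *)
  Printf.printf "%s: start=%d end=%d lexeme=%s\n" value
    (Lexing.lexeme_start lexbuf) (Lexing.lexeme_end lexbuf)
    (Lexing.lexeme lexbuf)
```

Restoring lex_start_pos also makes Lexing.lexeme return the whole token, quotes included, rather than just the closing quote.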
Hope this helps,
- Xavier Leroy