[Caml-list] ocamllex+ocamlyacc and not parsing properly
From: skaller <skaller@u...>
Subject: Re: [Caml-list] ocamllex+ocamlyacc and not parsing properly
On Mon, 2005-08-08 at 16:23 +1200, Jonathan Roewen wrote:
> Is there any way to call another rule based on some variable in
> ocamllex? I see you can pass arguments to a rule, but what use are
> these except in the actions part?

You are trying to do too much work in the wrong
places IMHO.

Your lexer should always be context insensitive
and build small sensible pretokens NOT tokens:

identifier
integer
whitespace
newline
: # .

Then postprocess these pretokens into tokens.
This is easiest with a list, where you can use
pattern matching and functional techniques to
look ahead.
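
A minimal sketch of such a postprocessing pass, to make the idea
concrete. The pretoken and token types, the LABEL token, and the
choice to treat '#' as a comment-to-end-of-line marker are all
assumptions made for illustration; they are not taken from the
original grammar.

```ocaml
(* Hypothetical pretokens, mirroring the list above. *)
type pretoken =
  | PIdent of string
  | PInt of int
  | PWhite
  | PNewline
  | PColon
  | PHash
  | PDot

(* Hypothetical tokens that the grammar would actually see. *)
type token =
  | IDENT of string
  | LABEL of string   (* identifier immediately followed by ':' *)
  | INT of int
  | COLON
  | DOT
  | EOL

(* Turn pretokens into tokens, looking ahead with list patterns. *)
let tokens_of_pretokens pretokens =
  (* Drop everything up to, but not including, the next newline. *)
  let rec skip_line = function
    | PNewline :: _ as rest -> rest
    | _ :: rest -> skip_line rest
    | [] -> []
  in
  let rec go acc = function
    | [] -> List.rev acc
    | PHash :: rest -> go acc (skip_line rest)   (* assumed: '#' starts a comment *)
    | PIdent s :: PColon :: rest -> go (LABEL s :: acc) rest   (* two-item lookahead *)
    | PWhite :: rest -> go acc rest
    | PNewline :: rest -> go (EOL :: acc) rest
    | PIdent s :: rest -> go (IDENT s :: acc) rest
    | PInt n :: rest -> go (INT n :: acc) rest
    | PColon :: rest -> go (COLON :: acc) rest
    | PDot :: rest -> go (DOT :: acc) rest
  in
  go [] pretokens
```

The context-sensitive decision (is this identifier a label?) lives in
one pattern here, so neither the lexer nor the grammar has to know
about it.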

Then parse the tokens. This is easy because you
get to choose the tokens to make it easy :)
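
An ocamlyacc entry point takes a function of type
Lexing.lexbuf -> token as its lexer argument, so a postprocessed
list has to be wrapped as such a function. A sketch, assuming a
hypothetical token type with an EOF constructor (in a real project
the type comes from the generated parser module):

```ocaml
(* Hypothetical token type, for illustration only. *)
type token = NAME of string | COLON | EOF

(* Wrap a token list as a function of the shape a generated parser
   expects for its lexer argument. The lexbuf argument is ignored. *)
let token_source (tokens : token list) : Lexing.lexbuf -> token =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | [] -> EOF   (* keep returning EOF once the list is exhausted *)
    | tok :: rest -> remaining := rest; tok
```

With a generated entry point called, say, Parser.main (a made-up
name), the call would look like
Parser.main (token_source tokens) (Lexing.from_string ""),
where the lexbuf is only a dummy.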

Use the grammar production arguments $1 $2 ...
to do further processing in the action as required.

The point is: the easy stuff is done by the two
automata (lexer, parser) and the hard stuff
is done in OCaml code.

For example: Felix lexer generates these pretokens:

WHITESPACE NEWLINE COMMENT

which are NOT tokens of the grammar. These pretokens
are stripped out by the preprocessor. You may actually
do this:

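(* Merge NAME WHITE NAME into a single NAME and drop other whitespace.
   Assumes NAME and WHITE both carry a string. *)
let rec pack_names tokens =
  match tokens with
  | NAME s1 :: WHITE s2 :: NAME s3 :: t ->
    NAME (String.concat "" [s1; s2; s3]) :: pack_names t
  | WHITE _ :: t -> pack_names t
  | h :: t -> h :: pack_names t
  | [] -> []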

Yeah, this isn't ideal because it isn't tail recursive,
but it should illustrate the idea: do the hard stuff
in a language capable of handling the hard stuff easily .. :)

Tail rec version:

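(* Same as pack_names, but accumulates in reverse; call as pack [] tokens. *)
let rec pack output input = match input with
  | NAME s1 :: WHITE s2 :: NAME s3 :: t ->
    pack (NAME (String.concat "" [s1; s2; s3]) :: output) t
  | WHITE _ :: t -> pack output t
  | h :: t -> pack (h :: output) t
  | [] -> List.rev output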
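
One detail worth noting: a rule that merges NAME WHITE NAME and then
moves on will not fold a run of three or more names into one, because
the merged name has already been emitted. A variant that pushes the
merged name back onto the input handles runs. This is my own sketch,
not part of the code above, and the token type (including COLON) is
assumed for illustration.

```ocaml
(* Hypothetical token type; NAME and WHITE carry their source text. *)
type token = NAME of string | WHITE of string | COLON

(* Tail-recursive packing that also merges runs of names:
   the merged NAME goes back on the input, not onto the output. *)
let rec pack_runs output input =
  match input with
  | NAME s1 :: WHITE s2 :: NAME s3 :: t ->
    (* the input shrinks by two items per merge, so this terminates *)
    pack_runs output (NAME (s1 ^ s2 ^ s3) :: t)
  | WHITE _ :: t -> pack_runs output t
  | h :: t -> pack_runs (h :: output) t
  | [] -> List.rev output

let example =
  pack_runs []
    [NAME "unsigned"; WHITE " "; NAME "long"; WHITE " "; NAME "int";
     WHITE " "; COLON]
(* example = [NAME "unsigned long int"; COLON] *)
```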

This technique can often be used to condition a
nasty language into an LALR(1) one .. I managed to turn
Python into an LALR(1) language .. but it took
17 preprocessing passes to do it .. :))
That was mainly fiddling with INDENT and UNDENT, since Python
is based on indentation rules, plus a lot of hassle
with the expression x,y, which is extremely hard
to parse (optional trailing comma ..)
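
To give a flavour of the indentation part, here is a rough sketch of
one such pass. It is not one of the 17 passes mentioned above; the
token type and the input representation (one pair of indentation
width and text per logical line) are assumptions made for the example.

```ocaml
(* Hypothetical tokens for an indentation-based language. *)
type token = INDENT | UNDENT | LINE of string

(* Given (indentation width, text) pairs, one per logical line, emit
   INDENT/UNDENT tokens by comparing each width against a stack of the
   currently open indentation levels. A dedent to a width that matches
   no open level is not reported as an error in this sketch. *)
let add_indent_tokens (lines : (int * string) list) : token list =
  (* Pop every level deeper than width, emitting one UNDENT per level. *)
  let rec close stack width acc =
    match stack with
    | top :: rest when width < top -> close rest width (UNDENT :: acc)
    | _ -> (stack, acc)
  in
  let rec go stack acc = function
    | [] ->
      (* close every block still open at end of input *)
      let _, acc = close stack 0 acc in
      List.rev acc
    | (width, text) :: rest ->
      let top = match stack with t :: _ -> t | [] -> 0 in
      if width > top then
        go (width :: stack) (LINE text :: INDENT :: acc) rest
      else
        let stack, acc = close stack width acc in
        go stack (LINE text :: acc) rest
  in
  go [0] [] lines
```

Once the token stream carries explicit INDENT and UNDENT markers, the
grammar can treat blocks like any bracketed construct.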


-- 
John Skaller <skaller at users dot sourceforge dot net>