Version française
Home     About     Download     Resources     Contact us    
Browse thread
Parsing
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Bruno De Fraine <Bruno.De.Fraine@v...>
Subject: Re: [Caml-list] Parsing
On 22 Jun 2007, at 01:14, Jon Harrop wrote:

> However, you must factor heads in the pattern matches. So instead of:
>
>   and parse_expr = parser
>     | [< e1=parse_factor; 'Kwd "+"; e2=parse_expr >] -> e1 +: e2
>     | [< e1=parse_factor; 'Kwd "-"; e2=parse_expr >] -> e1 -: e2
>     | [< e1=parse_factor  >] -> e1
>
> you must write:
>
>   and parse_expr = parser
>     | [< e1=parse_factor; stream >] ->
>         (parser
>          | [< 'Kwd "+"; e2=parse_expr >] -> e1 +: e2
>          | [< 'Kwd "-"; e2=parse_expr >] -> e1 -: e2
>          | [< >] -> e1) stream

Actually you must write something like:

and parse_expr = parser
   | [< e1=parse_factor; stream >] ->
       let rec aux e1 = parser
         | [< 'Kwd "+"; e2=parse_factor; stream >] -> aux (e1 +: e2)  
stream
         | [< 'Kwd "-"; e2=parse_factor; stream >] -> aux (e1 -: e2)  
stream
         | [< >] -> e1
       in aux e1 stream

Although you get the precedences right by organizing different levels  
for expr/factor/atom, you've made the operations right associative by  
greedily parsing "e2" as an expression. "+" and "-" should be left  
associative, unless you want 1-2+3 to evaluate to -4.

> Why is this? Is it because the streams are destructive? Could the  
> camlp4 macro
> not factor the pattern match for me?

camlp4's extensible grammar system can handle levels and  
associativities for you. Read
   http://caml.inria.fr/pub/docs/tutorial-camlp4/tutorial003.html
for more information.

This is your example in the new camlp4:

$ ocaml camlp4of.cma
         Objective Caml version 3.10.0

         Camlp4 Parsing version 3.10.0

# type expr = Num of int | Add of expr * expr | Sub of expr * expr |  
Mult of expr * expr ;;
type expr =
     Num of int
   | Add of expr * expr
   | Sub of expr * expr
   | Mult of expr * expr
# let (+:) e1 e2 = Add(e1,e2) ;;
val ( +: ) : expr -> expr -> expr = <fun>
# let (-:) e1 e2 = Sub(e1,e2) ;;
val ( -: ) : expr -> expr -> expr = <fun>
# let ( *: ) e1 e2 = Mult(e1,e2) ;;
val ( *: ) : expr -> expr -> expr = <fun>
# open Camlp4.PreCast ;;
# let expr = Gram.Entry.mk "expr" ;;
val expr : '_a Camlp4.PreCast.Gram.Entry.t = <abstr>
# EXTEND Gram
     expr:
       [ "add" LEFTA
         [ e1 = expr; "+"; e2 = expr -> e1 +: e2
         | e1 = expr; "-"; e2 = expr -> e1 -: e2 ]
       | "mult" LEFTA
         [ e1 = expr; "*"; e2 = expr -> e1 *: e2 ]
       | "simple" NONA
         [ n = INT -> Num(int_of_string n)
         | "("; e = expr; ")" -> e ] ]
       ;
   END ;;
- : unit = ()
# Gram.parse expr Loc.ghost (Stream.of_string "1-2+3*4") ;;
- : expr = Add (Sub (Num 1, Num 2), Mult (Num 3, Num 4))

Another reason to use camlp4 Grammars is that you get automatic  
syntax error descriptions. However, this seems to contain bugs in the  
current camlp4:

# Gram.parse expr Loc.ghost (Stream.of_string "1+2*(3+)-4") ;;
- : expr = Sub (Add (Num 1, Mult (Num 2, Num 3)), Num 4)

(This should not be accepted. It used to give: Stream.Error "[expr]  
expected after '+' (in [expr])")

# Gram.parse expr Loc.ghost (Stream.of_string "1+2*(3+!)-4") ;;
Exception:
Loc.Exc_located (<abstr>,
Stream.Error "[expr] expected after \"(\" (in [expr])").

(This is wrong, because there is a legal expression after "(", namely  
"3".)

Regards,
Bruno


--
Bruno De Fraine
Vrije Universiteit Brussel
Faculty of Applied Sciences, DINF - SSEL
Room 4K208, Pleinlaan 2, B-1050 Brussels
tel: +32 (0)2 629 29 75
fax: +32 (0)2 629 28 70
e-mail: Bruno.De.Fraine@vub.ac.be