Date: -- (:)
From: Pietro Abate <Pietro.Abate@p...>
Subject: camlp4 and lexers
Hi all,
This question was asked a few weeks ago, and again last week. However, I
still don't really understand how to proceed. I hope we can boil this down
to a small example that sheds some light on the camlp4 internals.

Say I want to write a small parser for regular expressions (or an
arithmetic calculator), but I don't want to extend the OCaml grammar to do
that. I just want to create a minimal lexer and a minimal grammar to parse
expressions like (aaa*|b?);c

The parser part is easy (below). The part I don't understand is how to
create a lexer. I had a look at the ocsigen xmlcaml lexer and the camlp4
lexer, but I still haven't found a minimal example I can use without
getting confused. 

In particular, the problem is that I want my lexer to give me back
CHAR tokens (different from camlp4's CHAR of char * string) rather than
strings. I could use the camlp4 lexer as is, but then every regexp would
have to be written as ('a' 'a' 'a' *) etc., which is not acceptable.
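
To make the desired token stream concrete, here is a minimal sketch in plain
OCaml, independent of camlp4. The token type and the names tokenize and
is_operator are hypothetical, invented for this illustration. It only shows
the tokens such a lexer would produce. It does not implement the lexer
signature that Struct.Grammar.Static.Make expects, which, as far as I
understand, additionally requires Loc, Token and Error modules and a stream
interface.

```ocaml
(* Hypothetical token type for the regexp language; not camlp4's token type. *)
type token =
  | KEYWORD of string   (* one of the operators  | ; ? * + . ( )  *)
  | CHAR of char        (* any other non-blank character *)
  | EOI                 (* end of input *)

(* True for the characters the grammar treats as operators. *)
let is_operator (c : char) : bool = String.contains "|;?*+.()" c

(* Turn a string into a token list, skipping blanks and ending with EOI. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev (EOI :: acc)
    else
      match s.[i] with
      | ' ' | '\t' | '\n' | '\r' -> go (i + 1) acc
      | c when is_operator c -> go (i + 1) (KEYWORD (String.make 1 c) :: acc)
      | c -> go (i + 1) (CHAR c :: acc)
  in
  go 0 []
```

With this sketch, "aaa*" becomes three CHAR tokens followed by KEYWORD "*",
so the input never has to be written with quoted character literals.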

A while ago I did something similar with the old camlp4 [1] using
plexer, but this is not possible anymore...

A while ago Nicolas suggested copying the Camlp4.PreCast module and the
lexer module and customizing them. I think it should be possible to just
use Struct.Grammar.Static.Make with a new lexer instead... but, as I
said, I haven't managed to write a minimal lexer for this example.
Maybe I'm confused about this.

I think a minimal example will help more than one person here.

thanks :)

-------------------------- This is my parser...

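(* Assumes "open Camlp4"; RegExpLexer is the custom lexer still to be written *)
module RegExGram = Struct.Grammar.Static.Make(RegExpLexer)

(* AST built by the grammar actions; the payload of Sym is assumed to be
   the char carried by the custom CHAR token *)
type regex =
  | Alt of regex * regex | Seq of regex * regex
  | Opt of regex | Star of regex | Plus of regex
  | Dot | Sym of char

let regex = RegExGram.Entry.mk "regex"

EXTEND RegExGram
  GLOBAL: regex;

  (* alternation binds loosest *)
  regex: [[ e1 = SELF ; "|" ; e2 = concat -> Alt(e1,e2)
          | e1 = concat -> e1 ]];

  (* sequencing, with or without an explicit ";" *)
  concat:[[ e1 = SELF ; ";"; e2 = seq -> Seq(e1,e2)
          | e1 = SELF ; e2 = seq -> Seq(e1,e2)
          | e1 = seq -> e1 ]];

  (* postfix operators *)
  seq:   [[ e1 = simple ; "?" -> Opt e1
          | e1 = simple ; "*" -> Star e1
          | e1 = simple ; "+" -> Plus e1
          | e1 = simple -> e1 ]];

  simple:[[ "." -> Dot
          | "("; e1 = regex; ")" -> e1
          | `CHAR(s) -> Sym s ]];
END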
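
To pin down what the grammar above is meant to accept, here is a hand-written
recursive-descent sketch in plain OCaml that does not use camlp4 at all. It is
not a replacement for the camlp4 grammar, only one reading of it. It assumes
the top-level rule falls through to concat, that Sym carries a char, and that
blanks are not skipped. The names parse and Parse_error are hypothetical. The
left-recursive rules are rewritten as loops, since recursive descent cannot
handle left recursion directly.

```ocaml
type regex =
  | Alt of regex * regex
  | Seq of regex * regex
  | Opt of regex
  | Star of regex
  | Plus of regex
  | Dot
  | Sym of char

(* Carries the position of the offending character. *)
exception Parse_error of int

let parse (s : string) : regex =
  let n = String.length s in
  let pos = ref 0 in
  let peek () = if !pos < n then Some s.[!pos] else None in
  let advance () = incr pos in
  (* Can the next character begin a "simple" expression? *)
  let starts_simple = function
    | Some ('|' | ';' | '?' | '*' | '+' | ')') | None -> false
    | Some _ -> true
  in
  (* regex ::= concat ("|" concat)*   (left-associative) *)
  let rec regex () =
    let rec loop left =
      match peek () with
      | Some '|' -> advance (); loop (Alt (left, concat ()))
      | _ -> left
    in
    loop (concat ())
  (* concat ::= seq ((";")? seq)*     (left-associative) *)
  and concat () =
    let rec loop left =
      match peek () with
      | Some ';' -> advance (); loop (Seq (left, seq ()))
      | c when starts_simple c -> loop (Seq (left, seq ()))
      | _ -> left
    in
    loop (seq ())
  (* seq ::= simple ("?" | "*" | "+")? *)
  and seq () =
    let e = simple () in
    match peek () with
    | Some '?' -> advance (); Opt e
    | Some '*' -> advance (); Star e
    | Some '+' -> advance (); Plus e
    | _ -> e
  (* simple ::= "." | "(" regex ")" | any other character *)
  and simple () =
    match peek () with
    | Some '.' -> advance (); Dot
    | Some '(' ->
        advance ();
        let e = regex () in
        (match peek () with
         | Some ')' -> advance (); e
         | _ -> raise (Parse_error !pos))
    | Some ('|' | ';' | '?' | '*' | '+' | ')') | None ->
        raise (Parse_error !pos)
    | Some c -> advance (); Sym c
  in
  let e = regex () in
  (* Reject trailing input such as an unmatched ")". *)
  if !pos <> n then raise (Parse_error !pos);
  e
```

Under these assumptions, parse "(aaa*|b?);c" yields a Seq whose left side is
the Alt of the two parenthesized branches and whose right side is Sym 'c'.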