mixing lexers with camlp4
Date: -- (:)
From: Pietro Abate <Pietro.Abate@a...>
Subject: mixing lexers with camlp4
Hi all,
I want to parse a language like this one:
l := l & l | l % l | Id

where the symbols &, %, ... are almost arbitrary.
This is my first step toward extending the camlp4 language on
the fly. For now I'm only parsing the language; later I'll add the
actions that extend the grammar. At this stage I'm happy to return a list
of type stype.

I've written the following camlp4 extension:

type stype = Lid | Symbol of string ;;
let (=~) s re = Str.string_match (Str.regexp re) s 0;;
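(* the two identifier patterns are regexps; the punctuation symbols are
   quoted so that "*", "?", "[" and "]" are matched literally by Str *)
let tok = ["[a-z][A-Z]*[a-z]*";"[A-Z][A-Z]*[a-z]*"]
          @ List.map Str.quote ["%";"&";"*";"?";"~";"[";"]";"<";">"] ;;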
let symbex s = List.exists (fun e -> s =~ e) tok ;;

let grammar = Grammar.gcreate (Plexer.gmake ());;
let symbol strm =
    match Stream.peek strm with
    | Some(_,s) when (symbex s) -> Stream.junk strm; s
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser grammar "symbol" symbol ;;
let gram_list = Grammar.Entry.create grammar "gram_list";;

EXTEND
GLOBAL: gram_list;

gram_list: [[ grams = LIST1 gram; EOI -> grams ]];

gram: [[ p = LIDENT; ":="; rules = LIST1 rule SEP "|" -> (p,rules) ]];

rule: [[ psl = LIST1 psymbol -> psl ]];

psymbol: [[
     "Id" -> Lid
    | e = symbol -> Symbol(e)
]];
END
;;
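
To pin down what the rules above are meant to return, here is a minimal sketch of the same grammar in plain OCaml (no camlp4, no Str), assuming the input has already been split into token strings. It is a hand-written recursive-descent analogue, not what EXTEND generates, and the names `Parse_error`, `rule`, `rules` and `gram` are hypothetical. On `l := l & l | l % l | Id` it returns the production name paired with one `stype list` per alternative.

```ocaml
(* Hypothetical stdlib-only analogue of the gram/rule/psymbol rules:
   it consumes a list of already-lexed token strings, not a camlp4 stream. *)
type stype = Lid | Symbol of string

exception Parse_error of string

(* psymbol: the token "Id" is Lid, any other token becomes a Symbol.
   Unlike symbex, no token is rejected here. *)
let psymbol tok = if tok = "Id" then Lid else Symbol tok

(* rule: LIST1 psymbol, ending at "|" or at the end of the input *)
let rec rule acc toks =
  match toks with
  | ([] | "|" :: _) as rest ->
      if acc = [] then raise (Parse_error "empty rule")
      else (List.rev acc, rest)
  | tok :: rest -> rule (psymbol tok :: acc) rest

(* LIST1 rule SEP "|" *)
let rec rules acc toks =
  let r, rest = rule [] toks in
  match rest with
  | "|" :: rest' -> rules (r :: acc) rest'
  | _ -> (List.rev (r :: acc), rest)

(* gram: LIDENT ":=" rules, for a single production *)
let gram toks =
  match toks with
  | p :: ":=" :: rest when p <> "" && p.[0] >= 'a' && p.[0] <= 'z' ->
      let rs, rest = rules [] rest in
      if rest <> [] then raise (Parse_error "trailing tokens");
      (p, rs)
  | _ -> raise (Parse_error "expected <name> :=")
```

Because it parses one production per call, it sidesteps the question of where `LIST1 gram` should stop: with several productions in one input, a parser like this would need to look two tokens ahead for `<name> :=` to know that a new production starts.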

Now my problem is the symbol production, which I'd like to parse with a
lexer of my own rather than the standard camlp4 lexer. I want to allow
almost arbitrary symbols in my language, and Plexer is too restrictive. My
solution above works, but it's very clumsy. The easiest way I can think of is
to use the Genlex module, to have something like:

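let lexer = Genlex.make_lexer [
    "+";"-";"*";"/";"=";
    "[";"]";"<";">";
    "%";"&";"?";"~"
];;

(* does not type-check as is: Genlex produces a function on char streams,
   not the lexer value that Grammar.gcreate expects *)
let symbgrammar = Grammar.gcreate (lexer);;
let symbol strm =
    match Stream.peek strm with
    | Some (Genlex.Kwd s) -> Stream.junk strm; s
    | Some (Genlex.Ident i) -> ....
    .........
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser symbgrammar "symbol" symbol ;;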

Of course, the lexer built by Genlex is not directly compatible with the
Plexer interface, so I'm a bit lost...
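
One way to get "almost arbitrary" symbols is a small hand-written lexer that groups every maximal run of characters that are neither blanks nor identifier characters into a single symbol token. The sketch below assumes Plexer-style `(kind, text)` string pairs, with symbols given an empty kind and identifiers tagged `LIDENT`/`UIDENT`; the character classes, the trailing `EOI` token and the function name `lex` are assumptions for illustration, and it uses only `String` from the standard library.

```ocaml
(* Hypothetical hand-written lexer: identifiers plus maximal runs of
   "symbol" characters, returned as Plexer-style (kind, text) pairs. *)
let is_space c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* Letters, digits, '_' and '\''; numbers are not distinguished here. *)
let is_ident_char c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
  || (c >= '0' && c <= '9') || c = '_' || c = '\''

(* Every other character may start or continue a symbol. *)
let is_symbol_char c = not (is_space c || is_ident_char c)

let lex (s : string) : (string * string) list =
  let n = String.length s in
  (* first index >= i at which predicate p no longer holds *)
  let rec scan p i = if i < n && p s.[i] then scan p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev (("EOI", "") :: acc)
    else if is_space s.[i] then go (i + 1) acc
    else if is_ident_char s.[i] then begin
      let j = scan is_ident_char i in
      let text = String.sub s i (j - i) in
      let kind =
        if text.[0] >= 'A' && text.[0] <= 'Z' then "UIDENT" else "LIDENT" in
      go j ((kind, text) :: acc)
    end else begin
      (* s.[i] is a symbol character, so j > i and the loop advances *)
      let j = scan is_symbol_char i in
      go j (("", String.sub s i (j - i)) :: acc)
    end
  in
  go 0 []
```

With maximal munch, `&|` written without spaces becomes one token; if `|` must always separate alternatives, it would have to be excluded from `is_symbol_char`. Wrapping a function like this into the lexer value that `Grammar.gcreate` accepts is the remaining step and is not shown here.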

- Is this the best way of doing it?

- How can I make the Genlex module compatible with the Plexer
  interface? Is there an example?

- Does camlp4 allow me to mix lexers for different productions in the same
  extension?
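
For the second question, one possible approach (an assumption, not a documented camlp4 recipe) is to convert each Genlex token into the `(kind, text)` string pairs that the `symbol` parser above already pattern-matches with `Some(_,s)`. The sketch mirrors `Genlex.token` in a local type `gtoken` so it compiles on its own; the adapter names, the kind strings and the trailing `EOI` are assumptions modelled on Plexer's conventions.

```ocaml
(* Local mirror of Genlex.token, so the sketch compiles without Genlex. *)
type gtoken =
  | Kwd of string
  | Ident of string
  | Int of int
  | Float of float
  | String of string
  | Char of char

(* Hypothetical adapter: Genlex-style token -> Plexer-style (kind, text). *)
let to_plexer_token = function
  | Kwd s -> ("", s)
  | Ident "" -> invalid_arg "to_plexer_token: empty identifier"
  | Ident s ->
      (match s.[0] with
       | 'A' .. 'Z' -> ("UIDENT", s)
       | 'a' .. 'z' | '_' -> ("LIDENT", s)
       (* Genlex also returns non-keyword runs of operator chars as Ident *)
       | _ -> ("", s))
  | Int n -> ("INT", string_of_int n)
  | Float f -> ("FLOAT", string_of_float f)
  | String s -> ("STRING", s)
  | Char c -> ("CHAR", String.make 1 c)

(* Convert a whole token list and mark the end of input. *)
let adapt (toks : gtoken list) : (string * string) list =
  List.map to_plexer_token toks @ [ ("EOI", "") ]
```

Because Genlex returns a run of operator characters that is not a declared keyword, such as `<=>`, as an `Ident`, the adapter turns it into a symbol token, which is close to the "almost arbitrary symbols" behaviour asked for.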

I believe this kind of thing is going to be much easier with the new
camlp4 version...

:)
p

-- 
++ Blog: http://blog.rsise.anu.edu.au/?q=pietro
++ 
++ "All great truths begin as blasphemies." -George Bernard Shaw
++ Please avoid sending me Word or PowerPoint attachments.
   See http://www.fsf.org/philosophy/no-word-attachments.html