Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
camlp4 stream parser syntax
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2009-03-08 (13:33)
From: Matthieu Wipliez <mwipliez@y...>
Subject: Re : Re : Re : Re : [Caml-list] Re: camlp4 stream parser syntax

> > In this case, here is a possible solution, you have your hash table associate 
> a lowercase version of the token with what you'd like to use in the grammar:
> > "buytocover" => "BuyToCover"
> > "sellshort" => "SellShort"
> > ...
> I'm doing this already but I don't think it will do the trick with a camlp4 
> parser since it goes through is_kwd to find a match when you use "delay".

I've just tested the idea with my lexer, in the rule identifier:
  | identifier as ident {
    if String.lowercase ident = "action" then
      IDENT "ActioN"
      IDENT ident

replacing entries in the grammar that match against "action" so they match against "ActioN".

In the source code, I have
reload: ActIon in8:[i]
shift: acTIon

And Camlp4 parses it correctly. I have a tentative explanation as why it works below:

> I think that the internal keyword hash table in the grammar needs to be 
> populated with lowercase keywords (by invoking 'using'). I don't know how to get 
> to the 'using' function yet, though.

I don't think so, here is what happens:
  1) you preprocess your grammar with camlp4of. This transforms the EXTEND statements (and a lot of other stuff) to calls to Camlp4 modules/functions.
The grammar parser is in the Camlp4GrammarParser module.
In the rule "symbol", the entry | s = STRING -> matches strings (literal tokens) and produces a TXkwd s.
This is later transformed by make_expr to an expression Camlp4Grammar__.Skeyword s (quotation <:expr< $uid:gm$.Skeyword $str:kwd$ >>)
What this means is that at compile time an entry
  my_rule : [ [ "BuyOrSell"; .. ] ]
gets transformed to an AST node
  Skeyword "BuyOrSell"

You can see that by running "camlp4of" on the parser. Every rule gets transformed to a call to Gram.extend function, with Gram.Sopt, Gram.Snterm, Gram.Skeyword etc.

  2) At runtime, when you start your program, all the Gram.extend calls are executed (because they are top-level). Your parser is kind of configured.
It turns out that extend is just a synonym for Insert.extend
  (last line of Static module)

  value extend = Insert.extend

This function will insert rules and tokens into Camlp4. The insert_tokens function tells us that whenever a Skeyword is seen, "using gram kwd" is called.
I believe this is the function you're referring to?

This function calls Structure.using, which basically add a keyword if necessary, and increase its reference count. (I think this is to automatically remove unused keywords, remember that Camlp4 can also delete rules, not only insert them).

So to sum up: when you declare a rule with a token "MyToken", the grammar is configured to recognize a "MyToken" keyword.

Now the lexer produces IDENT (or SYMBOL for that matters). SYMBOLs are KEYWORDs by default. IDENTs become KEYWORDs if they match the keyword content.

So in our case, the lexer recognizes identifiers. If this identifier equals (case-insensitively speaking) "mytoken", we declare an IDENT "MyToken", which will be later recognized as the "MyToken" keyword (because the is_kwd test is case-sensitive).


>     Thanks, Joel
> ---
> Mac, C++, OCaml