[ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml
| From: | skaller <skaller@u...> |
| Subject: | Re: [Caml-list] [ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml |
On Tue, 2005-12-13 at 13:07 -0800, Nathaniel Gray wrote:
> This is pretty nice! Every time I use ocamlyacc I think "somebody
> should write something better." Now it looks like somebody has! I
> can't tell you how many times I've wanted parameterized rules and
> simple "library" rules for parsing delimiter-separated lists and
> such...

Yes, it is pretty nice! However, it still appears to have some problems. Any comments appreciated.

0. The licence. Q public licence for the generator???? Please NO NO NO!! Not unless it is distributed as part of the official distro. Is there any chance of that? If not, even GPL would be better ;(

1. Generating a functor is cute, but it doesn't seem to allow arguments to parser functions. Perhaps I missed something? Is there a way to use the functorisation with closures to add an argument? In particular, can the parser be generated *inside* an environment such as a function or let binding? [Felix allows that, which means an extra argument is not required: a variable in the environment can be used instead]

2. The signature of parsers is still wrong? Ocamlyacc uses the typing

    val parser: (lexbuf->token) -> lexbuf -> 'a

which is just bad. A better signature is

    val parser: ( unit -> token ) -> 'a

There is no need to provide location information: the correct solution is to throw an exception, which is caught in a context which can determine the location. It would be nice to be able to generate this signature with a command line switch, pragma, or some other mechanism, even if the default is chosen for ocamlyacc compatibility.

3. I have doubts about the claim that parsers can 'share' token types. I do not see how this is possible. It is contradicted by the compilation model description, which explains how it is necessary to join the separate files making up a grammar specification.
In this case, the joined system is going to generate a single token type, and any type generated by another joining is certain to be a distinct type because

  (a) the type is defined in a distinct OCaml module (mli file)
  (b) the typing of normal variants is nominal

This problem would go away if polymorphic variants were used instead, because the typenames are then simply abbreviations, since pm-variants are structurally, not nominally, typed. Perhaps a command line switch, pragma, or whatever, to use polymorphic variants instead of ordinary ones?

Actually, I personally find the 'yacc' technique of generating tokens to be rather lame. Felix does this much better -- the parser simply expects a token type which is a variant, and the type can be defined wherever you like. In particular, the lexer and parser can share that definition.

As far as I can see Menhir COULD do this, except of course one would use %token as a special way of generating the variant. All that would be required, I think, is the syntax

    %import_tokens "filename"

which refers to the token definition file -- as an alternative to inlining these token definitions. (If pm-variants are used you could probably support both, though I'm not sure.)

A token definition file then generates two files: an ordinary mli file with the token variant type, and a special information file for the parser generator (with the same information, but in a more useful form).

In Felix none of this is necessary because parsing is built in, so the compiler can find the information required for the parser generator directly from the token variant type.

4. Just curious, but how practical is LR(1) in terms of generated code sizes? Felix is using Elkhound as its parser, which is a GLR parser with an LALR(1) core. In theory there is an option for choosing the core automaton, which also allows LR(1); however, I recall Scott McPeak commenting it wasn't worth supporting because it generated tables which were far too big.
I'm curious how one would be able to predict the size of the generated code since I don't really understand the additional constraints LALR(1) introduces ..

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
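
To make point 2 concrete, here is a minimal sketch of how the ocamlyacc-style signature could be wrapped to look like the (unit -> token) one. Everything named here is hypothetical: `token`, `yacc_style_parse` (a hand-written stand-in, not real generator output), `with_token_source` and `source_of_list` are assumptions for illustration only. With a real generated parser, any positions read from the dummy lexbuf would presumably be meaningless, which is the reason to report locations via an exception instead.

```ocaml
(* Hypothetical token type; not produced by Menhir or ocamlyacc. *)
type token = INT of int | PLUS | EOF

(* Hand-written stand-in with the ocamlyacc-style shape:
   it takes a lexer function and a lexbuf, sums INT tokens
   until EOF, and ignores PLUS. *)
let yacc_style_parse (lexer : Lexing.lexbuf -> token)
    (lexbuf : Lexing.lexbuf) : int =
  let rec loop acc =
    match lexer lexbuf with
    | INT n -> loop (acc + n)
    | PLUS -> loop acc
    | EOF -> acc
  in
  loop 0

(* Adapter: present the parser as taking only a token source.
   The lexbuf is a dummy that the wrapped lexer never reads. *)
let with_token_source parse (next : unit -> token) =
  let dummy = Lexing.from_string "" in
  parse (fun _lexbuf -> next ()) dummy

(* A token source backed by a list; yields EOF once exhausted. *)
let source_of_list toks =
  let rest = ref toks in
  fun () ->
    match !rest with
    | [] -> EOF
    | t :: ts ->
      rest := ts;
      t

let result =
  with_token_source yacc_style_parse
    (source_of_list [ INT 1; PLUS; INT 2; EOF ])
```

The adapter only hides the lexbuf argument; a generator option emitting the simpler signature directly would avoid the dummy buffer altogether.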
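
The nominal versus structural distinction behind point 3 can be sketched as follows. The module names (`ParserA`, `ParserB`, `PolyA`, `PolyB`) are hypothetical stand-ins for token types emitted by two separately generated parsers; this is not actual Menhir output, only an illustration of how OCaml treats the two kinds of variant.

```ocaml
(* Two modules declaring identical ordinary variants. *)
module ParserA = struct
  type token = INT of int | EOF
end

module ParserB = struct
  type token = INT of int | EOF
end

(* Ordinary variants are nominal: ParserA.token and ParserB.token
   are different types although the constructors match, so the
   second line below would be rejected by the type checker:

     let a : ParserA.token = ParserA.EOF
     let bad : ParserB.token = a
*)

(* With polymorphic variants the type names are only abbreviations
   for the same structural type. *)
module PolyA = struct
  type token = [ `INT of int | `EOF ]
end

module PolyB = struct
  type token = [ `INT of int | `EOF ]
end

(* A token typed against one module is accepted by the other. *)
let from_a : PolyA.token = `INT 42
let to_b : PolyB.token = from_a

let describe (t : PolyB.token) : string =
  match t with
  | `INT n -> "INT " ^ string_of_int n
  | `EOF -> "EOF"
```

Here `describe to_b` type-checks because `PolyA.token` and `PolyB.token` expand to the same type, which is the property that would let two independently generated parsers share one lexer.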