Mantis Bug Tracker

ID: 0006369
Project: OCaml
Category: OCaml tools (ocaml{lex,yacc,dep,browser,debug})
View Status: public
Date Submitted: 2014-04-12 08:45
Last Update: 2014-07-16 16:08
Assigned To:
Platform:
OS:
OS Version:
Product Version: 4.02.0+dev
Target Version:
Fixed in Version:
Summary: 0006369: [github patch] shave the yacc: add yacc support to some of Menhir's feature to facilitate migration
Description

# Menhir vs Yacc

Since Menhir is superior to ocamlyacc in almost every respect, it may be worthwhile at some point in the future to consider using it as the parser generator for building OCaml.

Of course this will come with a bootstrap problem, which I have no idea how to address, and as such I am not proposing to switch the parser now.

However, a second problem comes from differences in the grammar syntax and in the relevant APIs.

Menhir introduces the ```$startpos```, ```$endpos```, ```$startpos($id|ident)```, etc. keywords to refer to the positions of grammatical items in the production being reduced.

In contrast, ocamlyacc relies on the user manually querying the ```Parsing``` module to fetch the positions. This comes with shared global state that is incompatible with Menhir's approach.
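To make the contrast concrete, here is a minimal sketch of the manual style, using only functions from the standard ```Parsing``` module. The ```span``` type and the helper names are hypothetical and are not part of the patch. Note that none of the ```Parsing``` calls takes a parser argument: they all read the state of whichever parser is currently running.

```ocaml
(* Hypothetical helper type, not part of the patch: a source span. *)
type span = { start_p : Lexing.position; end_p : Lexing.position }

(* Span of the whole production being reduced. Both calls take unit
   and read the global parser state kept by the Parsing module. *)
let current_span () : span =
  { start_p = Parsing.symbol_start_pos ();
    end_p = Parsing.symbol_end_pos () }

(* Span of the n-th right-hand-side item, counted from 1 like $n. *)
let rhs_span (n : int) : span =
  { start_p = Parsing.rhs_start_pos n;
    end_p = Parsing.rhs_end_pos n }

(* Width of a span in characters, from the absolute offsets. *)
let width (s : span) : int =
  s.end_p.Lexing.pos_cnum - s.start_p.Lexing.pos_cnum
```

Inside an ocamlyacc action one would call ```current_span ()``` or ```rhs_span 2``` by hand, where a Menhir grammar would write ```$startpos``` and ```$endpos``` directly.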

# A first step

The current pull request extends ocamlyacc to support a few features of menhir to help a potential migration, or even just to allow sharing the grammar between users of the two parser generators.

## Sugar

### Explicit names bound to RHS values

    | LPAREN e = expr RPAREN { e }

This is now accepted, and the name ```e``` is bound to ```$2``` in the action.
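The name-to-position resolution can be pictured with a small OCaml model. This is only an illustration with hypothetical types and names; the patch itself lives in ocamlyacc's C sources and may work differently.

```ocaml
(* Hypothetical model of one right-hand-side item: an optional bound
   name and the grammar symbol it stands for. *)
type item = { name : string option; symbol : string }

(* Return the 1-based position of the item bound to [id], if any, so
   that the name can be treated like the matching positional $n. *)
let position_of (rhs : item list) (id : string) : int option =
  let rec go i = function
    | [] -> None
    | { name = Some n; _ } :: _ when n = id -> Some i
    | _ :: rest -> go (i + 1) rest
  in
  go 1 rhs

(* The production LPAREN e = expr RPAREN from the example. *)
let example : item list =
  [ { name = None; symbol = "LPAREN" };
    { name = Some "e"; symbol = "expr" };
    { name = None; symbol = "RPAREN" } ]
```

With this model, ```position_of example "e"``` yields ```Some 2```, matching the ```$2``` binding described above.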

### Remove "=" as a valid character to enter an action

    | LPAREN expr RPAREN = $2 }

This was valid ocamlyacc code. Who would want to use that?! And of course, it is incompatible with the previous feature, which uses ```=``` to bind names.

### Allow OCaml-style comments in the grammar

Until now, ocamlyacc only supported C-style ```/* ... */``` comments in the grammar. (Actions can of course embed OCaml-style comments.)

Nested ```(* ... (* ... *) ... *)``` comments are now supported, with limited support for strings inside comments (escaped characters are simply skipped, and the "tagged" string literals introduced in the OCaml 4.02 lexer are not supported).
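The scanning rule described here can be sketched in OCaml as follows. This is an illustrative model, not the patch's code (ocamlyacc's reader is written in C), and the function name and error messages are made up for the example. It tracks nesting depth, skips string literals so that comment delimiters inside them are ignored, and skips the character after a backslash without interpreting it.

```ocaml
(* [skip_comment s i] expects [i] to point just past an opening
   comment delimiter and returns the index just past the matching
   closing delimiter. Raises Failure on unterminated input. *)
let skip_comment (s : string) (i : int) : int =
  let n = String.length s in
  (* True when the two characters at [j] are [a] then [b]. *)
  let at j a b = j + 1 < n && s.[j] = a && s.[j + 1] = b in
  (* Skip a string literal; [j] points just past the opening quote. *)
  let rec skip_string j =
    if j >= n then failwith "unterminated string in comment"
    else
      match s.[j] with
      | '"' -> j + 1
      | '\\' -> skip_string (j + 2) (* escaped character: skip blindly *)
      | _ -> skip_string (j + 1)
  in
  let rec go depth j =
    if j >= n then failwith "unterminated comment"
    else if at j '(' '*' then go (depth + 1) (j + 2)
    else if at j '*' ')' then
      if depth = 1 then j + 2 else go (depth - 1) (j + 2)
    else if s.[j] = '"' then go depth (skip_string (j + 1))
    else go depth (j + 1)
  in
  go 1 i
```

As in the description, tagged string literals get no special treatment in this sketch.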

### Enable %start <type> TERM

The following code:

    %start main
    %type <Ast.t> main

can now be written:

    %start <Ast.t> main

## Bridging the gap

Most of Menhir's keywords can be used.

```$startpos```, ```$endpos```, ```$startpos($id|ident)```, ```$startofs($id|ident)```, ```$endpos($id|ident)```, ```$endofs($id|ident)``` are each bound to the equivalent ```Parsing.<…>``` call.
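The issue does not list which ```Parsing``` function each keyword expands to. The sketch below shows one plausible mapping, built only from functions that exist in the standard ```Parsing``` module; the mapping itself, the types and the function name are assumptions made for illustration, not a description of the patch.

```ocaml
(* Assumed translation table; the issue does not spell out the exact
   Parsing functions, so this mapping is a guess. *)
type side = Start | End
type kind = Pos | Ofs (* Pos: a Lexing.position; Ofs: an int offset *)

(* [arg] is None for the whole production, or Some n for the n-th
   right-hand-side item (a named item is assumed to have been
   resolved to its index beforehand). The result is the OCaml
   expression, as a string, that the keyword would stand for. *)
let parsing_call (side : side) (kind : kind) (arg : int option) : string =
  let side_s = match side with Start -> "start" | End -> "end" in
  let kind_s = match kind with Pos -> "_pos" | Ofs -> "" in
  match arg with
  | None -> Printf.sprintf "Parsing.symbol_%s%s ()" side_s kind_s
  | Some n -> Printf.sprintf "Parsing.rhs_%s%s %d" side_s kind_s n
```

Under this assumption, ```$startpos``` corresponds to ```Parsing.symbol_start_pos ()``` and ```$endofs($2)``` to ```Parsing.rhs_end 2```.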

```$syntaxerror``` is equivalent to ```raise Parsing.Parse_error```.
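As a small illustration of what that equivalence means for callers, the sketch below raises ```Parsing.Parse_error``` from a hypothetical semantic check (standing in for an action that would use ```$syntaxerror```) and catches it the same way as an ordinary ocamlyacc syntax error. The function names are invented for the example.

```ocaml
(* Hypothetical check standing in for an action body that rejects
   its input; raising Parse_error is what the keyword amounts to. *)
let check_nonempty (items : 'a list) : 'a list =
  match items with
  | [] -> raise Parsing.Parse_error
  | _ -> items

(* A caller handles it like any other ocamlyacc syntax error. *)
let try_parse (f : unit -> 'a) : ('a, string) result =
  try Ok (f ()) with Parsing.Parse_error -> Error "syntax error"
```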

```$previouserror``` fails at compile-time, because there is no way, AFAIK, to emulate this feature.


In case of failure, these features try as much as possible to print a relevant error or warning message to the user.

And finally, having a grammar at the intersection of this ocamlyacc and Menhir will also greatly help Merlin in supporting new versions of the grammar :).

Tags: github, patch

Notes
doligez (administrator)
2014-07-16 16:08

I find it a bit strange to modify ocamlyacc to support more features rather than trying to extend Menhir (probably under control of a command-line option) to support legacy grammars.

But why not.

Issue History

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2014-04-12 08:45 | gasche | New Issue | |
| 2014-04-12 08:46 | gasche | Tag Attached: github | |
| 2014-04-12 08:46 | gasche | Tag Attached: patch | |
| 2014-07-16 16:08 | doligez | Note Added: 0011849 | |
| 2014-07-16 16:08 | doligez | Status | new => acknowledged |
