Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2007-06-19 (01:48)
From: skaller <skaller@u...>
Subject: camlp5?
Hi, I thought I'd briefly report progress on a subsystem of Felix
which may prove a useful technology for Ocaml.

I am using Dypgen

and Ocs scheme

together for building Felix AST terms in a flexible way.

Dypgen is an *extensible* GLR parser generator which is now working
reasonably well. Syntactic forms in the input language can be parsed
so that the user actions extend the grammar being parsed, including
adding new non-terminals and handling conflicts. GLR parsers can
parse any context free language, and work with many grammars for
such languages, and although some care is required obtaining reasonable
performance, an Ocamlyacc-like (LALR1) grammar seems to parse about
the same speed as Ocamlyacc.

The big problem with extensions is not the grammar, but the user
actions. To solve this problem, I have chosen to use Scheme programs
as user actions. Ocs_scheme is written in Ocaml and is very easy
to work with, and appears quite fast.

With this combination, user action codes are written as 'text'
which is then compiled and executed to produce an Ocs_types.sval
or scheme value. These values are basically s-expressions.

Felix converts them to a simple s-expression format, then
converts them to Felix AST_format with a laboriously coded
mapping. Of course the typing is dynamic, up to the building
of the AST. The AST built is the same one the same Ocamlyacc
grammar would have yielded.

At this point I am moving 95% of the Felix grammar into
a user library, so that the 'initial' hard coded grammar
is 'more or less' reduced to:

(a) the top level compilation unit (start symbol)
(b) some code for parsing syntax extension syntax
(c) an effective top level dummy non-terminal
(d) a set of bottom non-terminals which wrap tokens with attributes,
   such as literals and names

The bits 'in between' are all programmed in the input source,
that is, user written library code.

The same idea could be used for Ocaml. Instead of using ocamlp4
to separately build extensions, the extensions are defined
directly in the source code that needs them, and ocaml
parses it. Note: dypgen syntax extensions are properly scoped.

Felix uses the format:

syntax schemer {
  s_term := sprefixed =># "_1";
  s_term := s_term / spower =># "`(ast_apply (div (,_1 ,_3)))";
  s_term := s_term % spower =># "`(ast_apply (mod (,_1 ,_3)))";

open syntax schemer;

so that the grammar definitions are defined separately from
use. At the moment Felix #includes this input into the program
that needs it, however the layout is design to support separate
compilation of the grammar definitions, so that only the 
'open syntax' is required. That statement provides the linkage
to the language sorely missing from the current ocamlp4 engineering.

However the above system is much more flexible because it is 
properly scoped -- the opened syntax is only accessible in
the scope an Ocaml 'open' would normally make names available.

Thus, you could easily define 'macros' to help implement
modules without impacting other modules, even in the same
translation unit.

Whilst Scheme isn't the best language for processing Ocaml AST terms
or extensions to them, it has the advantage of being a complete,
well known, high power symbol processing languages. To use the system
you only need to learn a bit of Scheme plus your base AST mapping.

The technology appears different from MetaOcaml because the 
staging is localised, that is, the phasing is recursed bottom up 
not top down (but I may be wrong on that!) as well as using
a distinct meta-language which is of course unpleasant, particularly
the dynamic typing.

When Ocaml 3.11 is released I hope to extend the technology to
also support dynamic loading of semantic extensions. It's probable
these extensions will be plugged directly into Ocs_scheme as
primitives, so that the Ocaml based AST is discarded permanently.
[The whole compiler will become a Scheme program, albiet with 
very specialised primitives written in Ocaml]

Most term rewriting systems are 'effectively' dynamically typed,
even if they use some lower level static typing as support.
Rewrite rules have both pre- and post- conditions which can't
be represented with static typing, and excess static typing
appears to seriously impeded modification of the term rewriting
systems. These systems are 'interpreting' terms no matter
what -- that is, they're intrinsically dynamically typed.

Using s-expressions is a rather extreme encoding, and is 
basically TOO dynamic IMHO, but it does seem to represent
a good compromise when there is an emphasis on extension
and experiment. The point is that whilst a more static
encoding is safer for simpler extensions .. it really
gets in the way badly when the extension is a complex
function of an already complex term encoding.

It would be interesting if someone wanted to use this idea
to make a new camlp5 front end to Ocaml.

John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: