Version française
Home     About     Download     Resources     Contact us    
Browse thread
Camlp4's (lack of) hygiene (was Re: Macros)
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: John Prevost <prevost@m...>
Subject: Camlp4's (lack of) hygiene (was Re: Macros)
The following message is a courtesy copy of an article
that has been posted to comp.lang.functional as well.

I'm also forwarding this message to the caml-list for reference.

{ Summary: No, it's not hygienic, which I wasn't aware of.  Have to
  bug people to fix that.  Detailed examples, and explanation of what
  ways camlp4 is semi-hygienic and very powerful.

  Summary of non-hygiene: It's hard not to capture variables in
  subexpressions, if you declare temporaries.  This should be fixed,
  and it's probably pretty doable.  Just make any variable reference
  in a quotation be somehow gensymmed or rename variables in
  subexpressions. }

>>>>> "mb" == Matthias Blume <see@my.sig> writes:
>>>>> "mm" == Markus Mottl <mottl@miss.wu-wien.ac.at> writes:

    mm> I haven't needed camlp4 so far, but it is a pretty powerful
    mm> tool: calling it a "preprocessor" is actually an
    mm> underestimation of its capabilities.

    mb> It "processes", and it does this before the compiler sees the
    mb> program, hence "pre".  That's a "pre-processor" to me.

    mb> Still, is its handling of syntax-trees hygienic with respect
    mb> to the Caml language?

    mm> You can transform abstract syntax trees of the language with
    mm> it and pretty print the result again.

    mb> M4, being Turing-complete (AFAIK), can do the same.  (Of
    mb> course, I am not saying that the result would be beautiful in
    mb> any sense of the word. :)

camlp4's pre-processing for ocaml works by using a parser with support
for extensible grammars.  Here's an example from the manual:

--camlp4-manual---------------------------------------------------------
3.5   Examples

3.5.1   Arithmetic calculator

This is an example of a grammar of arithmetic expressions: 

    let gram = Grammar.create (Plexer.make ());;
    let test = Grammar.Entry.create gram "expression";;
    let expr = Grammar.Entry.create gram "expression";;
    EXTEND
      test: [ [ e = expr; EOI -> e ] ];
      expr: [ "plus" LEFTA
              [ e1 = expr; "+"; e2 = expr -> e1 + e2
              | e1 = expr; "-"; e2 = expr -> e1 - e2 ]
            | "mult" LEFTA
              [ e1 = expr; "*"; e2 = expr -> e1 * e2
              | e1 = expr; "/"; e2 = expr -> e1 / e2 ]
            | [ e = INT -> int_of_string e
              | "("; e = expr; ")" -> e ] ];
    END;;
    let calc str =
      try Grammar.Entry.parse test (Stream.of_string str) with
        Stdpp.Exc_located (loc, e) ->
          Printf.printf "Located at (%d, %d)\n" (fst loc) (snd loc);
          raise e
    ;;

Now, an extension of the entry ``expr'' to add the modulo, could be:

    EXTEND expr:
      AFTER "mult" [ [ e1 = expr; "mod"; e2 = expr -> e1 mod e2 ] ];
    END;;
------------------------------------------------------------------------

the above is somewhat lex/yacc like.  It's possible to replace the
lexer as well.

This works with ocaml by producing results which are ASTs.  There are
camlp4 extensions used to write ASTs in a natural
real-caml-syntax-like way, but the underlying structure is a set of
strongly typed datastructures.  For ocaml front-end use, a byte
representation of the AST (not a pretty-printed source file) is then
output for the compiler to digest.

Because of the strong typing, it is at the very least not possible to
write an extension that outputs grammatically incorrect code.

This is at least a level of hygiene that isn't seen in most
preprocessors.  Again, extensions to the real parser, outputting a
type-safe AST.

Now, all of this is more than a little heavy.  Here's an example
adding repeat <expr> until <expr>, which does what you'd expect.

--camlp4-Manual---------------------------------------------------------
4.3.2   Repeat until à la Pascal

The ``repeat...until'' loop of Pascal is closed to the ``while'' loop
except that it is executed at least once. We can implement it like
this:

       open Pcaml;;
       EXTEND
         expr: LEVEL "let"
           [[ "repeat"; e1 = expr; "until"; e2 = expr ->
                 <:expr< do $e1$; return while not $e2$ do $e1$; done >> ]];
       END;;
------------------------------------------------------------------------

This says, essentially: "extend the current parser, add an entry at
the level named 'let' (same level as let expressions).  Match the
keyword "repeat", follwed by an expression, the keyword "until", and
another expression, and use the resulting AST.

The <:expr< ... >> is a "quotation", which is an extension added in
camlp4 to allow selectively escaping out into other languages.  The
$blah$ segments inside refer to variables in the ocaml code, rather
than the metalanguage code.  So let's compare the "when" hygienic
macro example from R5RS to the equivalent camlp4 extension:

You might wonder, at this point, about introducing new variables
inside the output without capturing--which, I now recall, is the
really key thing about hygienic macros.

Here's the example which shows (the lack of) that property:

--camlp4-manual---------------------------------------------------------
4.3.1   Infix

This is an example to add the infix operator ``o'', composition of two
functions. For the meaning of the quotation expr used here, see
appendix A.

       open Pcaml;;
       EXTEND
         expr: AFTER "apply"
           [[ f = expr; "o"; g = expr -> <:expr< fun x -> $f$ ($g$ x) >> ]];
       END;;
------------------------------------------------------------------------

You see here the $f$ and $g$ antiquoting out to get the values of f
and g.  We also see "x", which is not antiquoted.  Will this capture a
reference to x in f or g?  Let's see:

isil $ ocamlc -pp 'camlp4o ./infix.cmo' testi.ml    /home/prevost/src/caml/test
File "testi.ml", line 4, characters 23-28:
This expression has type 'a -> 'b but is here used with type 'a

isil $ camlp4o ./infix.cmo pr_o.cmo testi.ml        /home/prevost/src/caml/test
let a y = y + 1;;
let x y = y + 2;;

Printf.printf "%d\n" ((fun x -> a (x x)) 0);;

Survey says: yes, it does interfere.  Very disappointing--and I'll
have to ask about it on the caml list.


But, I will show a quick example of the same thing in hygienic macros
and in camlp4:

--R5RS------------------------------------------------------------------
(let-syntax ((when (syntax-rules ()
                           ((when test stmt1 stmt2 ...)
                            (if test
                                (begin stmt1
                                       stmt2 ...))))))
------------------------------------------------------------------------

Here's the camlp4 version:

--camlp4-example--------------------------------------------------------
open Pcaml

EXTEND
  expr: LEVEL "top"
    [[ "when"; e1 = expr; "do"; e2 = expr ->
	    <:expr< if $e1$ then $e2$ else () >> ]];
END
------------------------------------------------------------------------

(It turns out the documentation is a little out of date, and there's
no level named "let" any more.)

this then takes the program:

--camlp4-example-program------------------------------------------------
when true do print_string "test\n";;
------------------------------------------------------------------------

quite happily.  Of course, in O'Caml you can have ifs with no elses,
which makes it a but more pointless.

Now--why all this power?  Why not something simpler, like Scheme's
hygienic macros?  The reason is that you can do more powerful
manipulations.  As an example, I've seen camlp4 quotations for regular
expressions:

<:re<a*b*>>

or things which hoist constant expressions up to the top level so
they're only executed once.  This kind of power is good.


So: hygienic?  No.  But that can be fixed.  Powerful?  Yes.  In
actuality, I think you can actually avoid most collisions by using let
and:

       open Pcaml;;
       EXTEND
         expr: AFTER "apply"
           [[ f = expr; "o"; g = expr -> <:expr< let f = $f$ and g = $g$ in
                                                    fun x -> f (g x) >> ]];
       END;;

except, of course, that this isn't the first obvious thing to do, and
this is not something that will work in all cases (i.e. the "my-or"
hygienic macro).  Not only that, but there's no "gensym" function
here (at least none I know of.)

Oh--actually, this will in fact work in every case, if you make the
following transformation:

... <:expr< let f () = $f$ and g () = $g$ in fun x -> f () (g () x) >>

silly example, but it shows how you could use thunks to always avoid
capturing.  Again, not immediately obvious that you need to.

So in any case--I'll bring this up with the developers.  I wasn't
aware the problem was there.

John.