[Caml-list] On ocamlyacc and ocamllex
From: Vesa Karvonen <vesa.karvonen@h...>
Subject: Re: [Caml-list] On ocamlyacc and ocamllex
From: "Christian Lindig" <lindig@eecs.harvard.edu>
> On Sun, Sep 23, 2001 at 12:09:14AM +0300, Vesa Karvonen wrote:
> > I would like to make it so that the lexer would record the positions
> > of line breaks so that I could directly give line number and column
> > information in error messages.
[snip snip]
> The particular problem can be solved outside of Lex and Yacc: in the
> Quick C-- compiler we have a mutable Sourcemap.map data type that
> records the connection between character positions and
> (file,line,column) triples.

This is basically the same technique that I have been using. The problem is
that the map has to be global, because the only context passed to the lexer
actions is the lexbuf. Furthermore, the records need to be manually removed
(in order to save memory) after a file has been processed completely and the
recorded connections for the file are no longer needed.

An extendable lexer makes it possible to extend the context passed to the
lexer actions so that globals can be avoided.

> I agree that more flexible lexer and parser generators would be nice and
> have myself lobbied for them in the past. On the other hand I have
> always found my way with the existing ones which probably is the reason
> that we still use them.

Replacing the Lexing and Parsing modules turned out to be simpler than I
thought. I'm almost done writing the replacements, which I have called Lex and
Yacc. The Lex module defines an abstract parameterized type lexbuf like this:

    type 't lexbuf
    val access : 't lexbuf -> 't
    val from_channel : in_channel -> 't -> 't lexbuf
    ...
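
To make the idea concrete, here is a minimal sketch of such a parameterized lexbuf. It is only an illustration under assumptions: it wraps the standard Lexing.lexbuf instead of replacing it, and the from_string and raw functions are hypothetical additions that are not part of the signature above.

```ocaml
(* Sketch only: a lexbuf that carries a user-supplied context of type 't.
   Assumption: we wrap Lexing.lexbuf rather than reimplement buffering. *)
module Lex : sig
  type 't lexbuf
  val access : 't lexbuf -> 't
  val from_channel : in_channel -> 't -> 't lexbuf
  (* Hypothetical helpers, not in the original signature. *)
  val from_string : string -> 't -> 't lexbuf
  val raw : 't lexbuf -> Lexing.lexbuf
end = struct
  type 't lexbuf = { raw : Lexing.lexbuf; context : 't }

  (* Return the context that was attached when the buffer was created. *)
  let access lb = lb.context

  let from_channel ic context = { raw = Lexing.from_channel ic; context }
  let from_string s context = { raw = Lexing.from_string s; context }

  (* Expose the underlying standard buffer. *)
  let raw lb = lb.raw
end
```

A lexer action that receives the 't lexbuf can then reach its own state through access, so that state no longer has to live in a global.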

It is now possible to make a simple module, Line_map, for tracking line numbers:

    type t
    val make : unit -> t
    val new_line_at_pos : t -> int -> unit
    val line_and_col_of_pos : t -> int -> int * int

And then extend the lexbuf with the line map:

    val from_channel : in_channel -> Line_map.t Lex.lexbuf
    val new_line : Line_map.t Lex.lexbuf -> unit
    ...

and use those functions in the lexer actions:

    '\n' { new_line lexbuf; token lexbuf; }
    ...

I pass the files generated by ocamlyacc and ocamllex through sed commands that
rewrite them to use the Lex and Yacc modules instead of the Lexing and Parsing
modules.

> > Another issue with ocamllex and ocamlyacc (and lex/flex and
> > yacc/bison) is that the dependencies between the generated lexer and
> > parser are not quite optimal. Currently the generated lexer is
> > dependent on the parser, because the parser generates the token type.
> > This means that each time the grammar is modified, but not the token
> > definitions, the lexer is recompiled. This could be avoided by making
> > it so that the token type is defined in a separate module.
>
> This is a general problem with make: when you edit a comment, a file is
> touched and all dependent files must be recompiled.
[...]

I think that you slightly misunderstood.

The basic idea was to put the token type definition into a separate module.
Instead of two source files, you would have three source files:

    lexer.mll
    token.ml
    parser.mly

The token definition is now effectively factored out into its own module, on
which both the lexer and the parser modules depend.

parser.mly would then contain a declaration telling ocamlyacc to take the
token type from token.ml.
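
The dependency structure I have in mind can be illustrated in plain OCaml, without any generator support. Everything in this sketch is hypothetical: the token constructors are invented, and tokenize and sum are hand-written stand-ins for the generated lexer and parser. The point is only that both sides mention Token and neither mentions the other.

```ocaml
(* Stand-in for token.ml: the only shared definition. *)
module Token = struct
  type t = INT of int | PLUS | EOF
end

(* Stand-in for the lexer: depends on Token only. *)
let tokenize (s : string) : Token.t list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev (Token.EOF :: acc)
    else
      match s.[i] with
      | ' ' -> go (i + 1) acc
      | '+' -> go (i + 1) (Token.PLUS :: acc)
      | '0' .. '9' ->
        (* Scan to the end of the run of digits. *)
        let j = ref i in
        while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
        let value = int_of_string (String.sub s i (!j - i)) in
        go !j (Token.INT value :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C" c)
  in
  go 0 []

(* Stand-in for the parser: also depends on Token only.
   Accepts sums such as "1 + 23" and evaluates them. *)
let rec sum = function
  | Token.INT n :: Token.PLUS :: rest -> n + sum rest
  | [ Token.INT n; Token.EOF ] -> n
  | _ -> failwith "syntax error"
```

With this layout, changing the grammar rules touches only the parser side, so the lexer needs recompiling only when Token itself changes.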

