Re: re-entrance of ocamlyacc parsers

From: Thierry Bravier (thierry.bravier@dassault-aviation.fr)
Date: Fri Jun 06 1997 - 13:48:28 MET DST


Message-Id: <3397F90C.140F@dassault-aviation.fr>
Date: Fri, 06 Jun 1997 13:48:28 +0200
From: Thierry Bravier <thierry.bravier@dassault-aviation.fr>
To: caml-list@inria.fr
Subject: Re: re-entrance of ocamlyacc parsers

Thierry Bravier wrote:
>
> I would like to be able to call the ocamlyacc-generated code
> recursively from an ocamlyacc semantic action.
>
> This would be helpful, for instance, in the case of an
> external module interface import clause as in :
>
> phrase:
> | ...
> | IMPORT IDENT SEMI
> { Import (compile_file $2) }
>

RESUME FRANCAIS:
Chers utilisateurs d'ocaml,

J'ai ecrit un script Ocamlyacc en perl.

Il s'agit d'une extension ad-hoc d'ocamlyacc pour
permettre l'ecriture d'analyseurs syntaxiques
contextuels et re-entrants.

Ocamlyacc est grandement compatible avec ocamlyacc.

Un court exemple avec gestion d'erreur est fourni plus loin.

Tous les commentaires sont bienvenus.

ENGLISH VERSION:
Dear ocamlers,

I wrote a small Ocamlyacc perl-script.

This script is just a slight ad-hoc extension of the original ocamlyacc.
(I did not modify ocamlyacc directly because of many undocumented features
I had rather not take time to understand)

Ocamlyacc works exactly as ocamlyacc (see -v and -bprefix options)

The generated parsing functions (see %start) are mutually recursive.

Ocamlyacc also recognizes a new clause:
        %env <env_type> env_name
which is optional and adds a parameter to the generated re-entrant parser,
this can be useful to parse file in a context (see example below).

As I had feared, the existing stdling/parsing module
may not be used as it is to write a re-entrant ocamlyacc parser.

This is mainly because it uses a global value `env : parser_env'.

As a consequence I modified it (superficially) to transform the
global value into a value local to each parsing function.

The only compatibility problem (with original `parsing' interface)
is that a few auxiliary functions have a different type in the
new version or `parsing' (for instance symbol_start : parser_env -> int
instead of unit -> int).

These auxiliary functions are to be called in semantic actions
with a first `parse_env' parameter (a la `lexbuf' ocamllex parameter).

This is, I hope, not much to cope with.

Error handling also works correctly (see file p.mly bellow).

All comments are welcome.

In case Ocamlyacc seems useful, I would appreciate if it were
incorporated into ocamlyacc.

Thierry Bravier Dassault Aviation - DGT / DTN / ELO / EAV
78, Quai Marcel Dassault F-92214 Saint-Cloud Cedex - France
Telephone : (33) 01 47 11 53 07 Telecopie : (33) 01 47 11 52 83
E-Mail : mailto:thierry.bravier@dassault-aviation.fr

==> Makefile <==
#==============================================================================
# File: Makefile
# Language: make
# Author: Thierry Bravier
# Time-stamp: <97/06/06 10:16:14 tb>
# Created: 97/06/06 09:59:23 tb
#==============================================================================
OCAMLC=ocamlc -c $(OCAMLC_FLAGS)
PARSING=parsing

all: main

OBJ=s.cmo $(PARSING).cmo p.cmo l.cmo

main: $(OBJ)
        ocamlc -o $@ $(OBJ)

clean:
        -rm *~ *.cm* p.ml p.mli l.ml

p.ml p.mli: p.mly
        Ocamlyacc -v p.mly

l.ml: l.mll
        ocamllex l.mll

s.cmo s.cmi: s.ml
        $(OCAMLC) s.ml

$(PARSING).cmi: $(PARSING).mli
        $(OCAMLC) $(PARSING).mli

$(PARSING).cmo: $(PARSING).ml $(PARSING).cmi
        $(OCAMLC) $(PARSING).ml

p.cmi: p.mli s.cmi
        $(OCAMLC) p.mli

p.cmo: p.ml p.cmi s.cmi $(PARSING).cmi
        $(OCAMLC) p.ml

l.cmo: l.ml s.cmi p.cmi
        $(OCAMLC) l.ml

#==============================================================================

==> Ocamlyacc <==
#!/shell/perl
#==============================================================================
# File: Ocamlyacc
# Language: perl
# Author: Thierry Bravier
# Time-stamp: <97/06/06 09:58:31 tb>
# Created: 97/06/06 09:55:53 tb
#==============================================================================
eval 'exec perl -S $0 "$@"'
    if $running_under_some_shell;
                        # this emulates #! processing on NIH machines.

#==============================================================================
$0 =~ s|^.*/||;
sub usage {
    my ($message) = @_;
    print STDERR "usage: $0 [-v] [-bprefix] grammar.mly\n";
    print STDERR "error: $message\n";
    exit 1;
}

#==============================================================================
my $grammar = undef;
my $verbose = 0;
my $default_prefix = 'parser';
my $prefix = undef;

{
    my $arg;
    while ($arg = shift) {
        $arg eq '-v' && do {
            $verbose && &usage ("multiple -v");
            $verbose = $arg;
            next;
        };
        $arg =~ /\A-b(.+)\Z/ && do {
            $prefix && &usage ("multiple -bprefix");
            $prefix = $1;
            next;
        };
        {
            $grammar && &usage ("multiple grammar.mly");
            $grammar = $arg;
            next;
        }
    }
    if ($grammar) {
        my ($dirname, $basename);
        (($dirname, $basename) = ($grammar =~ m|\A(.*/)?([^/]+)\.mly\Z|))
            || &usage ("incorrect grammar.mly");
        $prefix || do { $prefix = $basename; };
    } else {
        $grammar = '-';
        $prefix || do { $prefix = $default_prefix; };
    }
}

my $tmp_dir = "/tmp";
my $tmp_prefix = "${0}_$$";
my $tmp_grammar = "$tmp_dir/$tmp_prefix.mly";
my $command = "cd $tmp_dir;"
    . "ocamlyacc $verbose -b$tmp_prefix $tmp_grammar";

my $body = "$prefix.ml";
my $interface = "$prefix.mli";

my $tmp_body = "$tmp_dir/$tmp_prefix.ml";
my $tmp_interface = "$tmp_dir/$tmp_prefix.mli";

#==============================================================================
my $env_found = 0;
my $env_type = undef;
my $env_name = undef;

my @starts = ();

{
    open (GRAMMAR, "<$grammar")
        || die ("$0: cannot open '$grammar': $!\n");
    open (TMP_GRAMMAR, ">$tmp_grammar")
        || die ("$0: cannot open '$tmp_grammar': $!\n");
    while (<GRAMMAR>) {
        my ($type, $name);
        (($type, $name) = /\A%env\s*<(.*)>\s*(\w+)\s*\Z/) && do {
            $env_found &&
                die ("$0: syntax error \"$grammar\":$. : multiple %env\n");
            $env_type = $type;
            $env_name = $name;
            $env_found = 1;
            chop $_;
            $_ = "/* $_ */";
        };
        print TMP_GRAMMAR;
    }
    close TMP_GRAMMAR;
    close GRAMMAR;

    system $command;
    $? != 0 && exit 1;

    unlink $tmp_grammar;
}

#==============================================================================
{
    open (TMP_INTERFACE, "<$tmp_interface")
        || die ("$0: cannot open '$tmp_interface': $!\n");
    open (INTERFACE, ">$interface")
        || die ("$0: cannot open '$interface': $!\n");
    while (<TMP_INTERFACE>) {
        /\Aval (\w+) :\Z/ && do {
            push (@starts, $1);
            if ($env_found) {
                chop $_;
                $_ = "$_ ($env_type) ->\n";
            }
        };
        print INTERFACE;
    }
    close INTERFACE;
    close TMP_INTERFACE;

    unlink $tmp_interface;
}

#==============================================================================
{
    open (TMP_BODY, "<$tmp_body")
        || die ("$0: cannot open '$tmp_body': $!\n");
    open (BODY, ">$body")
        || die ("$0: cannot open '$body': $!\n");

    my @let_rec = ( 'let rec', 'and' );
    my $let_rec = 0;

    my $starts;
    if ($env_found) {
        $starts = '(' . join (', ', (@starts, $env_name)) . ')';
    } else {
        $starts = '(' . join (', ', (@starts)) . ')';
    }

    while (<TMP_BODY>) {
        s|(\Q$tmp_dir\E/)?\Q$tmp_prefix\E|$prefix|;

        s/let yyact =/let yyact $starts =/;
        s/let yytables =/let yytables starts =/;
        s/actions=yyact;/actions=yyact starts;/;
        s/let (\w+) lexfun lexbuf = yyparse yytables (\d+) lexfun
lexbuf/$let_rec[$let_rec++ != 0] $1 $env_name lexfun lexbuf = yyparse (yytables
$starts) $2 lexfun lexbuf/;
        print BODY;
    }
    close BODY;
    close TMP_BODY;

    unlink $tmp_body;
}

#==============================================================================

==> parsing.mli <==
(***********************************************************************)
(* *)
(* Objective Caml *)
(* *)
(* Xavier Leroy, projet Cristal, INRIA Rocquencourt *)
(* *)
(* Copyright 1996 Institut National de Recherche en Informatique et *)
(* Automatique. Distributed only by permission. *)
(* *)
(***********************************************************************)

(* $Id: parsing.mli,v 1.8 1996/04/30 14:50:23 xleroy Exp $ *)

(* Module [Parsing]: the run-time library for parsers generated by [camlyacc]*)

exception Parse_error
        (* Raised when a parser encounters a syntax error.
           Can also be raised from the action part of a grammar rule,
           to initiate error recovery. *)

type parser_env

val symbol_start : parser_env -> int
val symbol_end : parser_env -> int
        (* [symbol_start] and [symbol_end] are to be called in the action part
           of a grammar rule only. They return the position of the string that
           matches the left-hand side of the rule: [symbol_start()] returns
           the position of the first character; [symbol_end()] returns the
           position of the last character, plus one. The first character
           in a file is at position 0. *)
val rhs_start: parser_env -> int -> int
val rhs_end: parser_env -> int -> int
        (* Same as [symbol_start] and [symbol_end], but return the
           position of the string matching the [n]th item on the
           right-hand side of the rule, where [n] is the integer parameter
           to [lhs_start] and [lhs_end]. [n] is 1 for the leftmost item. *)

val clear_parser : parser_env -> unit
        (* Empty the parser stack. Call it just after a parsing function
           has returned, to remove all pointers from the parser stack
           to structures that were built by semantic actions during parsing.
           This is optional, but lowers the memory requirements of the
           programs. *)

(*--*)

(* The following definitions are used by the generated parsers only.
   They are not intended to be used by user programs. *)

type parse_tables =
  { actions : (parser_env -> Obj.t) array;
    transl_const : int array;
    transl_block : int array;
    lhs : string;
    len : string;
    defred : string;
    dgoto : string;
    sindex : string;
    rindex : string;
    gindex : string;
    tablesize : int;
    table : string;
    check : string;
    error_function : string -> unit }

exception YYexit of Obj.t

val yyparse :
      parse_tables -> int -> (Lexing.lexbuf -> 'a) -> Lexing.lexbuf -> 'b
val peek_val : parser_env -> int -> 'a
(* PATCH what is the use of this ???
val is_current_lookahead : 'a -> bool
*)
val parse_error : string -> unit

==> parsing.ml <==
(***********************************************************************)
(* *)
(* Objective Caml *)
(* *)
(* Xavier Leroy, projet Cristal, INRIA Rocquencourt *)
(* *)
(* Copyright 1996 Institut National de Recherche en Informatique et *)
(* Automatique. Distributed only by permission. *)
(* *)
(***********************************************************************)

(* $Id: parsing.ml,v 1.9 1997/02/25 14:41:16 xleroy Exp $ *)

(* The parsing engine *)

open Lexing

(* Internal interface to the parsing engine *)

type parser_env =
  { mutable s_stack : int array; (* States *)
    mutable v_stack : Obj.t array; (* Semantic attributes *)
    mutable symb_start_stack : int array; (* Start positions *)
    mutable symb_end_stack : int array; (* End positions *)
    mutable stacksize : int; (* Size of the stacks *)
    mutable stackbase : int; (* Base sp for current parse *)
    mutable curr_char : int; (* Last token read *)
    mutable lval : Obj.t; (* Its semantic attribute *)
    mutable symb_start : int; (* Start pos. of the current symbol*)
    mutable symb_end : int; (* End pos. of the current symbol *)
    mutable asp : int; (* The stack pointer for attributes *)
    mutable rule_len : int; (* Number of rhs items in the rule *)
    mutable rule_number : int; (* Rule number to reduce by *)
    mutable sp : int; (* Saved sp for parse_engine *)
    mutable state : int; (* Saved state for parse_engine *)
    mutable errflag : int } (* Saved error flag for parse_engine *)

type parse_tables =
  { actions : (parser_env -> Obj.t) array;
    transl_const : int array;
    transl_block : int array;
    lhs : string;
    len : string;
    defred : string;
    dgoto : string;
    sindex : string;
    rindex : string;
    gindex : string;
    tablesize : int;
    table : string;
    check : string;
    error_function : string -> unit }

exception YYexit of Obj.t
exception Parse_error

type parser_input =
    Start
  | Token_read
  | Stacks_grown_1
  | Stacks_grown_2
  | Semantic_action_computed
  | Error_detected

type parser_output =
    Read_token
  | Raise_parse_error
  | Grow_stacks_1
  | Grow_stacks_2
  | Compute_semantic_action
  | Call_error_function

external parse_engine :
    parse_tables -> parser_env -> parser_input -> Obj.t -> parser_output
    = "parse_engine"

let make_env () =
  { s_stack = Array.create 100 0;
    v_stack = Array.create 100 (Obj.repr ());
    symb_start_stack = Array.create 100 0;
    symb_end_stack = Array.create 100 0;
    stacksize = 100;
    stackbase = 0;
    curr_char = 0;
    lval = Obj.repr ();
    symb_start = 0;
    symb_end = 0;
    asp = 0;
    rule_len = 0;
    rule_number = 0;
    sp = 0;
    state = 0;
    errflag = 0 }

let grow_stacks env =
  let oldsize = env.stacksize in
  let newsize = oldsize * 2 in
  let new_s = Array.create newsize 0
  and new_v = Array.create newsize (Obj.repr ())
  and new_start = Array.create newsize 0
  and new_end = Array.create newsize 0 in
    Array.blit env.s_stack 0 new_s 0 oldsize;
    env.s_stack <- new_s;
    Array.blit env.v_stack 0 new_v 0 oldsize;
    env.v_stack <- new_v;
    Array.blit env.symb_start_stack 0 new_start 0 oldsize;
    env.symb_start_stack <- new_start;
    Array.blit env.symb_end_stack 0 new_end 0 oldsize;
    env.symb_end_stack <- new_end;
    env.stacksize <- newsize

let clear_parser env =
  Array.fill env.v_stack 0 env.stacksize (Obj.repr ());
  env.lval <- Obj.repr ()

(* PATCH what is the use of this ???
let current_lookahead_fun = ref (fun (x: Obj.t) -> false)
*)

let yyparse tables start lexer lexbuf =
  let env = make_env () in
  let rec loop cmd arg =
    match parse_engine tables env cmd arg with
      Read_token ->
        let t = Obj.repr(lexer lexbuf) in
        env.symb_start <- lexbuf.lex_abs_pos + lexbuf.lex_start_pos;
        env.symb_end <- lexbuf.lex_abs_pos + lexbuf.lex_curr_pos;
        loop Token_read t
    | Raise_parse_error ->
        raise Parse_error
    | Compute_semantic_action ->
        let (action, value) =
          try
            (Semantic_action_computed, tables.actions.(env.rule_number) env)
          with Parse_error ->
            (Error_detected, Obj.repr ()) in
        loop action value
    | Grow_stacks_1 ->
        grow_stacks env; loop Stacks_grown_1 (Obj.repr ())
    | Grow_stacks_2 ->
        grow_stacks env; loop Stacks_grown_2 (Obj.repr ())
    | Call_error_function ->
        tables.error_function "syntax error";
        loop Error_detected (Obj.repr ()) in
  let init_asp = env.asp
  and init_sp = env.sp
  and init_state = env.state
  and init_curr_char = env.curr_char in
  env.stackbase <- env.sp + 1;
  env.curr_char <- start;
  try
    loop Start (Obj.repr ())
  with exn ->
    let curr_char = env.curr_char in
    env.asp <- init_asp;
    env.sp <- init_sp;
    env.state <- init_state;
    env.curr_char <- init_curr_char;
    match exn with
      YYexit v ->
        Obj.magic v
    | _ ->
(* PATCH what is the use of this ???
        current_lookahead_fun :=
          (fun tok ->
            if Obj.is_block tok
            then tables.transl_block.(Obj.tag tok) = curr_char
            else tables.transl_const.(Obj.magic tok) = curr_char);
*)
        raise exn

let peek_val env n =
  Obj.magic env.v_stack.(env.asp - n)

let symbol_start env =
  if env.rule_len > 0
  then env.symb_start_stack.(env.asp - env.rule_len + 1)
  else env.symb_end_stack.(env.asp)
let symbol_end env =
  env.symb_end_stack.(env.asp)

let rhs_start env n =
  env.symb_start_stack.(env.asp - (env.rule_len - n))
let rhs_end env n =
  env.symb_end_stack.(env.asp - (env.rule_len - n))

(* PATCH what is the use of this ???
let is_current_lookahead tok =
  (!current_lookahead_fun)(Obj.repr tok)
*)

let parse_error (msg: string) = ()

==> s.ml <==
(* ============================================================================
 * File: s.ml
 * Language: caml
 * Author: Thierry Bravier
 * Time-stamp: <97/06/06 10:00:45 tb>
 * Created: 97/06/06 10:00:38 tb
 * ========================================================================= *)
open Printf

type filename = string

type syntax =
  | Error of filename * int * int
  | Import of filename * syntax list
  | Assignment of string * int

let generic_parse_file parse lex filename =
  let in_channel = open_in filename in
  let result = parse lex (Lexing.from_channel in_channel) in
  close_in in_channel;
  result

exception Cyclic of filename

let rec check_acyclic filename = function
  | [] -> ()
  | (hd :: tl) ->
      if filename = hd then raise (Cyclic filename)
      else check_acyclic filename tl

let parse_imported_file parse (filenames, lex) filename =
  check_acyclic filename filenames;
  generic_parse_file
    (parse (filename :: filenames, lex))
    lex
    filename

type 'token lexfun = Lexing.lexbuf -> 'token

let rec fprint_path out = function
  | [] -> ()
  | [a] -> fprintf out "%s" a
  | (a :: b) -> fprintf out "%a.%s" fprint_path b a

let rec fprint_indent out = function
  | [] -> ()
  | (a :: b) -> fprintf out "\t%a" fprint_indent b

let rec fprint_syntax_path path out = function
  | Error (filename, s, e) ->
      fprintf out "-- SKIPPED ERROR at \"%s\":%d-%d" filename s e
  | Import (filename, syntaxes) ->
      let abs = filename :: path in
      fprintf out "%a-- begin of %a\n" fprint_indent path fprint_path abs;
      fprint_syntaxes_path abs out syntaxes;
      fprintf out "%a-- end of %a" fprint_indent path fprint_path abs
  | Assignment (name, int) ->
      fprintf out "%a%a = %d;" fprint_indent path
        fprint_path (name :: path) int
and fprint_syntaxes_path path out =
  List.iter (fprintf out "%a\n" (fprint_syntax_path path))
and fprint_syntax out =
  fprint_syntax_path [] out
and fprint_syntaxes out =
  fprint_syntaxes_path [] out

(* ========================================================================= *)

==> p.mly <==
/* ============================================================================
 * File: p.mly
 * Language: camlyacc
 * Author: Thierry Bravier
 * Time-stamp: <97/06/06 10:03:41 tb>
 * Created: 97/06/06 10:01:15 tb
 * ========================================================================= */
%{
open S

let where_am_i parser_env = function
  | ((filename :: _), _) ->
      (filename, symbol_start parser_env + 1, symbol_end parser_env + 1)
  | _ -> failwith "incorrect env"

let where_is_error parser_env env =
  let ((filename, s, e) as result) = where_am_i parser_env env in
  Printf.fprintf stderr
    "File \"%s\", line %03d, characters %03d-%03d:\n" filename 0 s e;
  result
%}

/* ========================================================================= */
%token <string> IDENT
%token <int> INT
%token IMPORT /* "import" */
%token EQUAL /* "=" */
%token SEMI /* ";" */
%token EOF /* eof */
%token ERROR /* others */

/* ========================================================================= */
%env <S.filename list * token S.lexfun> env

%start yacc_start_1
%type <S.syntax list> yacc_start_1

%start yacc_start_2
%type <string list> yacc_start_2

%%
/* ========================================================================= */
yacc_start_1 :
| phrase_1 yacc_start_1
    { $1 :: $2 }
| EOF
    { [] }
;

phrase_1 :
| error SEMI
    { let (filename, s, e) = where_is_error parser_env env in
    Printf.fprintf stderr "ERROR\n";
    Error (filename, s, e) }
| IMPORT IDENT SEMI
    { Import ($2, parse_imported_file yacc_start_1 env $2) }
| IDENT INT error SEMI
    { let (filename, s, e) = where_is_error parser_env env in
    Printf.fprintf stderr "ERROR: YOU MEANT \"%s = %d;\"\n" $1 $2;
    Error (filename, s, e) }
| IDENT EQUAL INT SEMI
    { Assignment ($1, $3) }

/* ------------------------------------------------------------------------- */
yacc_start_2 :
| IDENT SEMI yacc_start_2
    { $1 :: $3 }
| EOF
    { [] }
;

/* ========================================================================= */
%%

(* ========================================================================= *)

==> l.mll <==
(* ============================================================================
 * File: l.mll
 * Language: camllex
 * Author: Thierry Bravier
 * Time-stamp: <97/06/06 10:00:03 tb>
 * Created: 97/06/06 09:59:45 tb
 * ========================================================================= *)
{
open S
open P

open Printf
}

(* ========================================================================= *)
rule lexer = parse
  [' ' '\012' '\013' '\009' '\026' '\010'] + { lexer lexbuf }
| "--" [^ '\n' ] * { lexer lexbuf }
| ['A'-'Z' 'a'-'z'] ('_' ? ['A'-'Z' 'a'-'z' '0'-'9']) *
    { match Lexing.lexeme lexbuf with "import" -> IMPORT | s -> IDENT s }
| ['1'-'9'] ['0'-'9']*
    { INT (int_of_string (Lexing.lexeme lexbuf)) }
| ";" { SEMI }
| "=" { EQUAL }
| eof { EOF }
| _ { ERROR }

(* ========================================================================= *)
{
let yacc_start_1_file = parse_imported_file yacc_start_1 ([], lexer)

let argl () = match Array.to_list (Sys.argv) with
  | (prgm :: args) -> (prgm, args)
  | _ -> failwith "argl"

let _ =
  match argl () with
  | prgm, [ filename ] ->
      fprint_syntaxes stdout (yacc_start_1_file filename);
      ()
  | prgm, _ ->
      fprintf stderr "usage: %s file\n" prgm;
      exit 1
}

(* ========================================================================= *)

==> bar_syntax <==
x = 7;
y 8;
import gee_syntax;
z = 9;

==> foo_syntax <==
a = 5;
b = 99;
import bar_syntax;
c = 100;
d = 200;
e = 300;

==> gee_syntax <==
t 6;
u = 93;
==> OUTPUT <==
./main foo_syntax
a = 5;
b = 99;
-- begin of bar_syntax
        bar_syntax.x = 7;
-- SKIPPED ERROR at "bar_syntax":8-12
        -- begin of bar_syntax.gee_syntax
-- SKIPPED ERROR at "gee_syntax":1-5
                bar_syntax.gee_syntax.u = 93;
        -- end of bar_syntax.gee_syntax
        bar_syntax.z = 9;
-- end of bar_syntax
c = 100;
d = 200;
e = 300;
File "bar_syntax", line 000, characters 008-012:
ERROR: YOU MEANT "y = 8;"
File "gee_syntax", line 000, characters 001-005:
ERROR: YOU MEANT "t = 6;"

-- 
Thierry Bravier                Dassault Aviation - DGT / DTN / ELO / EAV
78, Quai Marcel Dassault              F-92214 Saint-Cloud Cedex - France
Telephone : (33) 01 47 11 53 07          Telecopie : (33) 01 47 11 52 83
E-Mail :                     mailto:thierry.bravier@dassault-aviation.fr



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:11 MET