Message-Id: <3397F90C.140F@dassault-aviation.fr>
Date: Fri, 06 Jun 1997 13:48:28 +0200
From: Thierry Bravier <thierry.bravier@dassault-aviation.fr>
To: caml-list@inria.fr
Subject: Re: re-entrance of ocamlyacc parsers
Thierry Bravier wrote:
>
> I would like to be able to call the ocamlyacc-generated code
> recursively from an ocamlyacc semantic action.
>
> This would be helpful, for instance, in the case of an
> external module interface import clause as in :
>
> phrase:
> | ...
> | IMPORT IDENT SEMI
> { Import (compile_file $2) }
>
RESUME FRANCAIS:
Chers utilisateurs d'ocaml,
J'ai ecrit un script Ocamlyacc en perl.
Il s'agit d'une extension ad-hoc d'ocamlyacc pour
permettre l'ecriture d'analyseurs syntaxiques
contextuels et re-entrants.
Ocamlyacc est grandement compatible avec ocamlyacc.
Un court exemple avec gestion d'erreur est fourni plus loin.
Tous les commentaires sont bienvenus.
ENGLISH VERSION:
Dear ocamlers,
I wrote a small Ocamlyacc perl-script.
This script is just a slight ad-hoc extension of the original ocamlyacc.
(I did not modify ocamlyacc directly because of many undocumented features
I had rather not take time to understand)
Ocamlyacc works exactly as ocamlyacc (see -v and -bprefix options)
The generated parsing functions (see %start) are mutually recursive.
Ocamlyacc also recognizes a new clause:
%env <env_type> env_name
which is optional and adds a parameter to the generated re-entrant parser,
this can be useful to parse file in a context (see example below).
As I had feared, the existing stdling/parsing module
may not be used as it is to write a re-entrant ocamlyacc parser.
This is mainly because it uses a global value `env : parser_env'.
As a consequence I modified it (superficially) to transform the
global value into a value local to each parsing function.
The only compatibility problem (with original `parsing' interface)
is that a few auxiliary functions have a different type in the
new version or `parsing' (for instance symbol_start : parser_env -> int
instead of unit -> int).
These auxiliary functions are to be called in semantic actions
with a first `parse_env' parameter (a la `lexbuf' ocamllex parameter).
This is, I hope, not much to cope with.
Error handling also works correctly (see file p.mly bellow).
All comments are welcome.
In case Ocamlyacc seems useful, I would appreciate if it were
incorporated into ocamlyacc.
Thierry Bravier Dassault Aviation - DGT / DTN / ELO / EAV
78, Quai Marcel Dassault F-92214 Saint-Cloud Cedex - France
Telephone : (33) 01 47 11 53 07 Telecopie : (33) 01 47 11 52 83
E-Mail : mailto:thierry.bravier@dassault-aviation.fr
==> Makefile <==
#==============================================================================
# File: Makefile
# Language: make
# Author: Thierry Bravier
# Time-stamp: <97/06/06 10:16:14 tb>
# Created: 97/06/06 09:59:23 tb
#==============================================================================
OCAMLC=ocamlc -c $(OCAMLC_FLAGS)
PARSING=parsing
all: main
OBJ=s.cmo $(PARSING).cmo p.cmo l.cmo
main: $(OBJ)
ocamlc -o $@ $(OBJ)
clean:
-rm *~ *.cm* p.ml p.mli l.ml
p.ml p.mli: p.mly
Ocamlyacc -v p.mly
l.ml: l.mll
ocamllex l.mll
s.cmo s.cmi: s.ml
$(OCAMLC) s.ml
$(PARSING).cmi: $(PARSING).mli
$(OCAMLC) $(PARSING).mli
$(PARSING).cmo: $(PARSING).ml $(PARSING).cmi
$(OCAMLC) $(PARSING).ml
p.cmi: p.mli s.cmi
$(OCAMLC) p.mli
p.cmo: p.ml p.cmi s.cmi $(PARSING).cmi
$(OCAMLC) p.ml
l.cmo: l.ml s.cmi p.cmi
$(OCAMLC) l.ml
#==============================================================================
==> Ocamlyacc <==
#!/shell/perl
#==============================================================================
# File: Ocamlyacc
# Language: perl
# Author: Thierry Bravier
# Time-stamp: <97/06/06 09:58:31 tb>
# Created: 97/06/06 09:55:53 tb
#==============================================================================
eval 'exec perl -S $0 "$@"'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
#==============================================================================
$0 =~ s|^.*/||;
sub usage {
my ($message) = @_;
print STDERR "usage: $0 [-v] [-bprefix] grammar.mly\n";
print STDERR "error: $message\n";
exit 1;
}
#==============================================================================
my $grammar = undef;
my $verbose = 0;
my $default_prefix = 'parser';
my $prefix = undef;
{
my $arg;
while ($arg = shift) {
$arg eq '-v' && do {
$verbose && &usage ("multiple -v");
$verbose = $arg;
next;
};
$arg =~ /\A-b(.+)\Z/ && do {
$prefix && &usage ("multiple -bprefix");
$prefix = $1;
next;
};
{
$grammar && &usage ("multiple grammar.mly");
$grammar = $arg;
next;
}
}
if ($grammar) {
my ($dirname, $basename);
(($dirname, $basename) = ($grammar =~ m|\A(.*/)?([^/]+)\.mly\Z|))
|| &usage ("incorrect grammar.mly");
$prefix || do { $prefix = $basename; };
} else {
$grammar = '-';
$prefix || do { $prefix = $default_prefix; };
}
}
my $tmp_dir = "/tmp";
my $tmp_prefix = "${0}_$$";
my $tmp_grammar = "$tmp_dir/$tmp_prefix.mly";
my $command = "cd $tmp_dir;"
. "ocamlyacc $verbose -b$tmp_prefix $tmp_grammar";
my $body = "$prefix.ml";
my $interface = "$prefix.mli";
my $tmp_body = "$tmp_dir/$tmp_prefix.ml";
my $tmp_interface = "$tmp_dir/$tmp_prefix.mli";
#==============================================================================
my $env_found = 0;
my $env_type = undef;
my $env_name = undef;
my @starts = ();
{
open (GRAMMAR, "<$grammar")
|| die ("$0: cannot open '$grammar': $!\n");
open (TMP_GRAMMAR, ">$tmp_grammar")
|| die ("$0: cannot open '$tmp_grammar': $!\n");
while (<GRAMMAR>) {
my ($type, $name);
(($type, $name) = /\A%env\s*<(.*)>\s*(\w+)\s*\Z/) && do {
$env_found &&
die ("$0: syntax error \"$grammar\":$. : multiple %env\n");
$env_type = $type;
$env_name = $name;
$env_found = 1;
chop $_;
$_ = "/* $_ */";
};
print TMP_GRAMMAR;
}
close TMP_GRAMMAR;
close GRAMMAR;
system $command;
$? != 0 && exit 1;
unlink $tmp_grammar;
}
#==============================================================================
{
open (TMP_INTERFACE, "<$tmp_interface")
|| die ("$0: cannot open '$tmp_interface': $!\n");
open (INTERFACE, ">$interface")
|| die ("$0: cannot open '$interface': $!\n");
while (<TMP_INTERFACE>) {
/\Aval (\w+) :\Z/ && do {
push (@starts, $1);
if ($env_found) {
chop $_;
$_ = "$_ ($env_type) ->\n";
}
};
print INTERFACE;
}
close INTERFACE;
close TMP_INTERFACE;
unlink $tmp_interface;
}
#==============================================================================
{
open (TMP_BODY, "<$tmp_body")
|| die ("$0: cannot open '$tmp_body': $!\n");
open (BODY, ">$body")
|| die ("$0: cannot open '$body': $!\n");
my @let_rec = ( 'let rec', 'and' );
my $let_rec = 0;
my $starts;
if ($env_found) {
$starts = '(' . join (', ', (@starts, $env_name)) . ')';
} else {
$starts = '(' . join (', ', (@starts)) . ')';
}
while (<TMP_BODY>) {
s|(\Q$tmp_dir\E/)?\Q$tmp_prefix\E|$prefix|;
s/let yyact =/let yyact $starts =/;
s/let yytables =/let yytables starts =/;
s/actions=yyact;/actions=yyact starts;/;
s/let (\w+) lexfun lexbuf = yyparse yytables (\d+) lexfun
lexbuf/$let_rec[$let_rec++ != 0] $1 $env_name lexfun lexbuf = yyparse (yytables
$starts) $2 lexfun lexbuf/;
print BODY;
}
close BODY;
close TMP_BODY;
unlink $tmp_body;
}
#==============================================================================
==> parsing.mli <==
(***********************************************************************)
(* *)
(* Objective Caml *)
(* *)
(* Xavier Leroy, projet Cristal, INRIA Rocquencourt *)
(* *)
(* Copyright 1996 Institut National de Recherche en Informatique et *)
(* Automatique. Distributed only by permission. *)
(* *)
(***********************************************************************)
(* $Id: parsing.mli,v 1.8 1996/04/30 14:50:23 xleroy Exp $ *)
(* Module [Parsing]: the run-time library for parsers generated by [camlyacc]*)
exception Parse_error
(* Raised when a parser encounters a syntax error.
Can also be raised from the action part of a grammar rule,
to initiate error recovery. *)
type parser_env
val symbol_start : parser_env -> int
val symbol_end : parser_env -> int
(* [symbol_start] and [symbol_end] are to be called in the action part
of a grammar rule only. They return the position of the string that
matches the left-hand side of the rule: [symbol_start()] returns
the position of the first character; [symbol_end()] returns the
position of the last character, plus one. The first character
in a file is at position 0. *)
val rhs_start: parser_env -> int -> int
val rhs_end: parser_env -> int -> int
(* Same as [symbol_start] and [symbol_end], but return the
position of the string matching the [n]th item on the
right-hand side of the rule, where [n] is the integer parameter
to [lhs_start] and [lhs_end]. [n] is 1 for the leftmost item. *)
val clear_parser : parser_env -> unit
(* Empty the parser stack. Call it just after a parsing function
has returned, to remove all pointers from the parser stack
to structures that were built by semantic actions during parsing.
This is optional, but lowers the memory requirements of the
programs. *)
(*--*)
(* The following definitions are used by the generated parsers only.
They are not intended to be used by user programs. *)
type parse_tables =
{ actions : (parser_env -> Obj.t) array;
transl_const : int array;
transl_block : int array;
lhs : string;
len : string;
defred : string;
dgoto : string;
sindex : string;
rindex : string;
gindex : string;
tablesize : int;
table : string;
check : string;
error_function : string -> unit }
exception YYexit of Obj.t
val yyparse :
parse_tables -> int -> (Lexing.lexbuf -> 'a) -> Lexing.lexbuf -> 'b
val peek_val : parser_env -> int -> 'a
(* PATCH what is the use of this ???
val is_current_lookahead : 'a -> bool
*)
val parse_error : string -> unit
==> parsing.ml <==
(***********************************************************************)
(* *)
(* Objective Caml *)
(* *)
(* Xavier Leroy, projet Cristal, INRIA Rocquencourt *)
(* *)
(* Copyright 1996 Institut National de Recherche en Informatique et *)
(* Automatique. Distributed only by permission. *)
(* *)
(***********************************************************************)
(* $Id: parsing.ml,v 1.9 1997/02/25 14:41:16 xleroy Exp $ *)
(* The parsing engine *)
open Lexing
(* Internal interface to the parsing engine *)
type parser_env =
{ mutable s_stack : int array; (* States *)
mutable v_stack : Obj.t array; (* Semantic attributes *)
mutable symb_start_stack : int array; (* Start positions *)
mutable symb_end_stack : int array; (* End positions *)
mutable stacksize : int; (* Size of the stacks *)
mutable stackbase : int; (* Base sp for current parse *)
mutable curr_char : int; (* Last token read *)
mutable lval : Obj.t; (* Its semantic attribute *)
mutable symb_start : int; (* Start pos. of the current symbol*)
mutable symb_end : int; (* End pos. of the current symbol *)
mutable asp : int; (* The stack pointer for attributes *)
mutable rule_len : int; (* Number of rhs items in the rule *)
mutable rule_number : int; (* Rule number to reduce by *)
mutable sp : int; (* Saved sp for parse_engine *)
mutable state : int; (* Saved state for parse_engine *)
mutable errflag : int } (* Saved error flag for parse_engine *)
type parse_tables =
{ actions : (parser_env -> Obj.t) array;
transl_const : int array;
transl_block : int array;
lhs : string;
len : string;
defred : string;
dgoto : string;
sindex : string;
rindex : string;
gindex : string;
tablesize : int;
table : string;
check : string;
error_function : string -> unit }
exception YYexit of Obj.t
exception Parse_error
type parser_input =
Start
| Token_read
| Stacks_grown_1
| Stacks_grown_2
| Semantic_action_computed
| Error_detected
type parser_output =
Read_token
| Raise_parse_error
| Grow_stacks_1
| Grow_stacks_2
| Compute_semantic_action
| Call_error_function
external parse_engine :
parse_tables -> parser_env -> parser_input -> Obj.t -> parser_output
= "parse_engine"
let make_env () =
{ s_stack = Array.create 100 0;
v_stack = Array.create 100 (Obj.repr ());
symb_start_stack = Array.create 100 0;
symb_end_stack = Array.create 100 0;
stacksize = 100;
stackbase = 0;
curr_char = 0;
lval = Obj.repr ();
symb_start = 0;
symb_end = 0;
asp = 0;
rule_len = 0;
rule_number = 0;
sp = 0;
state = 0;
errflag = 0 }
let grow_stacks env =
let oldsize = env.stacksize in
let newsize = oldsize * 2 in
let new_s = Array.create newsize 0
and new_v = Array.create newsize (Obj.repr ())
and new_start = Array.create newsize 0
and new_end = Array.create newsize 0 in
Array.blit env.s_stack 0 new_s 0 oldsize;
env.s_stack <- new_s;
Array.blit env.v_stack 0 new_v 0 oldsize;
env.v_stack <- new_v;
Array.blit env.symb_start_stack 0 new_start 0 oldsize;
env.symb_start_stack <- new_start;
Array.blit env.symb_end_stack 0 new_end 0 oldsize;
env.symb_end_stack <- new_end;
env.stacksize <- newsize
let clear_parser env =
Array.fill env.v_stack 0 env.stacksize (Obj.repr ());
env.lval <- Obj.repr ()
(* PATCH what is the use of this ???
let current_lookahead_fun = ref (fun (x: Obj.t) -> false)
*)
let yyparse tables start lexer lexbuf =
let env = make_env () in
let rec loop cmd arg =
match parse_engine tables env cmd arg with
Read_token ->
let t = Obj.repr(lexer lexbuf) in
env.symb_start <- lexbuf.lex_abs_pos + lexbuf.lex_start_pos;
env.symb_end <- lexbuf.lex_abs_pos + lexbuf.lex_curr_pos;
loop Token_read t
| Raise_parse_error ->
raise Parse_error
| Compute_semantic_action ->
let (action, value) =
try
(Semantic_action_computed, tables.actions.(env.rule_number) env)
with Parse_error ->
(Error_detected, Obj.repr ()) in
loop action value
| Grow_stacks_1 ->
grow_stacks env; loop Stacks_grown_1 (Obj.repr ())
| Grow_stacks_2 ->
grow_stacks env; loop Stacks_grown_2 (Obj.repr ())
| Call_error_function ->
tables.error_function "syntax error";
loop Error_detected (Obj.repr ()) in
let init_asp = env.asp
and init_sp = env.sp
and init_state = env.state
and init_curr_char = env.curr_char in
env.stackbase <- env.sp + 1;
env.curr_char <- start;
try
loop Start (Obj.repr ())
with exn ->
let curr_char = env.curr_char in
env.asp <- init_asp;
env.sp <- init_sp;
env.state <- init_state;
env.curr_char <- init_curr_char;
match exn with
YYexit v ->
Obj.magic v
| _ ->
(* PATCH what is the use of this ???
current_lookahead_fun :=
(fun tok ->
if Obj.is_block tok
then tables.transl_block.(Obj.tag tok) = curr_char
else tables.transl_const.(Obj.magic tok) = curr_char);
*)
raise exn
let peek_val env n =
Obj.magic env.v_stack.(env.asp - n)
let symbol_start env =
if env.rule_len > 0
then env.symb_start_stack.(env.asp - env.rule_len + 1)
else env.symb_end_stack.(env.asp)
let symbol_end env =
env.symb_end_stack.(env.asp)
let rhs_start env n =
env.symb_start_stack.(env.asp - (env.rule_len - n))
let rhs_end env n =
env.symb_end_stack.(env.asp - (env.rule_len - n))
(* PATCH what is the use of this ???
let is_current_lookahead tok =
(!current_lookahead_fun)(Obj.repr tok)
*)
let parse_error (msg: string) = ()
==> s.ml <==
(* ============================================================================
* File: s.ml
* Language: caml
* Author: Thierry Bravier
* Time-stamp: <97/06/06 10:00:45 tb>
* Created: 97/06/06 10:00:38 tb
* ========================================================================= *)
open Printf
type filename = string
type syntax =
| Error of filename * int * int
| Import of filename * syntax list
| Assignment of string * int
let generic_parse_file parse lex filename =
let in_channel = open_in filename in
let result = parse lex (Lexing.from_channel in_channel) in
close_in in_channel;
result
exception Cyclic of filename
let rec check_acyclic filename = function
| [] -> ()
| (hd :: tl) ->
if filename = hd then raise (Cyclic filename)
else check_acyclic filename tl
let parse_imported_file parse (filenames, lex) filename =
check_acyclic filename filenames;
generic_parse_file
(parse (filename :: filenames, lex))
lex
filename
type 'token lexfun = Lexing.lexbuf -> 'token
let rec fprint_path out = function
| [] -> ()
| [a] -> fprintf out "%s" a
| (a :: b) -> fprintf out "%a.%s" fprint_path b a
let rec fprint_indent out = function
| [] -> ()
| (a :: b) -> fprintf out "\t%a" fprint_indent b
let rec fprint_syntax_path path out = function
| Error (filename, s, e) ->
fprintf out "-- SKIPPED ERROR at \"%s\":%d-%d" filename s e
| Import (filename, syntaxes) ->
let abs = filename :: path in
fprintf out "%a-- begin of %a\n" fprint_indent path fprint_path abs;
fprint_syntaxes_path abs out syntaxes;
fprintf out "%a-- end of %a" fprint_indent path fprint_path abs
| Assignment (name, int) ->
fprintf out "%a%a = %d;" fprint_indent path
fprint_path (name :: path) int
and fprint_syntaxes_path path out =
List.iter (fprintf out "%a\n" (fprint_syntax_path path))
and fprint_syntax out =
fprint_syntax_path [] out
and fprint_syntaxes out =
fprint_syntaxes_path [] out
(* ========================================================================= *)
==> p.mly <==
/* ============================================================================
* File: p.mly
* Language: camlyacc
* Author: Thierry Bravier
* Time-stamp: <97/06/06 10:03:41 tb>
* Created: 97/06/06 10:01:15 tb
* ========================================================================= */
%{
open S
let where_am_i parser_env = function
| ((filename :: _), _) ->
(filename, symbol_start parser_env + 1, symbol_end parser_env + 1)
| _ -> failwith "incorrect env"
let where_is_error parser_env env =
let ((filename, s, e) as result) = where_am_i parser_env env in
Printf.fprintf stderr
"File \"%s\", line %03d, characters %03d-%03d:\n" filename 0 s e;
result
%}
/* ========================================================================= */
%token <string> IDENT
%token <int> INT
%token IMPORT /* "import" */
%token EQUAL /* "=" */
%token SEMI /* ";" */
%token EOF /* eof */
%token ERROR /* others */
/* ========================================================================= */
%env <S.filename list * token S.lexfun> env
%start yacc_start_1
%type <S.syntax list> yacc_start_1
%start yacc_start_2
%type <string list> yacc_start_2
%%
/* ========================================================================= */
yacc_start_1 :
| phrase_1 yacc_start_1
{ $1 :: $2 }
| EOF
{ [] }
;
phrase_1 :
| error SEMI
{ let (filename, s, e) = where_is_error parser_env env in
Printf.fprintf stderr "ERROR\n";
Error (filename, s, e) }
| IMPORT IDENT SEMI
{ Import ($2, parse_imported_file yacc_start_1 env $2) }
| IDENT INT error SEMI
{ let (filename, s, e) = where_is_error parser_env env in
Printf.fprintf stderr "ERROR: YOU MEANT \"%s = %d;\"\n" $1 $2;
Error (filename, s, e) }
| IDENT EQUAL INT SEMI
{ Assignment ($1, $3) }
/* ------------------------------------------------------------------------- */
yacc_start_2 :
| IDENT SEMI yacc_start_2
{ $1 :: $3 }
| EOF
{ [] }
;
/* ========================================================================= */
%%
(* ========================================================================= *)
==> l.mll <==
(* ============================================================================
* File: l.mll
* Language: camllex
* Author: Thierry Bravier
* Time-stamp: <97/06/06 10:00:03 tb>
* Created: 97/06/06 09:59:45 tb
* ========================================================================= *)
{
open S
open P
open Printf
}
(* ========================================================================= *)
rule lexer = parse
[' ' '\012' '\013' '\009' '\026' '\010'] + { lexer lexbuf }
| "--" [^ '\n' ] * { lexer lexbuf }
| ['A'-'Z' 'a'-'z'] ('_' ? ['A'-'Z' 'a'-'z' '0'-'9']) *
{ match Lexing.lexeme lexbuf with "import" -> IMPORT | s -> IDENT s }
| ['1'-'9'] ['0'-'9']*
{ INT (int_of_string (Lexing.lexeme lexbuf)) }
| ";" { SEMI }
| "=" { EQUAL }
| eof { EOF }
| _ { ERROR }
(* ========================================================================= *)
{
let yacc_start_1_file = parse_imported_file yacc_start_1 ([], lexer)
let argl () = match Array.to_list (Sys.argv) with
| (prgm :: args) -> (prgm, args)
| _ -> failwith "argl"
let _ =
match argl () with
| prgm, [ filename ] ->
fprint_syntaxes stdout (yacc_start_1_file filename);
()
| prgm, _ ->
fprintf stderr "usage: %s file\n" prgm;
exit 1
}
(* ========================================================================= *)
==> bar_syntax <==
x = 7;
y 8;
import gee_syntax;
z = 9;
==> foo_syntax <==
a = 5;
b = 99;
import bar_syntax;
c = 100;
d = 200;
e = 300;
==> gee_syntax <==
t 6;
u = 93;
==> OUTPUT <==
./main foo_syntax
a = 5;
b = 99;
-- begin of bar_syntax
bar_syntax.x = 7;
-- SKIPPED ERROR at "bar_syntax":8-12
-- begin of bar_syntax.gee_syntax
-- SKIPPED ERROR at "gee_syntax":1-5
bar_syntax.gee_syntax.u = 93;
-- end of bar_syntax.gee_syntax
bar_syntax.z = 9;
-- end of bar_syntax
c = 100;
d = 200;
e = 300;
File "bar_syntax", line 000, characters 008-012:
ERROR: YOU MEANT "y = 8;"
File "gee_syntax", line 000, characters 001-005:
ERROR: YOU MEANT "t = 6;"
-- Thierry Bravier Dassault Aviation - DGT / DTN / ELO / EAV 78, Quai Marcel Dassault F-92214 Saint-Cloud Cedex - France Telephone : (33) 01 47 11 53 07 Telecopie : (33) 01 47 11 52 83 E-Mail : mailto:thierry.bravier@dassault-aviation.fr
This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:11 MET