Text inclusion
From: Anton Moscal <msk@p...>
Subject: Text inclusion
Hello,

I have tried to implement a form of file inclusion with camlp4. Everything works,
but I have run into the following problem: because the AST contains only position
information (and not the file name), ocaml reports incorrect error locations.
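
To illustrate the problem, here is a minimal sketch. The types and function names
(pos_loc, file_loc, report_pos, report_file) are hypothetical and are not the actual
ocaml or camlp4 definitions; they only show why a position without a file name
cannot be reported correctly once text from a second file is included.

```ocaml
(* Hypothetical location types, for illustration only. *)

(* Position-only location: character offsets, no record of the file. *)
type pos_loc = int * int

(* Location that also carries the name of the file it came from. *)
type file_loc = { file : string; first : int; last : int }

(* With position-only locations the reporter has to assume a single file,
   so an error inside an included file is attributed to the main file. *)
let report_pos ~main_file ((first, last) : pos_loc) =
  Printf.sprintf "File \"%s\", characters %d-%d" main_file first last

(* With the file name stored in the location, the report names the
   file the text was really read from. *)
let report_file (l : file_loc) =
  Printf.sprintf "File \"%s\", characters %d-%d" l.file l.first l.last
```

With these definitions, an error at characters 10-14 of an included file x.y.ml is
reported against x.ml by report_pos, and against x.y.ml by report_file.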

After a discussion with Daniel de Rauglaudre, it turned out that supporting this
information requires small changes in ocaml and camlp4 (I can send patches, with which
everything works well on my computer). Daniel says the following:

> Well, if Xavier implements your system of location, I make the
> associated change in Camlp4 (even if there are numerous changes). The
> problem is to convince him.

I've prepared a text with some arguments in favour of the usefulness of
certain kinds of text inclusion:

++++++++++++++++++++++++++++++++++++++++++++
First of all, an example of a syntax change, implemented by the code given below.

It allows one to write, in a program file x.ml, the construction `nest "y"' in place
of an expression (or module expression), and to put in the file x.y.ml the definition
`val egg "y" = <expression>' (or `module egg "y" = <module expression>').

------------------------------------------
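(* Parse the egg file that corresponds to `nest "sfx"' in the current input
   file: for input x.ml and suffix y, the egg is read from x.y.ml. *)
let do_egg loc sfx egg_rule =
  let name = (Filename.chop_suffix !input_file ".ml") ^ "." ^ sfx ^ ".ml" in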
  let chan = 
    try open_in name with
      exn -> raise_with_loc loc exn
  in 
  let old_name = !input_file in
  input_file := name;
  let ((sfx', loc_sfx'), res) = 
    try
      Grammar.Entry.parse egg_rule (Stream.of_channel chan) 
    with
      exn -> close_in chan; raise exn
  in
  close_in chan;
  if sfx' <> sfx then 
    raise_with_loc loc_sfx' (Stream.Error ("name of egg must be equal nest name: \""^sfx^"\""));
  input_file := old_name;
  (name, res)

EXTEND
  GLOBAL: expr module_expr;

  string_with_loc: [[ str = STRING -> (str, loc) ]];

  expr_egg:
    [[ "val"; LIDENT "egg"; sfx = string_with_loc; "="; body = expr -> (sfx, body)]];

  expr: LEVEL "simple"
    [[ "nest"; sfx = STRING ->
         let (name, res) = do_egg loc sfx expr_egg in
         MLast.ExNst (loc, name, res) ]];

  module_expr_egg:
    [[ "module"; LIDENT "egg"; sfx = string_with_loc; "="; body = module_expr -> (sfx, body)]];

  module_expr:
    [[ "nest"; sfx = STRING ->
         let (name, res) = do_egg loc sfx module_expr_egg in
         MLast.MeNst (loc, name, res) ]];
END

---------------------------------------
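
The file-naming rule used by do_egg can be tried in isolation. This is only a sketch:
egg_file_name is a hypothetical helper that repeats the expression from the first
line of do_egg, taking the input file name as an ordinary argument instead of
reading the !input_file reference.

```ocaml
(* Hypothetical helper: the name of the egg file for a given input file
   and nest suffix.  Filename.chop_suffix raises Invalid_argument if
   input_file does not end in ".ml". *)
let egg_file_name input_file sfx =
  Filename.chop_suffix input_file ".ml" ^ "." ^ sfx ^ ".ml"
```

Because an egg file is itself a .ml file, the rule composes: a `nest "enum"' inside
declaration.type_specifier.ml refers to declaration.type_specifier.enum.ml.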

I used this "nest-egg" construction in the sources of a syntax analyzer of mine, and got
the following directory structure (I've removed from this list all files that
remain in the `old style'):

+++++++++++++++++++++++++++++++++++
-rw-r--r--   1 msk      msk          2759 Nov 20 15:36 analyzer.ml
-rw-rw-r--   1 msk      msk           293 Nov 20 15:33 analyzer.balance.ml
-rw-rw-r--   1 msk      msk           786 Nov 20 15:35 analyzer.block.ml
-rw-rw-r--   1 msk      msk          1461 Nov 20 15:33 analyzer.expression.ml
-rw-rw-r--   1 msk      msk          1842 Nov 20 15:34 analyzer.statement.ml

-rw-r--r--   1 msk      msk          3036 Nov 20 15:50 declaration.ml
-rw-rw-r--   1 msk      msk          2037 Nov 20 15:38 declaration.base_type.ml
-rw-rw-r--   1 msk      msk           324 Nov 20 15:45 declaration.type_specifier.ml
-rw-rw-r--   1 msk      msk          2527 Nov 20 15:48 declaration.type_specifier.builtin.ml
-rw-rw-r--   1 msk      msk          2136 Nov 20 15:40 declaration.type_specifier.enum.ml
-rw-rw-r--   1 msk      msk           220 Nov 20 15:41 declaration.type_specifier.ident.ml

-rw-r--r--   1 msk      msk          2042 Nov 20 16:01 tokens.ml
-rw-rw-r--   1 msk      msk          2120 Nov 20 15:58 tokens.get.ml
-rw-rw-r--   1 msk      msk           316 Nov 20 16:00 tokens.next.ml
-rw-rw-r--   1 msk      msk           217 Nov 20 15:59 tokens.req.ml
-rw-rw-r--   1 msk      msk           193 Nov 20 16:00 tokens.test.ml

-rw-r--r--   1 msk      msk          4020 Nov 20 15:57 types.ml
-rw-rw-r--   1 msk      msk          1939 Nov 20 15:56 types.builtin.ml
-rw-rw-r--   1 msk      msk          1558 Nov 20 15:57 types.to_string.ml
-----------------------------------

I hope the meaning of this directory structure is self-evident.

For example, here are two short files from this list:

+++++++++++++++++++++++++++++++++++
(* declaration.type_specifier.ml *)
val egg "type_specifier" = 
  let default_type = Builtin.signed_int in
  let (t, storage_class') = 
    match scanner#curr with
    | Lex.Enum       -> nest "enum"
    | Lex.Ident name -> nest "ident"
    | lex            -> nest "builtin"
  in (add_qualifiers qualifiers t, obsolete_storage_class_specifier storage_class')
-----------------------------------

+++++++++++++++++++++++++++++++++++
(* declaration.type_specifier.ident.ml *)
val egg "ident" = 
  (begin
    try
      match id_tab#find (Scope.Ident, name) with
      | {id_value = Type t} -> scanner#get; t
      | _ -> default_type
    with
      Ident.Not_found -> default_type
  end, storage_class)
-----------------------------------

The first of these files is much more readable than its original
version (the `enum' and `builtin' eggs are about 100 lines
each, while the original `type_specifier' function was
about 230 lines in total).

That is, I think that use of the `nest-egg' construction greatly improves
the readability of a program and eliminates the main trouble with
deeply nested declaration structures: the huge size of a single source
file, which makes it unmanageable.

Another important property of this style is the ease of extracting
a block into a separate file: we simply create a new file containing
the text of the block, and write the corresponding `nest' expression in its place.
No additional variables, function parameters or anything else are needed.
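
The reason no parameters are needed is that the egg is inserted as text at the point
of the `nest', so it sees whatever names are in scope there. The following sketch
shows this with a naive string substitution. It is not the camlp4 implementation
above: expand and lookup are hypothetical names, lookup stands in for reading the
egg file, and the scan does not skip strings or comments, nor detect an egg that
nests itself.

```ocaml
(* Naive textual model of `nest': replace every occurrence of
   nest "NAME" in [src] by the text that [lookup NAME] returns,
   wrapped in parentheses.  Eggs are expanded recursively, so an
   egg may itself contain nests. *)
let rec expand lookup src =
  let key = "nest \"" in
  let klen = String.length key in
  let n = String.length src in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if i + klen <= n && String.sub src i klen = key then begin
      match String.index_from_opt src (i + klen) '"' with
      | Some j ->
          (* The name is the text between the two double quotes. *)
          let name = String.sub src (i + klen) (j - i - klen) in
          Buffer.add_string buf ("(" ^ expand lookup (lookup name) ^ ")");
          go (j + 1)
      | None ->
          (* Unterminated name: copy the rest of the text unchanged. *)
          Buffer.add_substring buf src i (n - i)
    end
    else begin
      Buffer.add_char buf src.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

For instance, if lookup maps "ident" to the text `name ^ "!"', then expanding
`let name = "a" in nest "ident"' yields `let name = "a" in (name ^ "!")': the egg
refers to name without it being passed explicitly.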


Historically, this syntax derives from a proposal for extending Algol-68 with
modules and separate compilation, which appeared in some issue of the "Algol Bulletin"
(I can't give the exact reference now, but if you are interested, I'll try
to find it).

In this proposal, the nest-egg wasn't a form of `include' directive, but rather a
tool for separate compilation: on a `nest' construction the compiler generates
tables with symbolic information for the `nest', and the compiler uses this
information when it compiles the corresponding egg. Also, in A-68
many `eggs' may correspond to one `nest': one of them has to be an expression (`unit' in
the A-68 terminology) and all the others module definitions, which are compiled
in the same environment as the expression egg and can access each other (and can be
accessed by the expression egg of the same name).

This is the best solution to the problem of nested modules that I know of (in
all other systems local modules are of little use, because using them actively
leads to huge top-level modules).

Obviously, the text inclusion mechanism may also prove useful in many other situations. 

-----------------------------

Regards, 
Anton