[Caml-list] line number information in abstract syntax trees
Date: 2003-09-16 (20:08)
From: skaller <skaller@o...>
Subject: Re: [Caml-list] line number information in abstract syntax trees
On Mon, 2003-09-15 at 17:53, Rafael 'Dido' Sevilla wrote:
> As some of you have suggested earlier, I have foregone doing some
> preliminary semantic analysis for my compiler in my ocamlyacc grammar,
> and instead am using the grammar solely to do syntactic analysis.  Which
> then brings me to another problem.  I've created an abstract syntax tree
> data type, but now I need to somehow embed line number information
> obtained from the syntactic analysis phase so that I can later do error
> reporting.  I can't think of a clean way to do this.  So far, I have a
> syntax tree data type that kind of looks like:
> type program = { impmodule: string ; tdecls: topdecl list; plineno:int}
> and topdecl =
>     Declaration of decl * int
> and decl = { idents: string list ; dtype: xtype ; dlineno:int}
> and xtype =
>     Data of datatype * int
>   | Func of fntype * int
>   | Alias of xtype * int
> and datatype =
>     Byte of int
>   | Int of int
>   | Big of int
>   | Real of int
>   | String of int
>   | Tuple of (datatype list * int)
>   | Array of (datatype * int)
>   | List of (datatype * int)
>   | Chan of (datatype * int)
> Note that all the record types have additional fields that look like
> 'plineno:int' and every variant type has an int tacked on somewhere.
> That int is supposed to contain the line number.
> This works just fine, but it just seems to me like such a grossly ugly
> hack into what is otherwise an elegant-looking data structure.  Anyone
> have style guidelines 

In Felix, every single node of the Abstract Syntax Tree contains
a source reference (except type expressions). Whilst it is
painful to construct this information, it is worthwhile.
My nodes look like:

	| AST_name (sr,name)
	| AST_apply (sr,f1,e1)
	| AST_literal (sr,9999)

where sr is the source reference.
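
As a rough sketch of what such a declaration could look like in OCaml: the constructor names follow the ones above, but the payload types and the simplified `srcref` (file name and line only) are assumptions made for illustration, not Felix's actual definitions.

```ocaml
(* Hypothetical, simplified source reference: file name and line number. *)
type srcref = string * int

(* Every constructor carries its source reference as the first component.
   The payload types (string, int) are assumed for this sketch. *)
type expr =
  | AST_name of srcref * string
  | AST_apply of srcref * expr * expr
  | AST_literal of srcref * int

(* With the reference stored in every node, looking it up is one match. *)
let src_of = function
  | AST_name (sr, _) -> sr
  | AST_apply (sr, _, _) -> sr
  | AST_literal (sr, _) -> sr

let example =
  let sr = ("demo.flx", 3) in
  AST_apply (sr, AST_name (sr, "f"), AST_literal (sr, 9999))
```

The cost is that every place that builds a node has to supply `sr`; the benefit is that any later pass can report a location without extra bookkeeping.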

An alternative for expressions is a dummy expression 

	| AST_srcref (sr,e)

which can be put where needed. When the source
is just a 'span' of two nodes it can be elided,
and you use a function

	let rec src x = match x with
	| AST_srcref (sr,_) -> sr
	| AST_apply (f,e) -> range (src f) (src e)

to compute the source location. My ranged source
references have the type

	type range_srcref  = 
		string * (* filename *)
		int * (* start line number *)
		int * (* start column *)
		int * (* end line number *)
		int  (* end column *)
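
A self-contained sketch of this alternative, under some assumptions: only `AST_srcref` nodes carry a location, `range` is taken to mean "from the start of the first span to the end of the second", and `src` returns an option because a bare leaf has no location of its own. The constructor payloads are hypothetical.

```ocaml
type range_srcref =
  string * (* filename *)
  int *    (* start line number *)
  int *    (* start column *)
  int *    (* end line number *)
  int      (* end column *)

(* Only AST_srcref carries a location; other nodes derive theirs. *)
type expr =
  | AST_name of string
  | AST_apply of expr * expr
  | AST_srcref of range_srcref * expr

(* Assumed meaning of range: start of the first span, end of the second. *)
let range ((file, l1, c1, _, _) : range_srcref)
          ((_, _, _, l2, c2) : range_srcref) : range_srcref =
  (file, l1, c1, l2, c2)

(* Compute a location where one can be derived, None otherwise. *)
let rec src = function
  | AST_srcref (sr, _) -> Some sr
  | AST_apply (f, e) ->
    (match src f, src e with
     | Some a, Some b -> Some (range a b)
     | _ -> None)
  | AST_name _ -> None

let f = AST_srcref (("a.flx", 1, 1, 1, 1), AST_name "f")
let x = AST_srcref (("a.flx", 1, 3, 1, 3), AST_name "x")
let whole = src (AST_apply (f, x))
```

Here `whole` spans from the start of `f` to the end of `x`, so the application node itself never had to store anything.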

Every token lexed contains the filename,
line number, and start and end columns
of the lexeme the token was derived from.
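
For an ocamllex/ocamlyacc front end, one way to get this information is from the standard library's `Lexing.position` records. The sketch below is not how Felix does it; the helper name `srcref_of_positions` and the 1-based, inclusive column convention are assumptions. It converts a start/end position pair into a ranged source reference.

```ocaml
type range_srcref = string * int * int * int * int

(* Hypothetical helper: build a ranged source reference from the start and
   end positions of a lexeme. Columns are 1-based and inclusive here;
   pos_cnum of the end position points one past the last character. *)
let srcref_of_positions (s : Lexing.position) (e : Lexing.position)
  : range_srcref =
  (s.Lexing.pos_fname,
   s.Lexing.pos_lnum,
   s.Lexing.pos_cnum - s.Lexing.pos_bol + 1,
   e.Lexing.pos_lnum,
   e.Lexing.pos_cnum - e.Lexing.pos_bol)

(* Hand-built positions standing in for what a lexer would report. *)
let start_pos =
  { Lexing.pos_fname = "a.flx"; pos_lnum = 504; pos_bol = 100; pos_cnum = 118 }
let end_pos = { start_pos with Lexing.pos_cnum = 120 }
let sr = srcref_of_positions start_pos end_pos
```

In a lexer rule the two positions would typically come from `Lexing.lexeme_start_p` and `Lexing.lexeme_end_p`, and in a grammar action from `Parsing.symbol_start_pos` and `Parsing.symbol_end_pos`. Line numbers are only meaningful if the lexer keeps `pos_lnum` and `pos_bol` up to date at each newline.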

The pain of carrying the source references
around is lessened when you consider that 
in any production quality compiler 70%
of all the code is error reporting anyhow :-)

Here's an error diagnostic (this one is actually
a compiler bug)

[bind_exe] LHS[t](List::list[<T1128>]) of initialisation must have same
type as
RHS(List::list[<T1104>]) unfolded LHS = List::list[<T1128>]
In lpsrc/flx_lib.ipk: line 504, cols 19 to 20
503:       | Empty => Empty
504:       | Cons (?h, ?t) => Cons (h, rev t)
505:       endmatch
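
A minimal sketch of how the location part of such a diagnostic could be rendered from a ranged source reference. The function name `report`, the one line of context on either side, and the header wording for multi-line ranges are assumptions for illustration, not Felix's actual code.

```ocaml
type range_srcref = string * int * int * int * int

(* Render a location header followed by the offending lines plus one line
   of context before and after. lines holds the file contents, one entry
   per line; line numbers are 1-based. *)
let report (lines : string array)
           ((file, l1, c1, l2, c2) : range_srcref) : string list =
  let header =
    if l1 = l2 then
      Printf.sprintf "In %s: line %d, cols %d to %d" file l1 c1 c2
    else
      Printf.sprintf "In %s: line %d col %d to line %d col %d"
        file l1 c1 l2 c2
  in
  let first = max 1 (l1 - 1) in
  let last = min (Array.length lines) (l2 + 1) in
  let rec context n =
    if n > last then []
    else Printf.sprintf "%d: %s" n lines.(n - 1) :: context (n + 1)
  in
  header :: context first

let demo = report [| "a"; "b"; "c"; "d" |] ("x.flx", 2, 1, 2, 1)
```

Printing the result with `List.iter print_endline demo` gives the header line followed by the numbered source lines.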
