Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
[Caml-list] [ANN] The Missing Library
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2004-05-03 (00:02)
From: Marcin 'Qrczak' Kowalczyk <qrczak@k...>
Subject: Re: [Caml-list] [ANN] The Missing Library
W li¶cie z czw, 29-04-2004, godz. 03:31 +1000, skaller napisa³:

> I have quite a lot of 'multi-pass' phases in my Felix compiler
> where i build temporary data structures in between.
> I'd love to glue some of these together to eliminate building
> whole data structures.

I'm not sure it's worth the trouble. There are two issues of compiler
phases: organization and efficiency.

Concerning the organization, my opinion is that we do want to separate
it into several small phases, so each pass becomes easier to manage.
Trying to do too much at once sometimes requires to recompute the same
thing several times, because there is no convenient intermediate form
to store it for reuse, and it's especially bad if source and target
languages require very different code structure - then it's very hard
to get right at all.

And for the efficiency, today's computers are usually capable of fitting
two successive phases of the whole module in memory. GCC recently
switched to optimizing a module at a time rather than a function at a
time as previously. Also, lazy structures with some smart inversion of
control flow typically introduce an overhead of creating many closures
and calling many functions through pointers.

>  For example I have phases:
> 	tokeniser -> rewrite token stream -> parser ->
> 	macro expansion and constant folding ->
> 	desugaring (syntactic term rewriting) ->
> 	lifting (separate declarations from executable code)

In my compiler of my language (implemented in itself) I have the
following phases. They came out of necessity - I wasn't able to manage
more complex transformations at once.

Code representation                | Compiler phase
Source string                      |
                                   | Lexer
List of tokens (strict)            |
                                   | Parser
Source syntax tree                 |
                                   | Expander which translates syntactic
                                   | sugar and resolves names; macros
                                   | will be here when implemented;
                                   | an interface file is written and
                                   | interfaces of used modules are read
More abstract tree, capturing the  |
essence of the semantics; smart    |
constructors compute the set of    |
free local variables of function   |
nodes                              |
                                   | Various optimizations will be here:
                                   | inlining, lambda-lifting of some
                                   | functions, type analysis (the
                                   | language is dynamically typed);
                                   | this phase is not implemented yet
The same representation of abstract|
tree as before                     |
                                   | Planner which lifts functions and 
                                   | constant data to the toplevel
                                   | and determines the sequence of
                                   | operations to perform inside each
                                   | function
A set of global definitions,       |
including function bodies expressed|
as almost flat sequences of        |
statements (modulo conditionals)   |
                                   | C coder which splits functions into
                                   | fragments between calls (needed
                                   | because of tail calls), allocates
                                   | virtual registers, stack frame
                                   | slots & temporary C variables,
                                   | determines where the stack frame
                                   | is active and generates C code
C abstract syntax tree             |
                                   | Pretty printer for a subset of
                                   | C syntax
The C output string                |

Phases are executed strictly in order, with no overlap. Each
representation of code uses different types with different structure,
except that information about a name (11 fields) is shared between
the "abstract" and "sequential" representation, and various atomic
data are shared between many phases (e.g. literal values, source

The slowest part is currently the pretty printer.

   __("<         Marcin Kowalczyk

To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: