Browse thread
[Caml-list] [ANN] The Missing Library
-
John Goerzen
-
Kenneth Knowles
- Alexander V. Voinov
-
John Goerzen
-
Maxence Guesdon
-
John Goerzen
- Maxence Guesdon
-
John Goerzen
-
Alain.Frisch@e...
-
John Goerzen
-
Alain.Frisch@e...
-
Nicolas Cannasse
-
Yamagata Yoriyuki
- Gerd Stolpmann
-
Nicolas Cannasse
-
Yamagata Yoriyuki
- Jacques GARRIGUE
- Nicolas Cannasse
-
Yamagata Yoriyuki
-
Yamagata Yoriyuki
-
Nicolas Cannasse
- oliver@f...
-
Alain.Frisch@e...
-
John Goerzen
- Henri DF
- Shawn Wagner
- james woodyatt
-
Alain.Frisch@e...
- Basile STARYNKEVITCH
-
John Goerzen
- Kenneth Knowles
- Florian Hars
-
Maxence Guesdon
- Eric C. Cooper
-
Kenneth Knowles
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: | 2004-05-03 (00:02) |
From: | Marcin 'Qrczak' Kowalczyk <qrczak@k...> |
Subject: | Re: [Caml-list] [ANN] The Missing Library |
W li¶cie z czw, 29-04-2004, godz. 03:31 +1000, skaller napisa³: > I have quite a lot of 'multi-pass' phases in my Felix compiler > where i build temporary data structures in between. > > I'd love to glue some of these together to eliminate building > whole data structures. I'm not sure it's worth the trouble. There are two issues of compiler phases: organization and efficiency. Concerning the organization, my opinion is that we do want to separate it into several small phases, so each pass becomes easier to manage. Trying to do too much at once sometimes requires to recompute the same thing several times, because there is no convenient intermediate form to store it for reuse, and it's especially bad if source and target languages require very different code structure - then it's very hard to get right at all. And for the efficiency, today's computers are usually capable of fitting two successive phases of the whole module in memory. GCC recently switched to optimizing a module at a time rather than a function at a time as previously. Also, lazy structures with some smart inversion of control flow typically introduce an overhead of creating many closures and calling many functions through pointers. > For example I have phases: > > tokeniser -> rewrite token stream -> parser -> > macro expansion and constant folding -> > desugaring (syntactic term rewriting) -> > lifting (separate declarations from executable code) In my compiler of my language (implemented in itself) I have the following phases. They came out of necessity - I wasn't able to manage more complex transformations at once. Code representation | Compiler phase -----------------------------------+----------------------------------- Source string | | Lexer List of tokens (strict) | | Parser Source syntax tree | | Expander which translates syntactic | sugar and resolves names; macros | will be here when implemented; | an interface file is written and | interfaces of used modules are read More abstract tree, capturing the | essence of the semantics; smart | constructors compute the set of | free local variables of function | nodes | | Various optimizations will be here: | inlining, lambda-lifting of some | functions, type analysis (the | language is dynamically typed); | this phase is not implemented yet The same representation of abstract| tree as before | | Planner which lifts functions and | constant data to the toplevel | and determines the sequence of | operations to perform inside each | function A set of global definitions, | including function bodies expressed| as almost flat sequences of | statements (modulo conditionals) | | C coder which splits functions into | fragments between calls (needed | because of tail calls), allocates | virtual registers, stack frame | slots & temporary C variables, | determines where the stack frame | is active and generates C code C abstract syntax tree | | Pretty printer for a subset of | C syntax The C output string | Phases are executed strictly in order, with no overlap. Each representation of code uses different types with different structure, except that information about a name (11 fields) is shared between the "abstract" and "sequential" representation, and various atomic data are shared between many phases (e.g. literal values, source locations). The slowest part is currently the pretty printer. -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners