[Caml-list] [ANN] The Missing Library
Date: 2004-05-03 (08:58)
From: Marcin 'Qrczak' Kowalczyk <qrczak@k...>
Subject: Re: [Caml-list] [ANN] The Missing Library
Hoping that they will not kill us for off-topic...

> > Concerning the organization, my opinion is that we do want to separate
> > it into several small phases, so each pass becomes easier to manage.
> How small though?

Of course extremes are bad; not *too* small.

Actions which are mutually recursive should be done in one phase, unless
the language requires very deep phase loops.

(The primary example is C++, where evaluation of constant expressions
can influence parsing of further parts, with name resolution, type
checking and template instantiation sitting in between, so they are also
all mutually recursive. I have no idea how to sanely organize a C++
compiler.)

> > Trying to do too much at once sometimes requires to recompute the same
> > thing several times, because there is no convenient intermediate form
> > to store it for reuse,
> Oh, this is a perfect description of my horrible lookup algorithm!
> Originally, it was decoupled, but it didn't work.
> I've had to glue the whole thing together in a single operation
> because everything is so mutually recursive.

Hmm. This probably depends on the language. Mine doesn't have anything
which requires far "phase lookahead". It's dynamically typed and there
is no compile-time overloading, only dynamic dispatch.

The only troublesome part wrt. the order was possible recursion among
definitions in a block. All definitions are potentially mutually
recursive, and using a name before its definition has been executed, in
any other way than attaching it to a closure, is an error, by necessity
sometimes detected only at runtime.

I needed to know which names will be introduced before processing
the contents of the definitions. This was done in the same phase as
understanding arguments of "builtin macros", which were all parsed like
normal applications. So I needed to recognize macro names and analyze
their arguments twice: first to predict which names will be introduced,
then to compile the definitions.

The duplication was getting ugly. I considered moving the recognition
of builtin macros to a separate phase, done before name resolution,
but this would make user-defined macros impossible in the future, because
recognition of what is a macro should be done after name resolution.

So my solution was to analyze each sequence of definitions in two
mini-phases, with an explicit representation between them. The first
mini-phase checks the syntax of arguments of builtin macros, creates
objects representing properties of introduced names, updates the
environment which maps source names to these name objects, and outputs
the sequence of definitions represented in different types. Builtin
macros have separate kinds of nodes and bound names already use name
objects, but all subexpressions are still unanalyzed, in their source
form. The second mini-phase doesn't change the environment and can
finally descend into expressions.
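
As a rough illustration of the two mini-phases described above, here is a
minimal Python sketch. It is not the author's code: the `Name` and `Def`
classes, the `("def", name, body)` source form, and the flat list of
identifiers standing in for an unanalyzed body are all assumptions made up
for this example. The first function only records names and checks the
shape of the macro arguments; the second only reads the environment.

```python
from dataclasses import dataclass

@dataclass
class Name:
    """Properties of one introduced name (only its source spelling here)."""
    source: str

@dataclass
class Def:
    """Intermediate node: name already resolved, body still in source form."""
    name: Name
    body: list  # unanalyzed expression, modelled as a list of identifiers

def phase1(forms, env):
    """Check macro syntax, create name objects, extend the environment."""
    nodes = []
    for form in forms:
        if len(form) != 3 or form[0] != "def" or not isinstance(form[1], str):
            raise SyntaxError(f"malformed definition: {form!r}")
        _, source_name, body = form
        name = Name(source_name)
        env[source_name] = name
        nodes.append(Def(name, body))
    return nodes

def phase2(nodes, env):
    """Descend into bodies; env is only read, never updated."""
    return {node.name.source: [env[ident] for ident in node.body]
            for node in nodes}

# Two mutually recursive definitions: each body mentions the other name.
forms = [("def", "even", ["odd"]), ("def", "odd", ["even"])]
env = {}
resolved = phase2(phase1(forms, env), env)
```

Because every name in the block is already in the environment when the
second function runs, the forward reference from "even" to "odd" resolves
without analyzing any definition twice.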

> Interested in how you handle tail calls in C.

The portable variant uses the well-known trampoline style, where each C
function returns a pointer to the next function to jump into. The stack
is managed explicitly. But for x86 I process the assembler output of gcc
and replace code which returns an address (marked with a comment in asm)
with a jump. This increased performance by about 30%.
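
The trampoline style mentioned above can be sketched as follows. The real
thing is generated C; this Python version only shows the control flow, and
every name in it (`stack`, `regs`, `fact_entry`, `fact_multiply`, `halt`,
`run`) is hypothetical. Each function returns the next function to run
instead of calling it, and arguments and continuations live on an explicit
stack.

```python
stack = []               # explicit stack: continuations and saved values
regs = {"result": None}  # stands in for a return-value register

def fact_entry():
    """Expects [..., continuation, n] on the stack."""
    n = stack.pop()
    if n <= 1:
        regs["result"] = 1
        return stack.pop()       # "return": jump to the continuation
    stack.append(n)              # save n for after the inner call
    stack.append(fact_multiply)  # continuation of the inner call
    stack.append(n - 1)          # argument of the inner call
    return fact_entry            # hand the next function to the driver

def fact_multiply():
    """Expects [..., continuation, n]; multiplies the result by n."""
    n = stack.pop()
    regs["result"] *= n
    return stack.pop()

def halt():
    """Final continuation: tells the driver loop to stop."""
    return None

def run(code):
    """The trampoline: keep jumping to whatever function was returned."""
    while code is not None:
        code = code()

def factorial(n):
    stack.append(halt)
    stack.append(n)
    run(fact_entry)
    return regs["result"]
```

The host call stack never grows, so `factorial(2000)` runs even though it
is deeper than Python's default recursion limit. The price is one return
and one indirect call per transfer of control, which is the overhead the
x86 assembly rewriting avoids.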

If the compiler is ever used on other architectures, someone who knows
other assemblers might extend the mangler to handle them. I only know
the x86 assembly.

This idea was not mine; GHC does a similar thing. But its mangler does
many horrible things, like putting some data structures adjacent to code
so they can be found at a known negative offset from the code pointer,
and removing prologues and epilogues. Mine does fewer things: it only
converts ret (and possibly loading of a return value) to a jump.

It does require some work. For example, gcc before version 3.4.0 liked
to merge the code of multiple exit points of a function, and the merged
code paths often followed loads of different results. So I have to
duplicate these code paths, so that each branch can have its own return
value at the end (turning "movl $constant_label,%eax" and "ret" into
"jmp constant_label" is particularly good for the CPU, but sometimes the
same "ret" corresponds to multiple labels from merged exit paths). It
happens that gcc-3.4.0 doesn't merge so much.
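
To make the rewriting concrete, here is a small Python sketch of the
peephole step on AT&T-syntax x86 assembly. It is not the actual mangler,
which is written in Perl and is not shown in this message. It assumes that
every "ret" it sees should become a jump (the real one only touches
returns marked with a comment), that the address is returned in %eax, and
it does not duplicate merged exit paths; the function name `mangle` and
the regular expressions are made up for the example.

```python
import re

# "movl $some_label,%eax": load a constant code address as the return value
MOVL_CONST = re.compile(r"^\s*movl\s+\$([A-Za-z_.][\w.]*),\s*%eax\s*$")
RET = re.compile(r"^\s*ret\s*$")

def mangle(lines):
    """Replace 'ret' with a jump to the address that would be returned."""
    out = []
    for line in lines:
        if not RET.match(line):
            out.append(line)
            continue
        prev = MOVL_CONST.match(out[-1]) if out else None
        if prev:
            # Constant target right before ret: fold both into a direct jump.
            out[-1] = "\tjmp\t" + prev.group(1)
        else:
            # Target only known at run time: indirect jump through %eax.
            out.append("\tjmp\t*%eax")
    return out

direct = mangle(["\tmovl\t$next_fn,%eax", "\tret"])
merged = mangle(["\tmovl\t$a,%eax", ".L2:", "\tret"])
```

In the second call a label sits between the load and the "ret", as happens
when gcc merges exit paths, so the sketch falls back to the indirect jump.
Getting a direct jump there would require duplicating the shared tail for
each branch first, which is the extra work described above.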

The mangler is 94 lines of Perl (not counting comments and empty lines).

   __("<         Marcin Kowalczyk
