Introduction
Structure of the program
The first local optimizer
The second local optimizer
The global synthesizer
Integrating the three optimizers
Developing a test suite
Experimental results
Conclusions and random thoughts
Further reading
The challenge of the 1999 ICFP programming contest was to develop an optimizer for a small programming language used to describe non-player characters in interactive fiction games. The goal of the optimizer was to reduce as much as possible the size of the programs describing non-player characters, while preserving their semantics. The challenge task was posted on the Web on Thursday, Sept 2nd at 11pm French time; participants were given 72 hours, until Sunday, Sept 5th at 11pm, to complete the challenge, using the language of their choice.
Our entry, code-named ``Composite'', was written by a team of five persons:
Pascal Cuoq | PhD student, INRIA Rocquencourt |
Damien Doligez | Researcher, INRIA Rocquencourt |
Xavier Leroy | Researcher, INRIA Rocquencourt and Trusted Logic S.A. |
Luc Maranget | Researcher, INRIA Rocquencourt |
Alan Schmitt | PhD student, INRIA Rocquencourt |
The program is written entirely in Objective Caml version 2.02+4 (the current development version at the time of the contest), using the Unix library for accessing OS services.
As its name implies, ``Composite'' is the outcome of no less than four independent attempts at solving the challenge. Two of those attempts are local optimizers, operating by local size-reducing transformations on the input programs. The other two are global synthesizers that attempt to synthesize from the ground up a program equivalent to the input program, but smaller.
The four optimizers were developed largely independently on Friday, Saturday and Sunday morning. The sub-teams shared basic pieces of code such as parsers and pretty-printers, as well as test suites and equivalence testers. On Sunday afternoon, one of the global synthesizers was very far from working and was abandoned, leaving three working optimizers that each had their own strengths and weaknesses. No optimizer was consistently better than the others on our tests, so we decided to combine them in a single entry, and spent a busy Sunday evening stitching them together.
The first local optimizer was developed by Pascal Cuoq and Alan Schmitt. The lexer, parser and pretty-printer were completed within 90 man-minutes. The rest was a set of independent optimizations, each quickly written as soon as the powerful drugs from coffee and chocolate induced its image in our feverish minds.
This optimizer performs the following transformations on the input program:
- Merging of identical statements in CASE and IF statements. This optimization needs no precaution, because it cannot increase the cost.
- Merging of CASE statements involving the same variable. This optimization works on two CASE statements that concern the same variable, and that are the statements of two rules for two different state lists. It unfolds each ARM list so that each value set consists of only one value, and sorts the two resulting lists so that it can merge them, adding an IF statement on the states when the statements differ for the same ARM value. Identical statements for different ARM values are then merged, and the result is evaluated to make sure the cost saved is greater than the possible additional span. For instance,

((1 2 (CASE (VAR "y")
        ((ARM (100) (DECISION _ "a"))
         (ARM (101) (DECISION _ "b"))
         (ARM (102) (DECISION _ "c")))
        (DECISION _ "d")))
 (3 4 (CASE (VAR "y")
        ((ARM (100) (DECISION _ "a"))
         (ARM (101) (DECISION _ "b"))
         (ARM (102) (DECISION _ "e")))
        (DECISION _ "d"))))

becomes

((1 2 3 4 (CASE (VAR "y")
            ((ARM (100) (DECISION _ "a"))
             (ARM (101) (DECISION _ "b"))
             (ARM (102) (IF (OR (EQUALS (VAR "state") 1) (EQUALS (VAR "state") 2))
                            (DECISION _ "c")
                            ()
                            (DECISION _ "e"))))
            (DECISION _ "d"))))
- Replacement of DECISION n by DECISION _ when the current state is known to be n (sketched in OCaml after this list).
- Hoisting of CASE statements in IF statements when appropriate, and removal of degenerate cases (such as those with only a default statement). This optimization is fairly straightforward but compares the cost of the resulting code to make sure it is better.
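To give a concrete flavor of how such rewrites were expressed, here is a minimal OCaml sketch of the DECISION rewrite described above. The statement type is a hypothetical simplification for illustration only, not the abstract syntax actually used in the entry.

(* A hypothetical, simplified statement type; the real abstract syntax also
   had IF statements, boolean conditions, utterances, etc. *)
type stmt =
  | Decision of int option * string                 (* DECISION n / DECISION _ *)
  | Case of string * (int list * stmt) list * stmt  (* CASE var, ARMs, default *)

(* Rewrite (DECISION n "utt") into (DECISION _ "utt") whenever the statement
   can only be reached in state n; the wildcard form is smaller. *)
let rec drop_redundant_state current stmt =
  match stmt with
  | Decision (Some n, utt) when current = Some n -> Decision (None, utt)
  | Decision _ -> stmt
  | Case (v, arms, default) ->
      (* A CASE over the "state" variable refines our knowledge of the state. *)
      let refine vals =
        if v = "state" then (match vals with [n] -> Some n | _ -> None)
        else current in
      Case (v,
            List.map (fun (vals, s) -> (vals, drop_redundant_state (refine vals) s)) arms,
            drop_redundant_state (if v = "state" then None else current) default)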
This optimizer was completed on Friday evening and submitted as a ``lightning entry'' under the team name of ``Bat 8''. It ranked 6th in the preliminary ranking, and 3rd among the lightning entries, right behind the entries that were to receive the second prize and the jury's prize.
The second local optimizer was developed independently by Xavier Leroy. It performs the following transformations:
- Better encoding of CASE and IF statements. Sparse CASE statements are split into cascades of dense CASE statements, e.g.

(CASE (VAR "x")
 ((ARM (1) a1)
  (ARM (2) a2)
  (ARM (3) a3)
  (ARM (4 100) a4)
  (ARM (101) a5)
  (ARM (102) a6)
  (ARM (103) a7))
 a8)
==>
(CASE (VAR "x")
 ((ARM (1) a1)
  (ARM (2) a2)
  (ARM (3) a3)
  (ARM (4) a4))
 (CASE (VAR "x")
  ((ARM (100) a4)
   (ARM (101) a5)
   (ARM (102) a6)
   (ARM (103) a7))
  a8))

(As shown in the example, this transformation can duplicate some arms of the CASE statement, and thus must be done with care. A dynamic programming algorithm is used to select the optimal cut points.)

Small CASE statements are transformed into cascades of IFs when beneficial:

(CASE (VAR "x")
 ((ARM (1) a1)
  (ARM (100) a2))
 a3)
==>
(IF (EQUALS (VAR "x") 1)
    a1
    ((ELSEIF (EQUALS (VAR "x") 100) a2))
    a3)

Large cascades of IF tests over the same variable are turned into CASE statements:

(IF (EQUALS (VAR "x") 1) a1
    ((ELSEIF (EQUALS (VAR "x") 2) a2)
     (ELSEIF (EQUALS (VAR "x") 4) a4)
     (ELSEIF (EQUALS (VAR "x") 5) a5))
    a6)
==>
(CASE (VAR "x")
 ((ARM (1) a1)
  (ARM (2) a2)
  (ARM (4) a4)
  (ARM (5) a5))
 a6)
- CASE-CASE transformation: when the arms of a CASE are CASE statements on the same variable, attempt to permute the two levels of CASE:

(CASE (VAR "x")
 ((ARM (1)
   (CASE (VAR "y")
    ((ARM (100) (DECISION _ "a"))
     (ARM (101) (DECISION _ "b"))
     (ARM (102) (DECISION _ "c")))
    (DECISION _ "d")))
  (ARM (2)
   (CASE (VAR "y")
    ((ARM (100) (DECISION _ "a"))
     (ARM (101) (DECISION _ "b"))
     (ARM (102) (DECISION _ "e")))
    (DECISION _ "d"))))
 (CASE (VAR "y")
  ((ARM (100) (DECISION _ "a"))
   (ARM (101) (DECISION _ "b"))
   (ARM (102) (DECISION _ "f")))
  (DECISION _ "d")))
==>
(CASE (VAR "y")
 ((ARM (100) (DECISION _ "a"))
  (ARM (101) (DECISION _ "b"))
  (ARM (102)
   (CASE (VAR "x")
    ((ARM (1) (DECISION _ "c"))
     (ARM (2) (DECISION _ "e")))
    (DECISION _ "f"))))
 (DECISION _ "d"))
- Merging of identical arms in CASE and IF statements. For the purpose of identifying arms, the statements (DECISION nnn "utterance") and (DECISION _ "utterance") can be converted into one another when applicable, using a non-standard form of unification. For instance:

((0 (CASE (VAR "x")
      ((ARM (1) (DECISION _ "foo"))
       (ARM (2) (DECISION 0 "foo")))
      (DECISION _ "bar"))))
==>
((0 (CASE (VAR "x")
      ((ARM (1 2) (DECISION _ "foo")))
      (DECISION _ "bar"))))
- Simplification of the boolean conditions of IF statements. For instance, redundant tests in AND and OR are eliminated:

(IF (OR (EQUALS (VAR "x") 1)
        (AND (EQUALS (VAR "x") 1) (EQUALS (VAR "y") 2)))
    ...)
==>
(IF (EQUALS (VAR "x") 1)
    ...)

Similarly, we apply the distributivity laws for OR and AND to factor out common subexpressions in IF conditions:

(IF (OR (AND (EQUALS (VAR "x") 1) (EQUALS (VAR "y") 2))
        (AND (EQUALS (VAR "x") 1) (EQUALS (VAR "z") 3)))
    ...)
==>
(IF (AND (EQUALS (VAR "x") 1)
         (OR (EQUALS (VAR "y") 2) (EQUALS (VAR "z") 3)))
    ...)
- Propagation of the information gained from tests already performed into the arms of IF and CASE statements. Consider for instance (IF (EQUALS (VAR "x") 1) a1 () a2). In a1, we know that x is equal to 1; in a2, we know that x is different from 1. Thus, all (EQUALS (VAR "x") NNN) and (CASE (VAR "x")) tests in a1 can be statically resolved. In a2, (EQUALS (VAR "x") 1) tests can also be statically resolved, and CASE statements over x can be simplified by removing the arm corresponding to x = 1. (A sketch of this transformation appears after this list.)
- Merging of the rules for different states into a single rule with a CASE statement on the state variable:

((1 (CASE (VAR "y")
      ((ARM (100) (DECISION _ "a"))
       (ARM (101) (DECISION _ "b"))
       (ARM (102) (DECISION _ "c")))
      (DECISION _ "d")))
 (2 (CASE (VAR "y")
      ((ARM (100) (DECISION _ "a"))
       (ARM (101) (DECISION _ "b"))
       (ARM (102) (DECISION _ "e")))
      (DECISION _ "d"))))

thus becomes

((1 2 (CASE (VAR "state")
        ((ARM (1) (CASE (VAR "y")
                    ((ARM (100) (DECISION _ "a"))
                     (ARM (101) (DECISION _ "b"))
                     (ARM (102) (DECISION _ "c")))
                    (DECISION _ "d"))))
        (CASE (VAR "y")
          ((ARM (100) (DECISION _ "a"))
           (ARM (101) (DECISION _ "b"))
           (ARM (102) (DECISION _ "e")))
          (DECISION _ "d")))))

In itself, this transformation always increases the size of the character, but it may enable a worthwhile CASE-CASE transformation later. In the example above, after swapping the CASE statements over state and y, we finally get:

((1 2 (CASE (VAR "y")
        ((ARM (100) (DECISION _ "a"))
         (ARM (101) (DECISION _ "b"))
         (ARM (102) (IF (EQUALS (VAR "state") 1)
                        (DECISION _ "c")
                        ()
                        (DECISION _ "e"))))
        (DECISION _ "d"))))
The global synthesizer was written by Damien Doligez. It works by computing a denotational semantics (so to speak) for the input program, then rebuilding an equivalent program by a divide-and-conquer algorithm with backtracking.
More precisely, the program is first transformed into a list of (boolean expression, decision) pairs, where the boolean expression on the variables is true if and only if the input program terminates on the given decision. For instance, the input program
((1 (CASE (VAR "y") ((ARM (100) (DECISION 2 "a")) (ARM (101) (DECISION _ "b")) (ARM (102) (DECISION _ "c"))) (DECISION _ "d"))) (2 (CASE (VAR "y") ((ARM (100) (DECISION _ "a")) (ARM (101) (DECISION _ "b")) (ARM (102) (DECISION _ "e"))) (DECISION 2 "d")))))is transformed into
((state = 1 /\ y = 100) \/ (state = 2 /\ y = 100), DECISION 2 "a") ((state = 1 /\ y = 101) \/ (state = 2 /\ y = 101), DECISION _ "b") (state = 1 /\ y = 102, DECISION _ "c") ((state = 1 /\ y != 100 /\ y != 101 /\ y != 102) \/ (state = 2 /\ y != 100 /\ y != 101 /\ y != 102), DECISION _ "d") (state = 2 /\ y = 102, DECISION _ "e")Note the change from
DECISION _ "a"
to
DECISION 2 "a"
, and also the change from
DECISION 2 "d"
to DECISION _ "d"
. These are
made to minimise the number of different DECISION
statements.
The boolean expressions are then simplified with rules similar to
the ones used by the second local optimizer for IF
statements (elimination of redundant tests and factoring of common
subexpressions using distributivity).
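To make the discussion below more concrete, here is one possible OCaml representation of these (boolean expression, decision) pairs, with conditions kept in disjunctive normal form. This is a hypothetical sketch for illustration, not the data structures of the actual program.

(* Each decision is guarded by a condition in disjunctive normal form,
   where an atom constrains a single variable. *)
type atom = Eq of string * int | Neq of string * int
type conj = atom list                  (* conjunction of atoms *)
type cond = conj list                  (* disjunction of conjunctions *)
type clause = cond * string            (* condition, decision label *)

(* Conjoin atom [a] with a conjunction, returning None if the result is
   unsatisfiable. *)
let add_atom a conj =
  let contradicts p q =
    match p, q with
    | Eq (x, n), Eq (y, m) -> x = y && n <> m
    | Eq (x, n), Neq (y, m) | Neq (y, m), Eq (x, n) -> x = y && n = m
    | Neq _, Neq _ -> false
  in
  if List.exists (contradicts a) conj then None
  else if List.mem a conj then Some conj
  else Some (a :: conj)

(* Specialize a whole condition: drop the disjuncts that became unsatisfiable. *)
let specialize a (c : cond) : cond = List.filter_map (add_atom a) c

Specializing conditions with Eq and Neq atoms is exactly what happens when the clause list is split across the branches of a freshly generated CASE, as described next.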
Then begins the synthesis phase. The program picks an
"interesting" variable and a partition of all the values of interest
for this variable in the current context. It generates a
CASE
statement on this variable and the lists of pairs
(condition, action) corresponding to each branch of the
CASE
. Each condition is the conjunction of the original
condition and the assertion that the variable is in the subset for
this case. The conditions are simplified and unsatisfiable clauses
are eliminated.
An example is in order. Assume we pick the variable
y
and the partition {{100}, {101}, {102}, N \ {100,101,102}}
in our previous example. We get the following program skeleton:
(CASE (VAR "y") ((ARM (100) hole_1) (ARM (101) hole_2) (ARM (102) hole_3)) hole_4)
The four holes are filled by recursively calling the synthesis
function on the four lists of pairs obtained by conjoining the
conditions of the original list
respectively with y = 100
, y = 101
,
y = 102
, and y != 100 /\ y != 101 /\ y != 102
.
Those lists, after simplification, are:
hole_1: (state = 1 \/ state = 2, DECISION 2 "a")
hole_2: (state = 1 \/ state = 2, DECISION _ "b")
hole_3: (state = 1, DECISION _ "c")
        (state = 2, DECISION _ "e")
hole_4: (state = 1 \/ state = 2, DECISION _ "d")
For hole_1, hole_2, hole_4, we are in the ground case: only one clause
remains so there is no need to test anything and the resulting program
is the corresponding DECISION
statement.
For hole_3, the recursive call will produce a CASE
statement on state
, which is transformed on the fly into
an IF
statement using code borrowed from the second local
optimizer.
The resulting program is:
(CASE (VAR "y") ((ARM (100) (DECISION 2 "a")) (ARM (101) (DECISION _ "b")) (ARM (102) (IF (EQUALS (VAR "state") 1) (DECISION _ "c") (DECISION _ "e"))) (DECISION _ "d"))
Now we need to explain how the variable and partition are chosen. This is done by three heuristics controlled by a fourth heuristic and a timer. In addition, we use backtracking: at each step, several possible choices are tried and the best one (in terms of result size) is retained.
The first heuristic is very simple: try each variable with the maximal partition (i.e. the smallest partition that completely separates the cases, ensuring this variable will not need to be tested again).
The second heuristic is a bit smarter: for each variable, it tries to find a partition that minimizes the number of duplicated clauses. To this end, it regroups clauses that share many possible values of the variable.
Note that the first two heuristics can fail to give any non-trivial
partition. The third one acts as a safety net because it always gives
some result. It picks one of the clauses (trying to choose the one
with the smallest condition) and generates an IF
statement (instead of a CASE
as above), with the
clause's DECISION
in the THEN branch, and the program
corresponding to the other clauses in the ELSE branch.
The controlling heuristic computes an approximate cost for each partition generated by the first two heuristics and retains a number of the best ones (3 or 1, depending on the state of the timer) to try in addition to the skeleton generated by the third heuristic.
The two local optimizers are rather fast and well-behaved. (Even after iterating them a few times for good measure, they only take a few seconds on typical inputs.) Thus, we decided to run them first.
The global synthesizer is exponential by nature (because several alternatives are tried at each point of the divide-and-conquer strategy), so it is an ideal candidate for eating up whatever remains of the 30 minutes of running time allocated to the programs. We thus run the global synthesizer after the local optimizers, using the smallest of their outputs as its input.
In an attempt to help the global synthesizer produce some output in a reasonable amount of time, we start a 10-minute timer at the beginning of the synthesis phase. When this timer expires, the branching factor of the divide-and-conquer algorithm is reduced from 3 to 1 (that is, the algorithm no longer considers the 3 best candidate tests at each step, but picks only the best candidate).
The output of the global synthesizer can often benefit from peephole optimizations, so we re-run the local optimizers on the output of the synthesizer for a final cleanup.
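The resulting pipeline can be summarized by the following OCaml sketch; all the names are hypothetical stand-ins ([locals] for the local optimizers, [synth] for the global synthesizer, and [record] for the "verify equivalence, keep if smaller, return the best so far" step described in the next section).

(* A sketch of the integration order of the three optimizers. *)
let optimize ~record ~locals ~synth program =
  (* The fast, well-behaved local optimizers run first. *)
  let best = List.fold_left (fun p opt -> record (opt p)) program locals in
  (* The smallest result so far is fed to the exponential synthesizer... *)
  let best = record (synth best) in
  (* ...and the local optimizers are re-run on its output for a final cleanup. *)
  List.fold_left (fun p opt -> record (opt p)) best locals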
According to the contest rules, size (of the output programs) matters a lot. But implicit in the rules is that correctness and timeliness matter even more. We guessed correctly that contestants producing wrong output, or failing to produce any output at all within the time limits, would be eliminated immediately. We also knew for a fact that our program would be buggy, given the short development time and the small test suite we had. Thus, we made every possible effort to ensure that our program would always produce an output semantically equivalent to the input program within the time limits. This was achieved by the following standard defensive programming techniques:
At all times, a global variable holds the smallest result obtained so far. This variable is initialized with a copy of the input program. A timer is initialized at the very beginning of the execution to one minute less than the time allocated to this run (29 minutes by default). Whenever the timer expires, the result contained in the global variable is output. This makes sure that we always produce an output within the time limits; in the worst case where none of the optimizers terminates, this output can be identical to the input program, but that's still much better than no output at all.
When one of the optimizers produces a result that is smaller than the current best result, the new result is first tested for equivalence with the input program before being recorded. The equivalence test proceeds by generating random values for the variables, then evaluating the input program and the new result on these values and checking that they produce the same decision. This process is repeated for a duration of 30 seconds. We also had an exhaustive equivalence testing procedure, written by Luc Maranget as a kind of simulation test between two programs, but decided not to use it for fear that it would take too long to complete on large inputs. For the same reason, we chose to bound the duration of the randomized test rather than its number of iterations. Even though the randomized test is incomplete by nature, we found it extremely effective in catching bad outputs. (Sketches of this check and of the timer-based safety net appear after this list.)
All exceptions escaping from the local optimizers are caught, causing the program to abort the current optimization round and try the next optimizer in its working list, rather than crashing the whole program. One of us worked on the Ariane 5 code review, and learned the hard way that there are situations where a program must never crash when it encounters an internal error, but rather make all possible attempts to continue working. Whether the program in question runs in a space rocket or on the contest judges' machines makes no fundamental difference.
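As announced above, here is a minimal OCaml sketch of the timer-based safety net, using the Unix library that the program already links against. The character programs are represented by placeholder strings and the parser by read_line, purely for illustration.

(* Smallest correct result found so far; initialized with the input itself. *)
let best = ref ""

(* When the alarm fires, print the current best result and stop immediately. *)
let install_deadline seconds =
  Sys.set_signal Sys.sigalrm
    (Sys.Signal_handle (fun _ -> print_string !best; exit 0));
  ignore (Unix.alarm seconds)

let () =
  let input = read_line () in           (* stand-in for the real parser *)
  best := input;
  install_deadline (29 * 60);           (* one minute less than the allowed 30 *)
  (* ... run the optimizers here, updating [best] with every verified
     improvement, and print [!best] normally if they all finish in time. *)
  print_string !best

And a sketch of the randomized equivalence check, assuming hypothetical eval and variables functions (evaluate a program under an assignment of integer values to its variables and return the decision reached; list the variables a program mentions):

(* Draw a random environment for the program's variables; the value range
   here is arbitrary. *)
let random_env vars =
  List.map (fun v -> (v, Random.int 256)) vars

(* Run both programs on random inputs until the time budget is exhausted;
   return false as soon as a counter-example is found. *)
let probably_equivalent ?(seconds = 30.0) eval variables p1 p2 =
  let vars = variables p1 in
  let deadline = Unix.gettimeofday () +. seconds in
  let rec loop () =
    if Unix.gettimeofday () > deadline then true     (* no counter-example found *)
    else
      let env = random_env vars in
      if eval p1 env = eval p2 env then loop () else false
  in
  loop ()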
The only parts of our code that aren't protected by the crash-proofing techniques above are the parser and pretty-printer. Thus, we tested them very carefully, making sure that our parser recognizes exactly the input language as defined in the challenge task (including borderline cases such as empty lists of constants or state numbers), and that our printer always produces syntactically correct output.
The integration of the various optimizers in a common framework went relatively smoothly. Due to a certain lack of synchronization at the beginning of the development, we ended up with two slightly different data types for abstract syntax trees, each accompanied by its own parser, pretty-printer, and equivalence tester. Rather than attempting a hazardous last-minute rewrite of the optimizers, we just wrote trivial conversion functions between the two syntax trees, and applied them on entrance and exit to the optimizers. We didn't even have to think about where to put the conversions: the OCaml type-checker told us. The OCaml module system was also helpful in ensuring that we didn't get any name clashes between the various parts.
Having other tests than the one example provided by the organizers (the Busker character) proved important, both for evaluating optimizations and catching bugs. Hand-written tests are useful for testing borderline cases and verifying that optimization ideas work as advertised, but machine generated tests also proved very useful.
We thus wrote a code pessimizer: its input language is an extension of the contest language, while its output is the contest language itself. The one extension is ML-style pattern matching on tuples, which the pessimizer dumbly compiles left-to-right using CASE only. We know that other compilation techniques can yield much smaller output, so these examples offer ample opportunities for optimization.
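Here is an OCaml sketch of that ``dumb'' compilation scheme, on a hypothetical simplified syntax (single-value ARMs, one action per pattern row): each column of the tuple pattern is compiled to one CASE, and wildcard rows are duplicated into every arm, which is exactly what makes the output blow up.

type pat = Const of int | Wild                      (* (n) or _ in a pattern row *)
type stmt =
  | Decision of string
  | Case of string * (int * stmt) list * stmt       (* variable, arms, default *)

(* [compile vars rows default] matches the variables [vars], column by column,
   against the pattern rows [rows]; each row is a pattern list plus an action. *)
let rec compile vars (rows : (pat list * stmt) list) default =
  match vars with
  | [] ->
      (match rows with
       | [] -> default
       | (_, action) :: _ -> action)                (* first remaining row wins *)
  | v :: vs ->
      let consts =
        List.sort_uniq compare
          (List.filter_map
             (function (Const n :: _, _) -> Some n | _ -> None) rows) in
      (* Keep the rows whose first pattern satisfies [keep], dropping that column. *)
      let specialize keep =
        List.filter_map
          (function (p :: ps, act) when keep p -> Some (ps, act) | _ -> None)
          rows in
      let arm n =
        (n, compile vs (specialize (function Const m -> m = n | Wild -> true)) default) in
      let dflt = compile vs (specialize (fun p -> p = Wild)) default in
      (match consts with
       | [] -> dflt                                 (* whole column is wildcards *)
       | _ -> Case (v, List.map arm consts, dflt))

On the triangle example below, this duplication of the wildcard rows across every arm is what inflates the output to size 16089 before any optimization is applied.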
Consider for instance the following pattern-matching triangle:
(0 (CASE ((VAR "x1") (VAR "x2")(VAR "x3") (VAR "x4") (VAR "x5") (VAR "x6") (VAR "x7") (VAR "x8")) ( (ARM ((1) (1) (1) (1) (1) (1) (1) (1) ) (DECISION 1 "FOO")) (ARM ((2) (2) (2) (2) (2) (2) (2) (2) ) (DECISION 2 "FOO")) (ARM ( _ (1) (1) (1) (1) (1) (1) (1) ) (DECISION 3 "FOO")) (ARM ( _ (2) (2) (2) (2) (2) (2) (2) ) (DECISION 2 "FOO")) (ARM ( _ _ (2) (2) (2) (2) (2) (2) ) (DECISION 4 "FOO")) (ARM (_ _ _ (1) (1) (1) (1) (1) ) (DECISION 5 "FOO")) (ARM (_ _ _ (2) (2) (2) (2) (2) ) (DECISION 6 "FOO")) (ARM (_ _ _ _ (1) (1) (1) (1) ) (DECISION 7 "FOO")) (ARM (_ _ _ _ (2) (2) (2) (2) ) (DECISION 8 "FOO")) (ARM (_ _ _ _ _ (1) (1) (1) ) (DECISION 9 "FOO")) (ARM (_ _ _ _ _ (2) (2) (2) ) (DECISION 10 "FOO")) ) (DECISION _ "DEFAULT")) ) )
Then, the pessimizer produces a program of size 16089, which the local optimization phases reduce to 815. The global synthesizer then outputs a program of size 139. Note that it would be difficult to write such a test by hand.
Other examples of interest are pattern-matching expressions that
cannot be compiled properly by using CASE
only:
( (0 (CASE ( (VAR "x1") (VAR "x2") (VAR "x3") (VAR "x4") (VAR "x5") (VAR "x6") (VAR "y1") (VAR "y2") (VAR "y3") (VAR "y4") (VAR "y5") (VAR "y6") (VAR "z1") (VAR "z2") (VAR "z3") (VAR "z4") (VAR "z5") (VAR "z6") (VAR "k1") (VAR "k2") (VAR "k3") ) ( (ARM ((1)(1)(1)(1)(1)(1) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ) (DECISION 1 "FOO")) (ARM ((2) _ _ _ _ _ (1)(1)(1)(1)(1) _ _ _ _ _ _ _ _ _ _ ) (DECISION 2 "FOO")) (ARM ( _ (2) _ _ _ _ (2) _ _ _ _ (1)(1)(1)(1) _ _ _ _ _ _ ) (DECISION 3 "FOO")) (ARM ( _ _ (2) _ _ _ _ (2) _ _ _ (2) _ _ _ (1)(1)(1) _ _ _ ) (DECISION 4 "FOO")) (ARM ( _ _ _ (2) _ _ _ _ (2) _ _ _ (2) _ _ (2) _ _ (1)(1) _ ) (DECISION 5 "FOO")) (ARM ( _ _ _ _ (2) _ _ _ _ (2) _ _ _ (2) _ _ (2) _ (2) _ _ ) (DECISION 6 "FOO")) (ARM ( _ _ _ _ _ (2) _ _ _ _ (2) _ _ _ (2) _ _ (2) _ (2) (1)) (DECISION 7 "FOO")) ) (DECISION _ "DEFAULT")) ) )
This example is pessimized into a program of size 318639, which the local optimization phases reduce to 79457. Then, the global optimizer fails to optimize the example any further in the allocated 30 minutes. This is a very demanding example...
The contest judges were kind enough to provide contestants with the set of input programs they used to rank the entries. Below are the space savings that our optimizers achieved on those inputs.
Caption: for each test input, we show the initial size, the sizes after the two local optimization passes (Local 1 and Local 2), after the global synthesizer pass (Synthesizer), and after the re-run of the local optimizers on the output of the synthesizer (Synth-Local 1 and Synth-Local 2). The last column is the best space saving achieved, given as a percentage of the input size.
Input name | Initial size | Local 1 | Local 2 | Synthesizer | Synth-Local 1 | Synth-Local 2 | Space savings |
busker | 576 | 360 | 351 | 358 | 358 | 351 | -39.1% |
cat | 302 | 274 | 265 | 298 | 298 | 265 | -12.3% |
jarboe | 1066 | 298 | 178 | **/174 | **/174 | **/174 | -83.3%/-83.6% |
parrot | 380 | 356 | 356 | **/444 | **/442 | **/356 | -6.3% |
The first set of tests is composed of four characters from an actual interactive fiction game. They have been written by hand in a relatively natural style. One of them (the busker) was given as an example to all participants. On all those tests, the second local optimizer achieved the best results. The synthesizer produced incorrect output on jarboe and parrot, as denoted by the ** entries in the table. These incorrect results were caught by the random test phase and discarded. After the contest, we fixed the bug in the synthesizer and obtained the results shown on the right of the / sign. The corrected synthesizer slightly improves the results on the jarboe test.
Input name | Initial size | Local 1 | Local 2 | Synthesizer | Synth-Local 1 | Synth-Local 2 | Space savings |
ifbusker | 645 | 601 | 351 | 358 | 358 | 351 | -45.6% |
ifcat | 361 | 355 | 280 | 298 | 298 | 280 | -22.4% |
ifjarboe | 1616 | 1616 | 1286 | **/174 | **/174 | **/174 | -20.4%/-89.2% |
ifparrot | 514 | 506 | 364 | **/444 | **/442 | **/356 | -29.1%/-30.7% |
The next set of tests consists of variants of the four hand-written characters discussed above, with CASE statements replaced by cascades of IF tests.
On ifbusker and ifcat, the second local optimizer does a reasonable job. In particular, it achieves the same output size for ifbusker as for the original busker, thus undoing the ``obfuscation'' performed by the judges on the original character.
On ifjarboe, the local optimizers did OK, but the synthesizer would have done much better (also succeeding in un-doing the obfuscation), if not for its stupid bug.
The ifparrot test shows an interesting pattern that occurs in several of the tests below: the synthesizer alone doesn't do better than the local optimizers, but re-running the local optimizers on the output of the synthesizer does. Figuratively, the synthesizer acts as a nutcracker: it cracks enough of the nutshell to allow the local optimizers to get at the fruit.
Input name | Initial size | Local 1 | Local 2 | Synthesizer | Synth-Local 1 | Synth-Local 2 | Space savings |
pessbusker | 1892 | 1704 | 1328 | 358 | 358 | 358 | -81.1% |
pesscat | 799 | 769 | 525 | 268 | 265 | 265 | -66.8% |
pessjarboe | 3153 | 3059 | 2824 | **/174 | **/174 | **/174 | -10.4%/-94.4% |
pessparrot | 3291 | 2458 | 1607 | **/444 | **/442 | **/356 | -51.1%/-89.1% |
The third set of tests is also derived from the four hand-written characters, with a number of ``pessimizing'' transformations applied, making them much larger and much harder to optimize. But that doesn't fool the synthesizer, which is, in essence, able to undo the pessimization and recover the same results as for the original characters. The stupid bug in the submission still bites us in two of the four tests.
Input name | Initial size | Local 1 | Local 2 | Synthesizer | Synth-Local 1 | Synth-Local 2 | Space savings |
alpha0 | 30 | 28 | 28 | 28 | 28 | 28 | -6.7% |
alpha1 | 24 | 22 | 22 | 22 | 22 | 22 | -8.3% |
alpha2 | 8976 | 8392 | 7035 | 3311 | 3185 | 3160 | -64.8% |
mips0 | 3842 | 3652 | 3443 | 3402 | 3374 | 3370 | -12.3% |
pentium0 | 8 | 3 | 3 | 3 | 3 | 3 | -62.5% |
pentium1 | 8 | 3 | 3 | 3 | 3 | 3 | -62.5% |
pentium2 | 8 | 3 | 3 | 3 | 3 | 3 | -62.5% |
pentium3 | 356 | 313 | 211 | 148 | 148 | 148 | -58.4% |
pentium4 | 382 | 319 | 313 | 122 | 122 | 122 | -68.1% |
pentium5 | 335336 | 311326 | 123700 | --- | --- | --- | -63.1% |
sparc0 | 24 | 22 | 22 | 22 | 22 | 22 | -8.3% |
sparc1 | 76 | 67 | 52 | 40 | 40 | 40 | -47.4% |
sparc2 | 30 | 28 | 28 | 27 | 22 | 22 | -26.7% |
sparc3 | 14946 | 13432 | 7096 | 2864 | 2859 | 2486 | -83.4% |
The fourth set of tests comes from a totally different application area. It consists of instruction recognizers for subsets of various processors. They seem to be generated from a high-level processor description.
On all but the smallest tests, the global synthesizer simply kicks ass, delivering results at least twice as small as those of the local optimizers. The pentium5 test is very large and causes the global synthesizer to time out, but fortunately the second local optimizer had already done a decent job.
Input name | Initial size | Local 1 | Local 2 | Synthesizer | Synth-Local 1 | Synth-Local 2 | Space savings |
sim-intel.rtl0 | 115339 | 2214 | 2214 | 1391 | 1386 | 1095 | -99.1% |
sim-sparc.rtl0 | 15370 | 15370 | 15370 | --- | --- | --- | 0% |
sim-sparc.rtl1 | 1804 | 1804 | 1804 | 305 | 303 | 231 | -87.2% |
sim-sparc.rtl2 | 1219 | 1219 | 1219 | 245 | 240 | 160 | -86.9% |
xs.rtl0 | 952 | 952 | 265 | 270 | 265 | 265 | -72.2% |
xs.rtl0.mc | 3413 | 2458 | 1005 | 1203 | 1203 | 1005 | -70.6% |
The last set of tests also comes from processor descriptions, although we didn't figure out exactly what they do. At any rate, this is a mixed bunch: sim-sparc.rtl0 is essentially impervious to our optimizations (the local optimizers do nothing and the synthesizer times out), while sim-intel.rtl0 is reduced by a factor of 100. Notice again the nutcracker effect of the synthesizer, which not only consistently beats the local optimizers, but also allows them to do an even better job on its output.
First and foremost, let us emphasize that we had a lot of fun working on this programming challenge. Even Leroy, who in the past pooh-poohed the ICFP contest as ``a pissing contest'' and didn't deign to look at this year's challenge until Friday night, ended up spending the whole week-end riveted to his home machine. Thanks a lot to the organizers for coming up with such an interesting topic.
Now is the time to exercise the ``unlimited bragging rights'' promised in the contest announcement.
Functional languages in general, and the Caml language in particular, were a very good match for the problem, and we relied heavily on the core language features throughout. In addition, several features of the Objective Caml implementation also played an important role, among them the ability to produce a standalone, statically linked executable (by passing -ccopt -static to the ocamlopt compiler invocation).
It is interesting to note that 4 of the 8 finalists were using the Caml garbage collector, virtual machine or native-code compiler in one form or another (two OCaml entries, one OLabl entry and one Moscow ML entry).
The topic of the contest was also a very good match for our expertise in compilers and theorem provers. The other two prizes also went to experts in the field of compilation of functional programming languages. But compiler skills alone -- useful as they are to acquaint oneself with program transformations and the importance of preserving semantics -- weren't enough. Compiler writers tend to settle for sub-optimal transformations with reasonable running time, mostly effective against human-written input, such as our two local optimizers. It takes someone familiar with theorem proving and decision procedures for various logics to come up with something as ambitious (some would say crazy) as Doligez's synthesizer.
Finally, a word should be said about our highly parallel development process. The organizers warned against large teams as unwieldy and hard to synchronize within the 72 hour time limit. This is true in general, but perhaps less so for the topic of the contest, which has two unusual features as far as programming projects go. First, the input and output of the program, and the relations between both, are perfectly defined. Second, two solutions can always be combined together, obtaining a third solution that is at least as good as the initial two. (This is not quite right because of the time limits; but disregarding this, a mere shell script suffices to run N optimizers sequentially and select the smallest output.) That's how we were able to harness the programming power of five people and end up with a program that handles well different styles of inputs (hand-written, hand-pessimized, and machine-generated).