Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007853OCamlconfigure and build/installpublic2018-09-23 22:542018-09-24 15:44
Assigned To 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0007853: Interest in bootstrapping without OCaml binary?
DescriptionIs there any interest to make OCaml able to compile itself, without using a binary of the compiler?

I have a 0001933:0001500 loc OCaml prototype interpreter ( [^] ), which is almost able to run the compiler compiling itself (it crashes while trying to link the object files together due to the use of Obj).

It doesn't support a lot of OCaml features, but since the objective is only to compile the compiler, they are not really needed. It is written in OCaml for now, and uses compiler-libs to handle the parsing of the source files, so this would need to be changed as well, either by writing it in another language (probably C since all the primitives are available there), or in a very small subset of OCaml that would be compiled by something else. A solution must be found to the problem of the parsing as well, maybe by modifying ocamlyacc (although that would make a second parser to maintain, now that the parser has switched to Menhir).

However, keeping the above in sync with the changes of the compiler over time would be tedious if it were not integrated to the compiler. Thus, would a contribution that would bootstrap the compiler without a binary be accepted?
TagsNo tags attached.
Attached Files

- Relationships

-  Notes
dra (developer)
2018-09-24 09:12

This isn’t bootstrapping the compiler without a binary, though - you require an OCaml compiler to build your interpreter.

That’s practical for a language such as C where there are multiple compilers for a given platform (e.g. GCC is bootstrapped using any C compiler to a required standard), but doesn’t make sense for a language where there’s only one implementation.

While it’s not an absolute property of our present bootstrap that it’s repeatable, we do at the moment only commit changes to bootstrap images which can be verified, which is a property which gets completely lost with your version, I think.
nore (reporter)
2018-09-24 09:26

> This isn’t bootstrapping the compiler without a binary, though - you require an OCaml compiler to build your interpreter.

This is why this interpreter is only a prototype, and would need to be rewritten in another language to complete the bootstrap. It seems to me that while there is only one implementation, it is useful to be able to bootstrap it without any source in the language itself.

> While it’s not an absolute property of our present bootstrap that it’s repeatable, we do at the moment only commit changes to bootstrap images which can be verified, which is a property which gets completely lost with your version, I think.

What do you mean? After rewriting that interpreter to something else that would not require the OCaml compiler, I don't see the problem: there would be no bootstrap binaries needed at all.
dra (developer)
2018-09-24 10:38

Ah, I hadn't noticed that you wanted to change language once it was complete as well. With the interpreter in another language, I agree with the concept of wishing to do this, but - probably in common with other devs - not with the reality of having to maintain it. It might be possible to solve the simpler problem of writing a lambda interpreter, but there's precious little difference in terms of being able to inspect the code between a lambda interpreter and a bytecode interpreter, if one's honest!

The bootstrap does still need to be present - ultimately the OCaml compiler is written in OCaml, so you want to compile it with itself. With the present method, having bootstrapped the compiler with the bytecode blobs in boot/ you can use the compiler just built to regenerate the boot blobs and verify that the compiler has indeed just been compiled with itself (it's referred to as the fixpoint in make bootstrap).
nore (reporter)
2018-09-24 13:55

I intended to change the language of the interpreter to either directly C (to be able to easily benefit from the primitives), or to some reduced subset of OCaml expressive enough to get human-readable code (and not lambda calculus, as that is far harder to read).

The compiler would be bootstrapped with the interpreter, then it would be possible to check the compiled compiler produces the same thing when compiling itself. One of the differences between bootstrapping from a readable interpreter and from a binary is that bootstrapping from a binary is susceptible to "trusting trust" attacks, which an interpreter is not subject to.
gasche (administrator)
2018-09-24 15:13

Personally I think this is an interesting project (whether or not we end up using it for bootstrapping; I would assume that it could be sensibly slower than the bytecode interpreter, so maybe not appropriate for your everyday-bootstrap?). I am also interested in getting reference interpreters for subsets of OCaml (another is [^]) for differential compiler testing.

(It would be kind of cool to have an interpreter in Rust, but I'm merely saying that because if I was personally working on your project I would take it as an excuse to practice Rust instead of C. Rust has a fairly large dependency set (LLVM, etc.), so it may be less suitable than C as a debootstrapping language.)
dra (developer)
2018-09-24 15:30

I certainly agree with Gabriel that it's an interesting project! Answers from me so far have been aimed at the practicality of its replacing the existing bootstrap.

By lambda, I was referring to (one of) the compiler's intermediate languages, not the lambda calculus!

Given what will be the inevitable complexity of any interpreter, it still doesn't really solve the trusting trust problem. However, a separate way of bootstrapping does of course allow the binary blobs in the existing bootstrap to have an increased level of trust, as you can use David A. Wheeler's technique to compare two genuinely different compilation techniques.

Out of curiosity, what do you anticipate would be less tedious about maintaining a separate interpreter by incorporating it in the upstream tree?
nore (reporter)
2018-09-24 15:44

One of the reasons why I believe that is that if the bootstrapping code is in the upstream tree, it is possible to avoid adding code in the compiler that is difficult to interpret while there is an almost identical equivalent that is far easier to interpret (for instance, partially-applied functions with labels fall in this category: it is hard to interpret correctly without typing information, and making the closure explicit in the source does not add much complexity).

Concerning the trusting trust problem, while it does not completely solve it, having the whole code human-readable makes it far harder to introduce an attack in the compiler that would not be immediately detected. It also allows to use the method to compare different techniques, as you said.

To be fair, my objective here is not really to counter trusting trust attacks (although this is a nice side-effect), I am only doing that because it is an interesting project :). Besides, as OCaml is a language very suitable for writing compilers or interpreters, it would make it easier to have bootstrap chains for many other languages.

- Issue History
Date Modified Username Field Change
2018-09-23 22:54 nore New Issue
2018-09-24 09:12 dra Note Added: 0019380
2018-09-24 09:26 nore Note Added: 0019381
2018-09-24 09:53 dra Status new => acknowledged
2018-09-24 10:38 dra Note Added: 0019382
2018-09-24 13:55 nore Note Added: 0019383
2018-09-24 15:13 gasche Note Added: 0019384
2018-09-24 15:30 dra Note Added: 0019387
2018-09-24 15:44 nore Note Added: 0019388

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker