Interest in bootstrapping without OCaml binary? #7853

Closed
vicuna opened this issue Sep 23, 2018 · 15 comments

@vicuna

vicuna commented Sep 23, 2018

Original bug ID: 7853
Reporter: nore
Status: acknowledged (set by @dra27 on 2018-09-24T07:53:31Z)
Resolution: open
Severity: feature
Category: configure and build/install
Monitored by: @nojb @gasche

Bug description

Is there any interest in making OCaml able to compile itself without relying on a prebuilt binary of the compiler?

I have a ~1500 loc OCaml prototype interpreter ( https://github.com/Ekdohibs/camlboot ), which is almost able to run the compiler compiling itself (it crashes while trying to link the object files together due to the use of Obj).
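To illustrate why code using Obj is awkward for such an interpreter, here is a minimal sketch. It is not taken from camlboot or from the compiler's linker; it only shows that Obj exposes the runtime representation of values, which an interpreter with its own value representation would have to mimic exactly.

```ocaml
(* Obj lets a program inspect how the standard runtime lays out values. *)

(* Integers are immediate (unboxed) values in the standard runtime. *)
let int_is_immediate = Obj.is_int (Obj.repr 42)

(* A pair is a heap block with one field per component. *)
let pair_is_block = Obj.is_block (Obj.repr (1, 2))
let pair_fields = Obj.size (Obj.repr (1, 2))
```

Code that depends on answers like these only behaves the same under an interpreter that reproduces the same layout.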

It doesn't support a lot of OCaml features, but since the objective is only to compile the compiler, they are not really needed. It is written in OCaml for now and uses compiler-libs to parse the source files, so this would need to change as well, either by rewriting it in another language (probably C, since all the primitives are available there) or in a very small subset of OCaml that would be compiled by something else. A solution to the parsing problem must be found as well, perhaps by modifying ocamlyacc (although that would mean a second parser to maintain, now that the parser has switched to Menhir).

However, keeping the above in sync with changes to the compiler over time would be tedious if it were not integrated into the compiler. Thus, would a contribution that bootstraps the compiler without a binary be accepted?

@vicuna
Author

vicuna commented Sep 24, 2018

Comment author: @dra27

This isn’t bootstrapping the compiler without a binary, though - you require an OCaml compiler to build your interpreter.

That’s practical for a language such as C where there are multiple compilers for a given platform (e.g. GCC is bootstrapped using any C compiler to a required standard), but doesn’t make sense for a language where there’s only one implementation.

While it’s not an absolute property of our present bootstrap that it’s repeatable, we do at the moment only commit changes to bootstrap images which can be verified, which is a property which gets completely lost with your version, I think.

@vicuna
Author

vicuna commented Sep 24, 2018

Comment author: nore

> This isn’t bootstrapping the compiler without a binary, though - you require an OCaml compiler to build your interpreter.

This is why this interpreter is only a prototype, and would need to be rewritten in another language to complete the bootstrap. It seems to me that while there is only one implementation, it is useful to be able to bootstrap it without any source in the language itself.

> While it’s not an absolute property of our present bootstrap that it’s repeatable, we do at the moment only commit changes to bootstrap images which can be verified, which is a property which gets completely lost with your version, I think.

What do you mean? After rewriting that interpreter to something else that would not require the OCaml compiler, I don't see the problem: there would be no bootstrap binaries needed at all.

@vicuna
Author

vicuna commented Sep 24, 2018

Comment author: @dra27

Ah, I hadn't noticed that you wanted to change language once it was complete as well. With the interpreter in another language, I agree with the concept of wishing to do this, but - probably in common with other devs - not with the reality of having to maintain it. It might be possible to solve the simpler problem of writing a lambda interpreter, but there's precious little difference in terms of being able to inspect the code between a lambda interpreter and a bytecode interpreter, if one's honest!

The bootstrap does still need to be present - ultimately the OCaml compiler is written in OCaml, so you want to compile it with itself. With the present method, having bootstrapped the compiler with the bytecode blobs in boot/ you can use the compiler just built to regenerate the boot blobs and verify that the compiler has indeed just been compiled with itself (it's referred to as the fixpoint in make bootstrap).
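The fixpoint check described above amounts to a byte-for-byte comparison between the committed boot image and the one regenerated by the freshly built compiler. A minimal sketch of that comparison follows; `read_file` and `reached_fixpoint` are hypothetical names, not part of the OCaml build system, and the real check is driven by `make bootstrap`.

```ocaml
(* Read a whole file as a string, in binary mode so bytes are not translated. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Hypothetical helper: true when the regenerated image is byte-for-byte
   identical to the committed one, i.e. the compiler reproduces itself. *)
let reached_fixpoint ~committed ~regenerated =
  String.equal (read_file committed) (read_file regenerated)
```

Any difference, even a single byte, means the compiler that was just built is not the one that produced the committed image.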

@vicuna
Author

vicuna commented Sep 24, 2018

Comment author: nore

I intended to change the language of the interpreter to either directly C (to be able to easily benefit from the primitives), or to some reduced subset of OCaml expressive enough to get human-readable code (and not lambda calculus, as that is far harder to read).

The compiler would be bootstrapped with the interpreter, then it would be possible to check the compiled compiler produces the same thing when compiling itself. One of the differences between bootstrapping from a readable interpreter and from a binary is that bootstrapping from a binary is susceptible to "trusting trust" attacks, which an interpreter is not subject to.

@vicuna
Author

vicuna commented Sep 24, 2018

Comment author: @gasche

Personally I think this is an interesting project (whether or not we end up using it for bootstrapping; I would assume that it could be noticeably slower than the bytecode interpreter, so maybe not appropriate for your everyday bootstrap?). I am also interested in getting reference interpreters for subsets of OCaml (another is https://github.com/johnwhitington/ocamli) for differential compiler testing.

(It would be kind of cool to have an interpreter in Rust, but I'm merely saying that because if I was personally working on your project I would take it as an excuse to practice Rust instead of C. Rust has a fairly large dependency set (LLVM, etc.), so it may be less suitable than C as a debootstrapping language.)

@vicuna
Author

vicuna commented Sep 24, 2018

Comment author: @dra27

I certainly agree with Gabriel that it's an interesting project! Answers from me so far have been aimed at the practicality of its replacing the existing bootstrap.

By lambda, I was referring to (one of) the compiler's intermediate languages, not the lambda calculus!

Given what will be the inevitable complexity of any interpreter, it still doesn't really solve the trusting trust problem. However, a separate way of bootstrapping does of course allow the binary blobs in the existing bootstrap to have an increased level of trust, as you can use David A. Wheeler's technique to compare two genuinely different compilation techniques.
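As a rough illustration of that comparison (usually called diverse double-compiling), here is a toy sketch. Everything in it is an assumption made for illustration: binaries are modelled as strings, `compile_with` stands for "run this binary on this source", and `toy_compile_with` is an invented model in which a trojaned binary always reproduces itself. It also assumes compilation is deterministic.

```ocaml
(* Build the compiler source with an independent compiler, rebuild it with
   the result, then compare the second stage with the suspect binary. *)
let diverse_double_compile ~compile_with ~trusted ~source ~suspect =
  let stage1 = compile_with trusted source in
  let stage2 = compile_with stage1 source in
  stage2 = suspect

(* Toy model (hypothetical): an honest binary emits a binary determined only
   by the source; a trojaned binary emits another trojaned binary. *)
let toy_compile_with binary source =
  if binary = "trojaned" then "trojaned" else "bin:" ^ source
```

In this model the check succeeds for a suspect binary that matches the source and fails for a self-reproducing trojaned one, because the independent first stage never passes through the suspect binary.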

Out of curiosity, what do you anticipate would be less tedious about maintaining a separate interpreter by incorporating it in the upstream tree?

@vicuna
Author

vicuna commented Sep 24, 2018

Comment author: nore

One reason is that if the bootstrapping code is in the upstream tree, it becomes possible to avoid adding code to the compiler that is difficult to interpret when an almost identical equivalent is far easier to interpret. Partially applied functions with labels fall in this category: they are hard to interpret correctly without typing information, and making the closure explicit in the source does not add much complexity.
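A small sketch of the kind of rewrite this describes, with names (`scale`, `scale_ten`) invented for illustration rather than taken from the compiler sources:

```ocaml
(* A function with two labelled arguments. *)
let scale ~factor ~value = factor * value

(* Partial application that skips the first labelled argument. The type of
   [scale] is what tells the compiler that [~value] fills the second
   parameter and that the result still expects [~factor]. *)
let scale_ten = scale ~value:10

(* The same function with the closure written out explicitly. Every call to
   [scale] is now a full application, which is easier to evaluate without
   type information. *)
let scale_ten_explicit = fun ~factor -> scale ~factor ~value:10
```

Both definitions behave identically; the second simply spells out what the type checker would otherwise infer.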

Concerning the trusting trust problem, while this does not completely solve it, having the whole code human-readable makes it far harder to introduce an attack in the compiler that would not be immediately detected. It also makes it possible to use the method you mentioned to compare different compilation techniques.

To be fair, my objective here is not really to counter trusting trust attacks (although this is a nice side-effect), I am only doing that because it is an interesting project :). Besides, as OCaml is a language very suitable for writing compilers or interpreters, it would make it easier to have bootstrap chains for many other languages.

@github-actions

github-actions bot commented May 7, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 7, 2020
@github-actions github-actions bot closed this as completed Jun 8, 2020
@gasche
Member

gasche commented Jun 8, 2020

I had missed this stalebot notification when it came in last month, but this issue is still alive. I contributed a bit to @Ekdohibs' interpreter, which is now able to interpret the native compiler. (The bytecode compiler is tricky to support well, due to the usage of marshalled data within .cmo files; the interpreter uses a different marshalling format, and the build system tries to use the just-built compiler to link files produced by the interpreted compiler.)
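To make the marshalling issue concrete, here is a minimal sketch using the standard Marshal module. The `unit_info` record is a made-up stand-in, not the actual layout of a .cmo file.

```ocaml
(* Hypothetical stand-in for the metadata stored in a compiled unit. *)
type unit_info = { unit_name : string; imports : string list }

let original =
  { unit_name = "Stdlib"; imports = [ "CamlinternalFormatBasics" ] }

(* Marshal writes the runtime representation of the value as opaque bytes. *)
let blob : string = Marshal.to_string original []

(* Reading it back only works for a reader that uses the same byte format
   and the same value layout as the writer. *)
let restored : unit_info = Marshal.from_string blob 0
```

An interpreter that marshals values in its own format produces bytes that the regular runtime cannot read back this way, which is presumably why mixing files from the interpreted and the just-built compiler fails.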

@gasche gasche reopened this Jun 8, 2020
@nojb nojb removed the Stale label Jun 8, 2020
@nojb
Contributor

nojb commented Jun 8, 2020

(I suggest removing the Stale label as well to clearly mark issues as still relevant.)

@github-actions

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Jun 11, 2021
@gasche
Member

gasche commented Jun 11, 2021

@Ekdohibs's camlboot is now able to build OCaml 4.07 without the bootstrap compiler (and to confirm that the current 4.07 bootstrap files are correct). It is not up to date for 4.12-4.13 yet.

@github-actions github-actions bot removed the Stale label Jun 14, 2021
@github-actions

github-actions bot commented Jul 1, 2022

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Jul 1, 2022
@gasche
Member

gasche commented Jul 1, 2022

I propose to close this issue. Camlboot is still not able to bootstrap recent OCaml versions (we've had plenty of other things on our plate), but it's "easy in theory" (and tedious in practice) to do it.

@gasche gasche closed this as completed Jul 1, 2022
@DemiMarie
Contributor

@gasche: reminds me of Rust and mrustc. Even if it could only bootstrap old Rust versions, that was enough to prove the absence of a trusting trust attack.
