myocamlbuild.ml should be splittable into several files/plugins #5680

Closed
vicuna opened this issue Jul 11, 2012 · 19 comments

vicuna commented Jul 11, 2012

Original bug ID: 5680
Reporter: @bobzhang
Status: resolved (set by @damiendoligez on 2017-03-03T10:56:21Z)
Resolution: suspended
Priority: normal
Severity: feature
Category: -for ocamlbuild use https://github.com/ocaml/ocamlbuild/issues
Tags: patch
Related to: #6093
Monitored by: @trefis @Drup @gasche kerneis @hcarty @avsm

Bug description

ocamlbuild is a nice tool, a programmable Makefile. Something that worries me is that myocamlbuild.ml is now quite big (more than 1000 lines of code), and much of that code is the same, duplicated everywhere. I wonder whether ocamlbuild could accept .cm[oa] files as plugins. I don't expect this to happen soon, but it may help ocamlbuild take off.

vicuna commented Aug 5, 2013

Comment author: @gasche

I gave more thought to the issue this morning, and my tentative conclusion is that there are three different options for making ocamlbuild plugins modular. But before describing them, I would like to recapitulate:
(1) how ocamlbuild presently works wrt. plugins (myocamlbuild.ml)
(2) what the criteria/objectives for a better solution would be

How myocamlbuild.ml works today

Just after loading the command-line options, ocamlbuild checks whether a myocamlbuild.ml exists in the current project. If it does, it compiles it (if it needs to be recompiled) with an ad-hoc compilation command (including the ocamlbuild libraries distributed with the compiler and, interestingly, looking in "_tags" for "debug" or "profile" tags that would apply to "myocamlbuild.ml"). Then it executes the _build/myocamlbuild program and exits directly after that. It is actually _build/myocamlbuild that does the rest of the building task.

This whole logic is in the "plugin.ml" module, called from main as "Plugin.execute_plugin_if_needed". Note that when _build/myocamlbuild is called, it is passed the "-no-plugin" option so that it does not itself try to build the plugin again; we could think of other information to transfer from the first ocamlbuild process to the next.
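The control flow above can be summarized as a toy decision function. This is a self-contained model, not the code of Plugin.execute_plugin_if_needed: the names `action` and `decide` are hypothetical, and the check for whether the plugin needs recompiling is left out.

```ocaml
(* Toy model of the two-stage plugin flow; not ocamlbuild's code.
   [action] and [decide] are hypothetical names. *)

type action =
  | Build_directly                     (* no plugin involved: build in this process *)
  | Compile_then_rerun of string list  (* build _build/myocamlbuild, then run it with these args *)

(* [has_plugin]: does myocamlbuild.ml exist at the project root?
   [argv]: the command-line arguments, without the program name. *)
let decide ~has_plugin argv =
  if List.mem "-no-plugin" argv || not has_plugin then Build_directly
  else
    (* The re-executed plugin gets -no-plugin so that it does not try
       to build itself again, followed by the original arguments. *)
    Compile_then_rerun ("-no-plugin" :: argv)
```

For example, `decide ~has_plugin:true ["test.byte"]` yields `Compile_then_rerun ["-no-plugin"; "test.byte"]`, and the re-executed program then takes the `Build_directly` branch because -no-plugin is present.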

What composability means

There are two different dimensions to composability.

The first is "composable libraries": being able to build libraries of ocamlbuild logic. I would like to have a myocamlbuild.ml that would just say: "activate support for OCamlfind and Dypgen" or "Use the set of building rules distributed by the Core guys" or "This is a LaTeX project". This means that myocamlbuild.ml, as an OCaml program, should be able to use external libraries. The most natural way to do that today is to rely on ocamlfind, so this is why this issue is described as "load rules from ocamlfind packages" in #6093.

The second is "composable subprojects": being able to describe the build logic at several different places in a project, just like we can add a _tags file in a subdirectory. That feature request is less pressing than enabling the first dimension, but it will eventually come as well (git had to get submodules at some point). People want to be able to drop an ocamlbuild-using project into a subdirectory of their current project and have the whole thing just work.

It is helpful to keep both those needs in mind (of course giving priority to "composable libraries") when thinking about changes that would be needed for OCaml.

A first attempt (this part describes implementation details and may be skipped)

What I tried as a prototype this morning is the following. "If ocamlbuild already looks for 'debug' and 'profile' flags in _tags when building myocamlbuild.ml, why not have it look for 'package(foo)' tags as well to have the compilation step add the relevant ocamlfind packages?". So I tweaked the ad-hoc compilation step in Plugin to also include all the flags corresponding to the tags assigned to "myocamlbuild.ml" (eg. through "-tag" command-line, or (true: foo) or ("myocamlbuild.ml": foo) in _tags).

The problem is that at the point where execute_plugin_if_needed is called, most of the ocamlbuild code hasn't been run yet, and the environment available to compile myocamlbuild.ml is very poor. In particular, Ocaml_specific.init () hasn't been called yet, so simply adding ("myocamlbuild.ml": package(foo)) does nothing, because the rules corresponding to OCaml tags haven't been registered yet. In fact, I simplified the story above: the "_tags" file at the root hasn't even been read yet; only command-line options have been processed.

It works if I move the execute_plugin_if_needed call much later in the pipeline: after all configuration files have been read, after hygiene has been checked, and after the whole set of directories marked "traverse" has been traversed and their "_tags" files processed.

The problem with this first attempt is performance: it amounts to doing some work twice, once before compiling myocamlbuild.ml and once in the second ocamlbuild process. This may have unpalatable performance impacts on large projects. Note that we don't need to compute checksums twice, so the impact may actually turn out to be small; and hygiene can be disabled on the first run, etc.

Options for a realistic implementation

I see three different options:

(1) In order to avoid traversing the traverse-set twice (this is the non-constant cost among things redone twice), we may decide to only read the _tags file at the root of the project, and no other, to determine the build options for myocamlbuild.ml. Note that we may also decide to read options from a new kind of file (eg. _plugin_tags).

(2) We may decide to do the traversal anyway, looking for information on how to build myocamlbuild.ml in all the subdirectories and their _tags files. This opens the door to supporting the second kind of modularity, composability of subprojects, but we would need to decide on a semantics for what a myocamlbuild.ml in a subproject means. I'm not sure we can get something clean: in particular, a user could want the logic in the myocamlbuild.ml of a subdirectory to apply only in that subdirectory, which would require a way to un-register the corresponding rules; besides being an invasive implementation change, it is not even clear that this has a well-defined semantics.

(3) We may move to dynamic loading for plugins, instead of compilation-and-silent-rerun as is currently done. Upon finding a myocamlbuild.ml, compile it into myocamlbuild.cmxs and dynlink it into the current ocamlbuild process. If the _tags files (or _plugin_tags, etc.) recommend some ocamlfind libraries, link their own .cmxs beforehand. I checked, and this kind of thing would work (on x86 or x86_64 Linux at least) if packagers correctly provide the required .cmxs files.

The first option looks rather simple to implement. It might even be possible to get something working and reasonably robust into the next release, but don't get your hopes too high, as I think we should not commit something fragile.

If we don't try to support myocamlbuild.ml in subdirectories (forgetting about subproject composability for now), the second option is also reasonably easy to implement (let's call this 1-in-depth), but has potential performance consequences. I would expect the performance cost to actually be negligible compared to the checksum computation (which would only be performed once), so let's not close the door on it.

The third option looks like a trouble-maker. It could be the easiest to make efficient, but it's likely to be buggy on the first few tries, so it is more of a medium-term change to consider.

Current opinion

I like these two options:

  • 1-in-depth: read all the _tags files in depth and compile the root myocamlbuild.ml with the flags corresponding to those tags; this is consistent with the current ocamlbuild semantics (if a _tags file is taken into account, all are)
  • _plugin_tags: have a single new kind of file at the root that would be parsed to determine the compilation options for the plugin; it is clearly not subproject-composable

I think both could be extended in the future towards subproject composability (looks hard to define well and painful to implement) or dynamic linking options (looks painful to implement well).

I didn't discuss building the OCamlbuild plugin(s) out of several .ml files found in the project directory. That could be accounted for by a slight adaptation of the current semantics (in fact ocamlbuild will also look for a myocamlbuild_config.ml, but please forget I said anything about that), but I don't think it is very interesting. I suspect the current perceived need to compose several .ml files comes from the inability to compose a single myocamlbuild.ml with external libraries (so let's import all this stuff from OASIS and compile it here in my project?), and that it wouldn't bring in additional use cases.

vicuna commented Aug 5, 2013

Comment author: @dbuenzli

Not sure I digested all of it. But in my view the subproject goal is not necessarily worth pursuing (and git submodules are horrible to use, or I don't understand them). Actually, I wouldn't mind if a single _tags file were only allowed at the root of the directory.

Regarding the _plugin_tags file, I'd rather avoid introducing a new file: as others have suggested elsewhere, a weakness of ocamlbuild is that it tends to scatter the build information across too many files.

It turns out that what you find the least interesting (building a plugin from several ml files) is what matters most to me. I don't necessarily see rules coming from a library, and it's still useful to be able to modularize them (even if they end up abstracted into a library later). Also, a package providing rules may actually want to use these rules to build itself, its test suite, or its examples... so you get a bootstrap problem at that point.

vicuna commented Aug 5, 2013

Comment author: @gasche

One problem with using the stuff from _tags to build myocamlbuild.ml is that in addition to ("myocamlbuild.ml": foo), which clearly looks right for this purpose, you get all the stuff that people put in (true: foo) that was certainly meant for the project itself and not its plugin file. Admittedly, we can live with that (in most cases that would mean several useless ocamlfind packages passed when compiling myocamlbuild.ml).

I'm ready to hear more about your .ml use-cases and change my mind. That said I don't really see why you would need to split your myocamlbuild.ml into independent units, and yet refuse to distribute them separately (though I understand distributing more small packages increases the release/packaging cost).

I don't understand your bootstrap problem: regardless of whether you pass myocamlbuild.ml source files, ocaml modules or findlib packages, you cannot use their own semantic content to build them from myocamlbuild.ml. I get the testsuite point: if the only way to use some logic of an ocamlbuild plugin is to get it from ocamlfind, the testsuite of the distributed plugin would have to install it before being run, which is not nice.

If we manage to rationalize the part of the code that builds myocamlbuild.ml (using the rich ocaml_specific logic instead of the current simple hard-coded command), a cheap design would be to have a "myocamlbuild.mlpack" file in the root directory work as expected (use all the listed modules as plugins for the second invocation, after a useless but harmless packing step). There may be other options.

If we had some form of dynamic linking of plugins (possibly less ambitious than the third option discussed in my first email), that could also be an option. You could invoke your testsuite with "ocamlbuild -load-plugin mystuff.cmo".

Ah. One last idea would be to control the running ocamlbuild from a toplevel (ocaml or ocamlnat) instead of reimplementing the dynlinking logic ourselves. That may be a cost-effective way to get something like the "all dynamic linking" option, but it also opens its own can of worms (ocamlnat's maintenance status is even worse than ocamlbuild's right now).

vicuna commented Aug 5, 2013

Comment author: meyer

My thoughts were given in associated defect. We only need to support adding options to the myocamlbuild.ml compilation command line.

vicuna commented Aug 6, 2013

Comment author: @bobzhang

I would like to get back to this when I have time. But I have one concern: supposing plugins are enabled, how hard would they be to distribute?

For example, if my ocamlbuild setup depends on other plugins, when I release my library, is there a way to automatically package all the dependencies without disturbing my users?

vicuna commented Aug 6, 2013

Comment author: @gasche

A first released attempt

I have attached a first rough implementation of what I discussed above. Use .gitpatch if you use git (it's three separate patches bundled by format-patch), and .patch otherwise (the three are collapsed in a single patch).

It corresponds to option 1-in-depth: all the traversable directories are traversed to collect information (which is actually useless for non-root directories, but makes the patch smaller), and only at the end of the initialization phase is myocamlbuild.ml built and executed. Hygiene is not done twice (only during the second step).

The part that derives the tags to pass to the myocamlbuild.ml compilation command is a bit of a hack: we use ocaml++program++link++{byte,native} and pray that it works. In practice it works.

You're welcome to try the patch and see if it fits your potential use-cases for modular ocamlbuild plugins.

(No guarantee that this will end up included or that the feature demonstrated here will also work in a final solution.)

Example use cases.

(1) Using an ocamlfind package in your myocamlbuild.ml

myocamlbuild.ml:
  let foo = Str.quote "foo"

_tags:
  "myocamlbuild.ml": package(str)

test.ml:
  let x = 1

  ocamlbuild -use-ocamlfind test.byte

(2) Linking a local module to myocamlbuild.ml (... but also the final compiled result)

myocamlbuild.ml:
  let foo = Lib.id "foo"

lib.ml:
  let id x = x

test.ml:
  let x = 1

  ocamlbuild -no-plugin lib.cmo
  ocamlbuild -byte-plugin -mod lib.cmo test.byte

In the second case I use the "-mod" command-line flag that links compilation units to the programs. That means that lib.cmo will be linked both in the plugin and test.byte, which is of course not optimal. If we had flags to say the equivalent thing locally in _tags instead of on the command-line, we could do something better here. A -before-plugin-option flag would also fix this issue.
More generally, this approach rests on the expressivity of ocamlbuild flags and compilation options: it will let you do everything they can do, and no more.

vicuna commented Aug 6, 2013

Comment author: @dbuenzli

Regarding your 2), I don't want to have to invoke ocamlbuild with command-line flags. I don't understand why we don't just introduce a few tags that allow us to say that things should be linked into the myocamlbuild plugin (and create a myocamlbuild.ml if none exists). Rules are added by side effect anyway.

The bootstrap problem is this. Suppose I'm the js_of_ocaml project. I have new rules for building. I want to be able to use these rules for building part of my project, I also want to be able to install them with the package, and I don't want to repeat myself. This means that we want two things:

  1. Being able to specify a project-local file to link into myocamlbuild.
  2. Being able to specify a package library to link into myocamlbuild.

Besides, we want all of that to be in _tags.

For 1), I propose adding a new tag, ocamlbuild_plugin or ocamlbuild_rules. Tagging an ml file with this tag compiles it and links it into the myocamlbuild executable. For example, suppose you have:

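rules.ml:
  open Ocamlbuild_plugin
  let rules () = () (* placeholder for the project-specific rule declarations *)
  let () = dispatch begin function
  | After_rules -> rules ()
  | _ -> ()
  end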

I just want to write in my tags file:

_tags:
    "rules.ml" : ocamlbuild_rules 

And that should ensure that these rules are linked into the myocamlbuild executable.

For 2) I think we can do as you suggested, i.e.:

_tags:
    "myocamlbuild.ml" : pkg(js_of_ocaml.ocamlbuild)

I would also like to have a rule _tags -> myocamlbuild.ml so that I don't even need to have a myocamlbuild.ml file in my project directory. The above _tags file would just work. (But maybe we could rename myocamlbuild.ml to something more sensible, like ocamlbuild_rules.)

vicuna commented Aug 7, 2013

Comment author: @gasche

Of course I agree that the "-mod trick" is suboptimal and that ocamlbuild tags need to be enhanced to support this use case better. My own proposal(s) so far amount to the following: use the standard semantics of OCamlbuild (with its built-in tags and rules) to build the myocamlbuild program. Instead of adding a special flag whose semantics only applies to building the plugin, I'm interested in improving ocamlbuild with features that make your use case convenient but are also useful for building other programs.

(Note: currently my patch doesn't achieve this generality; it only tweaks the ad-hoc building of the plugin so that it emulates the standard semantics in a "good enough" way.)

You suggest:

_tags:
    "rules.ml" : ocamlbuild_rules 

I think this is wrong. The current semantics of tag files is that pred: foo adds tag foo to the compilation of things that match pred, and you're doing something completely different. But I hope you could be satisfied with something looking like:

_tags:
    "myocamlbuild": depmod(rules)

With the semantics that depmod(foo) adds foo.cm{x,o} at the linking step, along with a dependency on this file.
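To make the proposed semantics concrete, here is a small model of what a depmod(foo) tag would contribute at link time. depmod is only a proposal in this thread, and `depmod_arg` and `link_units` are hypothetical helpers working on plain strings rather than on ocamlbuild's tag and command types.

```ocaml
(* Hypothetical model of the proposed depmod(m) tag: each depmod(m) on a
   linked target adds m.cmo (bytecode) or m.cmx (native) to the link
   step and, in this model, doubles as an extra build dependency. *)

type backend = Byte | Native

(* Extract "m" from a tag of the form "depmod(m)"; None otherwise. *)
let depmod_arg tag =
  let prefix = "depmod(" in
  let lp = String.length prefix and lt = String.length tag in
  if lt > lp + 1 && String.sub tag 0 lp = prefix && tag.[lt - 1] = ')'
  then Some (String.sub tag lp (lt - lp - 1))
  else None

(* Compilation units to add at link time, in tag order. *)
let link_units backend tags =
  let ext = match backend with Byte -> ".cmo" | Native -> ".cmx" in
  List.filter_map
    (fun t -> Option.map (fun m -> m ^ ext) (depmod_arg t))
    tags
```

Here `link_units Byte ["debug"; "depmod(rules)"]` gives `["rules.cmo"]`: unrelated tags are ignored, and the backend selects between the two halves of the cm{x,o} notation.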

You also wrote: "I would also like to have a rules _tags -> myocamlbuild.ml so that I don't even need to have a myocamlbuild.ml file in my project directory."

That's a good idea (looks like .cmo -> .mllib made kids). I'm still a bit surprised that this would work well... Note that if we have the right semantics for the tags, we should be able to generate an empty myocamlbuild.ml file, instead of painfully harvesting the _tags file to generate some code.

vicuna commented Aug 7, 2013

Comment author: kerneis

Quoting gasche:

_tags:
    "myocamlbuild": depmod(rules)

With the semantics that depmod(foo) adds foo.cm{x,o} at the linking step and a dependency on this file.

Note that this is (almost) what ocaml_lib does when used in myocamlbuild.ml, with ~extern:false and ~dir:".". The only difference as far as I can see is that it would link foo.cm{a,xa}.

vicuna commented Aug 7, 2013

Comment author: @gasche

Indeed; this may be packaged on the plugin side as an ocaml_mod construct, but this is also very close to what direct flag_and_dep invocation gets you.

I think you both know this, but for the purpose of readability I'll point out that .cma files are not a good option in this scenario, as their semantics is to only link what the program explicitly depends upon, while here Daniel precisely wants to link a plugin module for its side effects only (with no reference to it from myocamlbuild.ml).

Note that in general I dislike linking for side-effects and I'd prefer people to provide pure modules with an (init : unit -> unit) function, to be explicitly called in myocamlbuild.ml. But this conflicts with Daniel's design pressure which is to get rid of myocamlbuild.ml as much as possible.

I personally suspect that it's enough to have the long-term goal of myocamlbuild.ml files that are simple and declarative-looking, rather than not having them at all, if that can get us pure plugins. Side-effect-only linking opens other cans of worms, such as having poor control over the order of application (which may become really bad if two plugins have conflicting effects, admittedly a bad situation to start with).
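The difference between linking for side effects and calling an explicit init function can be seen in a small self-contained model. The registry, `register`, and both plugin modules are hypothetical stand-ins for ocamlbuild's rule table and for real plugins.

```ocaml
(* Toy rule registry standing in for ocamlbuild's global rule table. *)
let registry : string list ref = ref []
let register name = registry := !registry @ [name]

(* Side-effecting style: the rule is registered when the module is
   initialized, i.e. in link order, without anyone calling it. *)
module Side_effect_plugin = struct
  let () = register "side-effect rule"
end

(* Pure style: the module only exposes [init]; nothing is registered
   until the plugin file calls it. *)
module Pure_plugin = struct
  let init () = register "pure rule"
end

(* The "myocamlbuild.ml" part: the call is visible in the source. *)
let () = Pure_plugin.init ()
```

With side effects, registration order follows module initialization (link) order, which is decided outside the plugin source; with `init`, it is the order of the calls written in the plugin file.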

vicuna commented Aug 7, 2013

Comment author: @dbuenzli

I'd also be happy if we could avoid side-effecting modules and mandate that a rule module M export a few well-known identifiers (e.g. M.after for the rules to put in `After etc.), but I fear this would imply dynlink if we don't want to have to write any myocamlbuild.ml. I agree that my proposal went about it the wrong way.

vicuna commented Aug 7, 2013

Comment author: @gasche

I think that a reasonable first step would be to make the following myocamlbuild.ml work:

  open Ocamlbuild_plugin
  let () =
    dispatch (Menhir_ocamlbuild_plugin.init ());
    dispatch (Bisect_ocamlbuild_plugin.init ());
    dispatch (Local_module_foo.init ())

using a _tags file resembling the following:

  <myocamlbuild.*>: package(menhir.ocamlbuild), package(bisect.ocamlbuild)

(Note that the dependency on local_module_foo.ml would be taken care of by ocamlbuild's usual rules.)
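A self-contained sketch of plugins in this style: each module exposes an `init` returning a handler over hook events, and the handlers are combined explicitly. The `hook` type is redefined locally (its constructor names mirror ocamlbuild's Before_*/After_* hooks) so the code runs without the ocamlbuild library; `compose`, `Menhir_like` and `Bisect_like` are hypothetical and say nothing about how the real `dispatch` handles several calls.

```ocaml
(* Local stand-in for ocamlbuild's hook type. *)
type hook =
  | Before_options | After_options
  | Before_hygiene | After_hygiene
  | Before_rules | After_rules

(* Records which plugin reacted, most recent first. *)
let log : string list ref = ref []

(* Hypothetical plugins: [init] returns a handler instead of acting
   at link time. *)
module Menhir_like = struct
  let init () = function
    | After_rules -> log := "menhir rules" :: !log
    | _ -> ()
end

module Bisect_like = struct
  let init () = function
    | After_rules -> log := "bisect rules" :: !log
    | _ -> ()
end

(* Deliver each hook to every handler, in list order. *)
let compose handlers hook = List.iter (fun h -> h hook) handlers

let dispatcher = compose [Menhir_like.init (); Bisect_like.init ()]

let () =
  dispatcher Before_options;  (* no handler reacts to this one *)
  dispatcher After_rules
```

The order in which the handlers run is the order of the list passed to `compose`, so it is written down in the plugin file rather than implied by linking.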

vicuna commented Aug 7, 2013

Comment author: @dbuenzli

Why not. But note that this is not very different from linking in side-effecting modules (if the order in which you define packages in _tags matters); it's just more inconvenient.

vicuna commented Aug 18, 2013

Comment author: @gasche

An update: after discussion with Damien, I pushed the patch above in version/4.01 as well as trunk. The reasoning is that while the change is a bit invasive, it would also have very beneficial effects on the plugin ecosystem (eg. OASIS) that we would like to see after 4.01 is released.

Of course, when inspecting the code yesterday I found an issue with the way Param_tags.init() is handled -- so the change as integrated is slightly buggy. I am working on a clean fix for that, but if it appears that this is too hard to get right the whole change might still get reverted.

vicuna commented Aug 18, 2013

Comment author: @gasche

Here is more information about the current issue. Feel free to ignore this lengthy message if you're not interested in development internals.
TL;DR: it's not a backward-incompatible regression, just some unwanted warnings during the plugin compilation phase.

Parametrized tags are handled by ocamlbuild in a simple, but slightly painful, way. When they are encountered in a configuration file, they are stored in a queue internal to the Param_tags module, and they have no direct semantic effect. When Param_tags.init() is called, this queue is processed and each instance foo(bar) of a parametrized tag is specialized: having written "package(ocamlnet)" will literally execute an instance of the parametrized rule for the tag "package(ocamlnet)", etc. Param_tags.init will also emit warnings if an unknown tag is used, or the tag arity (parametric or not) is wrong.

Param_tags.init() is called exactly once at the end of ocamlbuild's initialization process. Before the present change, compilation and execution of the plugin happened well before that (so parametrized tags were not available at plugin's compilation time). The patch I applied moves the plugin action after Param_tags.init(). This means that plugin's compilation will have access to built-in parametrized tags, but also that parametrized tags declared or applied by the plugin code itself will not be available at the time Param_tags.init is called. So any application of those in the _tags file will result in a warning from Param_tags.init (the parametrized tag is still unknown) during the plugin compilation phase.

Note that this doesn't mean anything bad for the second run, when ./myocamlbuild is executed and the "real compilation" happens: this run will have the plugin code execute before Param_tags.init(), so everything works well. We are not talking about a regression, but unwanted warnings during the plugin compilation phase.
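The queue-then-init scheme, and why the first run warns, can be reproduced with a small model. This is not ocamlbuild's Param_tags module: the type `t` and the functions `declare`, `use`, `init` and `run` are hypothetical, and specialization is reduced to calling a function on the parameter.

```ocaml
type t = {
  declared : (string, string -> unit) Hashtbl.t;  (* tag name -> specializer *)
  pending : (string * string) Queue.t;            (* uses seen in _tags files *)
}

let create () = { declared = Hashtbl.create 8; pending = Queue.create () }
let declare t name f = Hashtbl.replace t.declared name f
let use t name param = Queue.add (name, param) t.pending

(* Process the queue once: specialize known tags, warn on unknown ones. *)
let init t =
  let warnings = ref [] in
  Queue.iter
    (fun (name, param) ->
      match Hashtbl.find_opt t.declared name with
      | Some f -> f param
      | None ->
        warnings :=
          Printf.sprintf "unknown parametrized tag %s(%s)" name param :: !warnings)
    t.pending;
  Queue.clear t.pending;
  List.rev !warnings

(* "package" is built in; "mytag" is only declared by the plugin code. *)
let run ~plugin_code_loaded =
  let t = create () in
  declare t "package" ignore;
  if plugin_code_loaded then declare t "mytag" ignore;
  use t "package" "ocamlnet";
  use t "mytag" "x";
  init t
```

`run ~plugin_code_loaded:false` plays the plugin-compilation run and returns one spurious warning about mytag(x); `run ~plugin_code_loaded:true` plays the second run, where the plugin has declared its tag before init, and returns no warnings.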

I see different possible actions:

(1) do nothing and let the first run send wrong warnings; I think this is unduly confusing for users (we want to encourage modular plugins as a way for non-experts to use ocamlbuild plugins, so it should not create confusion even among beginners)

(2) add a ?quiet:bool parameter to Param_tags.init that would silence all warnings during the first Param_tags.init call (I mean during plugin compilation). I'm unhappy with this choice as well, because it means actual mistakes in the ("myocamlbuild.ml": foo) line (relying on a non-built-in parametrized tag for plugin compilation) will not result in a warning. Again, we want non-expert users to be able to compile a plugin file without being blindfolded about what doesn't work.

(3) Initialize (non-quietly) only the tags that apply to "myocamlbuild.ml" during the first plugin run. This is a good solution on paper, but the problem is that the (true: foo) lines also apply, and we can expect users to use plugin features not meant for "myocamlbuild.ml" in those. This is a situation where the choice to go with _tags instead of a separate plugin-specific file bites us.

(4) Merge (2) and (3): initialize quietly the global tags (those that apply to any file because they were passed with (true: foo) or -tag foo), and initialize non-quietly the tags that apply specifically to "myocamlbuild.ml".

I'm currently going for solution (4). It is mildly unsatisfying to have to hack around the (true: foo) stuff, but I think that is the best medium-term solution: going for a separate _plugin_tags file would probably be too large a change before the new version, and I think being released has value in this case.
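Option (4) amounts to filtering the warnings by where the tag use came from. A minimal sketch, assuming each tag use can be labeled as global or plugin-specific; the types and `plugin_compile_warnings` are hypothetical.

```ocaml
(* Global: from (true: foo) or -tag foo; Plugin_specific: from a
   ("myocamlbuild.ml": foo) line. *)
type origin = Global | Plugin_specific

type tag_use = { name : string; param : string; origin : origin }

(* [known] tells whether a parametrized tag is available at plugin
   compilation time. Unknown global tags are silenced, since they may be
   meant for the project; unknown plugin-specific tags are reported. *)
let plugin_compile_warnings ~known uses =
  List.filter_map
    (fun u ->
      if known u.name then None
      else
        match u.origin with
        | Global -> None
        | Plugin_specific ->
          Some (Printf.sprintf "unknown parametrized tag %s(%s)" u.name u.param))
    uses
```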

Any feedback is welcome.

vicuna commented Aug 18, 2013

Comment author: kerneis

Solution (4) is disappointingly hackish, but probably the most sensible. The only other short-term solution that I can think of is "(2) by default, and (1) in -verbose >= 1 mode" but it's not friendly to non-expert users at all.

It might hint at the short-comings of the current solution though, and I hope we are not bound later by the curse of backward compatibility. If it's acceptable, I think this whole feature should be marked explicitly as experimental and subject to breaking changes in the release notes.

vicuna commented Aug 19, 2013

Comment author: @gasche

I pushed a change implementing option (4), along with testsuite coverage, in both version/4.01 and trunk.

I'm not sure what backward-compatibility guarantees to give. I think it would be good to say that tags explicitly added to "myocamlbuild.ml" will be used for plugin compilation (but no guarantee on fuzzier predicates such as true). This does not preclude adding a new _plugin_tags in the future if there is an agreement this is a better solution, but if we want OASIS (for example) to take advantage of modular plugin compilation we should preserve some safe subset of the current behavior in the future.

vicuna commented Aug 25, 2013

Comment author: @gasche

Weekly modular-ocamlbuild update:

I found out last week that this patchset introduced a regression. In the existing code for the hard-coded compilation of myocamlbuild.ml (which I preserved, only adding more flags coming from the plugin tags), unix.cm(x)a is explicitly linked, because the OCamlbuild implementation relies on some Unix features. If the user had a use_unix or package(unix) in their (true: ...) tags, this would break compilation of the whole project with an error about double-linking of unix.cmxa.

The proper solution is to revert the changes that use the _tags file for plugin compilation, and go for a simpler (and thus more principled and less likely to introduce regressions) implementation: take plugin tags from a new command-line option, -plugin-tag(s).

(Note that this also solves the Param_tags problem I mentioned earlier, as in this case we don't initialize the _tags tags at all. Cleaner designs have good side effects.)
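A minimal sketch of how such a command-line option could collect plugin tags, using the standard library's Arg module. The option names come from the text; the comma splitting for -plugin-tags and the `parse` function itself are assumptions, not ocamlbuild's implementation.

```ocaml
(* Returns (plugin tags, build targets) from an argv whose first
   element is the program name. *)
let parse argv =
  let plugin_tags = ref [] and targets = ref [] in
  let add t = plugin_tags := t :: !plugin_tags in
  let specs = [
    "-plugin-tag", Arg.String add,
      "<tag> Use <tag> when compiling myocamlbuild.ml";
    "-plugin-tags", Arg.String (fun s -> List.iter add (String.split_on_char ',' s)),
      "<tags> Comma-separated variant of -plugin-tag (assumed)";
  ] in
  (* Start after argv.(0); anonymous arguments are build targets. *)
  Arg.parse_argv ~current:(ref 0) argv specs
    (fun target -> targets := target :: !targets)
    "usage: build [options] targets";
  (List.rev !plugin_tags, List.rev !targets)
```

Only these tags would then be added to the plugin compilation command, so (true: ...) lines in _tags can no longer leak into it.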

I know that a command-line option is much less convenient from an end-user point of view (Daniel explicitly mentioned that earlier), but having it in 4.01 still opens modular possibilities for a couple of realistic user situations:

  • OASIS, which has control on ocamlbuild invocation itself and can therefore pass whatever command-line flag there is
  • people using a Makefile or shell script to drive the ocamlbuild invocation anyway
  • ocamlbuild wrappers such as RWO's "corebuild" script

Of course medium-term plans are to get rid of this "only command-line option" limitation, to have something more convenient. I still think a _plugin_tags file would be in order, but have been discussing with Gabriel Kerneis and Thomas Refis ideas to alleviate the "there are too many separate configuration files" problem.

Stay tuned.

vicuna commented Mar 3, 2017

Comment author: @damiendoligez

ocamlbuild is now a separate project that lives on GitHub.
PR transferred to ocaml/ocamlbuild#208
