New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
questionable reasoning in job control code #5371
Comments
Comment author: @edwintorok The read doesn't really return 0, but ocambuild_executor.ml maps all Unix errors as if read returned 0, and from there glondu's analysis stands. Here is a proposed patch: It would be nice if EINTR was handled some way too though. Thanks to Erkki Seppälä/flux on #ocaml for the idea on setting back to blocking mode. |
Comment author: @glondu I think this is not enough. Actually, all ocamldep jobs start by closing their standard output (since it is redirected), which triggers job termination, which consists in waiting for the remaining output of the job, i.e. the whole job in case of those that redirect their standard output. It theoretically means that ocamldep jobs are not parallelized, and it seems to be what is happening in practice. I retract my initial suggestion about using a non-blocking close_process_full as a better way to fix that: a job would (theoretically) stay "active" as long as there is no output from other jobs. A neater, non-intrusive, approach could be to use signalfd... but this is Linux-specific :-( A portable way could be to schedule job termination from a SIGCHLD handler. But {open,close}process_full seem to be a bad interface for that since they don't expose the PID of the child process, and the logic implemented in close is not the one we want. |
Comment author: meyer This should be retriaged ... |
Comment author: @edwintorok FWIW ocamlbuild returns with 'signal -8' for commands that print warnings/errors to stderr (like ocamlfind, menhir). |
Comment author: @gasche Thanks to an SSH access to a FreeBSD box given by Edwin, I could reproduce the issue and verify that the patch above (submitted again... three years ago) makes ocamlbuild work again on FreeBSD. I included the patch in 4.02 and trunk. The patch only fixes the symptoms, though: as I understand it the issue is not yet solved. I'm leaving the PR open for this reason. |
Comment author: @damiendoligez ocamlbuild is now a separate project that lives on GitHub. |
Original bug ID: 5371
Reporter: @glondu
Status: resolved (set by @damiendoligez on 2017-02-27T12:57:09Z)
Resolution: suspended
Priority: normal
Severity: minor
Version: 3.12.0
Target version: later
Category: -for ocamlbuild use https://github.com/ocaml/ocamlbuild/issues
Tags: patch
Monitored by: mehdi @edwintorok
Bug description
ocamlbuild recently started failing with dash 0.5.7, which is /bin/sh on Debian, when the ocamldep command writes to stderr. This was triggered by a patch that has been reverted in Debian (see [1]).
[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642922
However, after looking more closely at what was going on inside dash and ocamlbuild, I believe ocamlbuild is genuinely faulty: in ocamlbuild_executor.ml, around line 205, it assumes that when reading from a file descriptor that has been returned as ready for reading by select returns 0 bytes, the corresponding job is finished, and it then closes the file descriptors and waits for it. I don't understand the reasoning behind this, and this is blatantly wrong with dash 0.5.7: ocamlbuild closes the stderr of the job (in this particular case, ocamldep) while it is still running, and when the jobs writes to its stderr, it gets SIGPIPE-d and reported to its parent (ocamlbuild) as failed.
I think ocamlbuild should not schedule termination of a job based only on reading 0 bytes from its file descriptor. A possible fix could be to install a SIGCHLD handler to schedule finished jobs instead of doing it after reading 0 bytes. Another one (IMHO better) would be to provide non-blocking variants of Unix.close_process* functions (that would not close fds if the process is not dead), and use them in ocamlbuild. Making them actual part of the Unix module would be a valuable addition, too.
File attachments
The text was updated successfully, but these errors were encountered: