Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

questionable reasoning in job control code #5371

Closed
vicuna opened this issue Oct 5, 2011 · 6 comments
Closed

questionable reasoning in job control code #5371

vicuna opened this issue Oct 5, 2011 · 6 comments

Comments

@vicuna
Copy link

vicuna commented Oct 5, 2011

Original bug ID: 5371
Reporter: @glondu
Status: resolved (set by @damiendoligez on 2017-02-27T12:57:09Z)
Resolution: suspended
Priority: normal
Severity: minor
Version: 3.12.0
Target version: later
Category: -for ocamlbuild use https://github.com/ocaml/ocamlbuild/issues
Tags: patch
Monitored by: mehdi @edwintorok

Bug description

ocamlbuild recently started failing with dash 0.5.7, which is /bin/sh on Debian, when the ocamldep command writes to stderr. This was triggered by a patch that has been reverted in Debian (see [1]).

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642922

However, after looking more closely at what was going on inside dash and ocamlbuild, I believe ocamlbuild is genuinely faulty: in ocamlbuild_executor.ml, around line 205, it assumes that when reading from a file descriptor that has been returned as ready for reading by select returns 0 bytes, the corresponding job is finished, and it then closes the file descriptors and waits for it. I don't understand the reasoning behind this, and this is blatantly wrong with dash 0.5.7: ocamlbuild closes the stderr of the job (in this particular case, ocamldep) while it is still running, and when the jobs writes to its stderr, it gets SIGPIPE-d and reported to its parent (ocamlbuild) as failed.

I think ocamlbuild should not schedule termination of a job based only on reading 0 bytes from its file descriptor. A possible fix could be to install a SIGCHLD handler to schedule finished jobs instead of doing it after reading 0 bytes. Another one (IMHO better) would be to provide non-blocking variants of Unix.close_process* functions (that would not close fds if the process is not dead), and use them in ocamlbuild. Making them actual part of the Unix module would be a valuable addition, too.

File attachments

@vicuna
Copy link
Author

vicuna commented Oct 6, 2011

Comment author: @edwintorok

The read doesn't really return 0, but ocambuild_executor.ml maps all Unix errors as if read returned 0, and from there glondu's analysis stands.
In this case it gets an EAGAIN (because subprocess is still alive and hasn't written to stderr anything yet).

Here is a proposed patch:
http://caml.inria.fr/mantis/file_download.php?file_id=503&type=bug

It would be nice if EINTR was handled some way too though.

Thanks to Erkki Seppälä/flux on #ocaml for the idea on setting back to blocking mode.

@vicuna
Copy link
Author

vicuna commented Oct 6, 2011

Comment author: @glondu

I think this is not enough.

Actually, all ocamldep jobs start by closing their standard output (since it is redirected), which triggers job termination, which consists in waiting for the remaining output of the job, i.e. the whole job in case of those that redirect their standard output. It theoretically means that ocamldep jobs are not parallelized, and it seems to be what is happening in practice.

I retract my initial suggestion about using a non-blocking close_process_full as a better way to fix that: a job would (theoretically) stay "active" as long as there is no output from other jobs.

A neater, non-intrusive, approach could be to use signalfd... but this is Linux-specific :-( A portable way could be to schedule job termination from a SIGCHLD handler. But {open,close}process_full seem to be a bad interface for that since they don't expose the PID of the child process, and the logic implemented in close is not the one we want.

@vicuna
Copy link
Author

vicuna commented Jul 26, 2013

Comment author: meyer

This should be retriaged ...

@vicuna
Copy link
Author

vicuna commented May 12, 2014

Comment author: @edwintorok

FWIW ocamlbuild returns with 'signal -8' for commands that print warnings/errors to stderr (like ocamlfind, menhir).
I'm not sure if this is the same bug, but seems related, see:
http://www.freebsd.org/cgi/query-pr.cgi?pr=189710
ocaml/merlin#193 (comment)

@vicuna
Copy link
Author

vicuna commented May 18, 2014

Comment author: @gasche

Thanks to an SSH access to a FreeBSD box given by Edwin, I could reproduce the issue and verify that the patch above (submitted again... three years ago) makes ocamlbuild work again on FreeBSD. I included the patch in 4.02 and trunk.

The patch only fixes the symptoms, though: as I understand it the issue is not yet solved. I'm leaving the PR open for this reason.

@vicuna
Copy link
Author

vicuna commented Feb 27, 2017

Comment author: @damiendoligez

ocamlbuild is now a separate project that lives on GitHub.
PR transferred to ocaml/ocamlbuild#164

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant