New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fd leak in Unix.ml? #5421
Comments
Comment author: @protz Hi, So if I understand correctly, clone is failing because there's no memory left on the device. That's a pretty bad situation already. Can you suggest a way and/or a patch to handle this gracefully? Thanks, jonathan |
Comment author: hw From 'man 2 clone': ENOMEM Cannot allocate sufficient memory to allocate a task structure for the child, or to copy those parts of the caller’s context that need to be copied. I'm not sure how clone() is implemented, but I've consulted a Unix veteran and he says the call can attempt to allocate up to the same amount of memory as the parent process. In my case the process has 645 MB resident, and there is ~ 64 MB unused on the machine. So the situation is bad but not terrible ;-) In my uninformed opinion a failure of clone() doesn't necessarily indicate your process is shafted beyond repair, so it would be nice if the error was recoverable. So the fd leak. I'm not an expert in Unix programming and I understand forking is a particularly tricky matter. There are probably conventions in the standard library, but my intuition is that each time an fd is allocated it should be wrapped in a try..finally that closes the fd should anything fail. NB pipe() can also fail (e.g. with EMFILE/ENFILE). Unless of course I'm missing some other mechanism that is cleaning up these fds (but not in the case of my program). |
Comment author: till Fork/Exec is indeed very tricky and should be done in C (you just need a level of low-level control on the forked side that ocaml cannot provide). |
Comment author: @protz Hi Till, You patch is pretty involved and it's going to require a lot of work to make this happen (if we want to, which I'm in no position to declare :)). I suggest we try to fix this issue in a simpler way. Hw: you wrote « my intuition is that each time an fd is allocated it should be wrapped in a try..finally that closes the fd should anything fail ». Could you possibly write a patch to the Unix module that fixes this? You'd be in a good position to make sure the patch is correct (I could write but I don't have your setup and I'm unsure how to trigger ENOMEM without bringing my work machine down to its knees :)). I think it's a matter of simply wrapping the call to open_proc_full in a try..with block, cleaning up the file descriptors, and then re-raising the exception if there was a failure. There may be other places in the code of the Unix module that want similar cleanups. Cheers, jonathan |
Comment author: till I've uploaded a patch that should take care of the fd leak. It is also careful to handle the cases where pipe fails (e.g. running out of available fds.) |
Comment author: @protz The patch looks fine. The only concern I have is that you're sometimes using a mix of close_in, close_out, close. In some places (but not all) where you could use close_in or close_out, you chose to use close, which indeed makes things easier. Reading the code, it turns close_in does close the underlying fd, but is there any strong reason why you're not using close everywhere? Apart from that, no concerns :). Thanks, jonathan |
Comment author: @protz After chatting with Damien, what you're doing (closing the descr, not the channel, as in open_process_full) is definitely safe. |
Comment author: @protz Fixed in svn r12038 |
Comment author: hw Nice one, thanks folks. |
Original bug ID: 5421
Reporter: hw
Assigned to: @protz
Status: closed (set by @xavierleroy on 2013-08-31T10:44:25Z)
Resolution: fixed
Severity: major
Platform: x86_64
OS: CentOS
OS Version: 5.5
Version: 3.12.1
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: hw @protz
Bug description
I have an OCaml process that is leaking fds in multiples of 6. In the logs I have:
Unix.Unix_error(ENOMEM, "fork", "")
Raised by primitive operation at file "unix.ml", line 865, characters 8-14
Called from file "unix.ml", line 882, characters 2-141
[...]
Looking at otherlibs/unix/unix.ml in the source distribution, this corresponds to a call to open_process_full (line 882).
let open_process_full cmd env =
let (in_read, in_write) = pipe() in
let (out_read, out_write) = pipe() in
let (err_read, err_write) = pipe() in
let inchan = in_channel_of_descr in_read in
let outchan = out_channel_of_descr out_write in
let errchan = in_channel_of_descr err_read in
open_proc_full cmd env (Process_full(inchan, outchan, errchan))
out_read in_write err_write [in_read; out_write; err_read];
close out_read;
close in_write;
close err_write;
(inchan, outchan, errchan)
open_process_full in turn calls open_proc_full, which calls fork() (line 865). Matching up the timestamps with an strace:
19188 12:39:29 pipe([223, 224]) = 0
19188 12:39:29 pipe([225, 226]) = 0
19188 12:39:29 pipe([227, 228]) = 0
19188 12:39:29 futex(0x9b0174, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9b0170, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
19188 12:39:29 <... futex resumed> ) = 1
19188 12:39:29 lseek(223, 0, SEEK_CUR <unfinished ...>
19188 12:39:29 <... lseek resumed> ) = -1 ESPIPE (Illegal seek)
19188 12:39:29 futex(0x9b0174, FUTEX_WAIT_PRIVATE, 116925, NULL <unfinished ...>
19188 12:39:29 <... futex resumed> ) = 0
19188 12:39:29 futex(0x9b0140, FUTEX_WAKE_PRIVATE, 1) = 0
19188 12:39:29 lseek(226, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
19188 12:39:29 lseek(227, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
19188 12:39:29 fcntl(223, F_GETFD) = 0
19188 12:39:29 fcntl(223, F_SETFD, FD_CLOEXEC) = 0
19188 12:39:29 fcntl(226, F_GETFD) = 0
19188 12:39:29 fcntl(226, F_SETFD, FD_CLOEXEC) = 0
19188 12:39:29 fcntl(227, F_GETFD) = 0
19188 12:39:29 fcntl(227, F_SETFD, FD_CLOEXEC) = 0
19188 12:39:29 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x447169d0) = -1 ENOMEM (Cannot allocate memory)
clone() is failing in one of the dozen or so ways documented in 'man 2 clone'. This exception propagates out of open_process_full, leaving the six fds from the three pipe() calls hanging around. As far as I can tell these fds are not cleaned up by any garbage collection process.
fds 223-228 are still open a few hours later. Soon the process will hit ulimit -n.
File attachments
The text was updated successfully, but these errors were encountered: