New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unix.select fast path misses events if the list of fds is greater than FD_SETSIZE (typically 64) #7276
Comments
Comment author: @alainfrisch Can you confirm the following:
|
Comment author: @alainfrisch I confirm the behavior (tested with the MSVC 32-bit port), and reversing the list before the final select in the example has the expected effect. |
Comment author: @alainfrisch Fixed by commit f642817. It would be nice to include the test in the testsuite, but I'm concerned that its use of the network interface might break it. (E.g. on my machine, it triggered a Windows security popup on first use.) |
Comment author: @alainfrisch @djs55: feel free to give your real name for the Changes file if you wish. |
Comment author: djs55 Many thanks for resolving this! For the Changes file could you put my real name "David Scott"? Regarding your two statements, the Microsoft docs for select say:
I'm not enough of a Unix expert to confirm your second statement ("FD_SETSIZE gives an upper bound to fd values that can go into an fd_set") but I think that is correct. I believe Unix-like systems all use low integers as file descriptors (unlike Windows which uses opaque handles, probably pointer values). I believe select on Unix has traditionally used simple bitmaps where including fd "n" means setting bit "n" to 1. Therefore I think on Unix people are more concerned with ensuring the bitmaps are large enough to represent the highest-possible fd value (and the performance implications of frequently scanning large bitmaps). |
Original bug ID: 7276
Reporter: djs55
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2017-09-24T15:32:59Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.03.0
Fixed in version: 4.04.0 +dev / +beta1 / +beta2
Category: platform support (windows, cross-compilation, etc)
Related to: #5563
Monitored by: @gasche @hcarty
Bug description
Hi,
On Windows there's a problem with the Unix.select fast path used when all
the fds are sockets: the MSDN docs for select[1] say
With the default FD_SETSIZE of 64, if the sockets list is of length 65+
then later sockets will be silently ignored. The attached program demonstrates
this effect on OCaml 4.03 and 4.02.3.
A possible fix appears to be to fall back to the slow path for lists of fds greater
than FD_SETSIZE (like we do already for list of fds of varying types).
For example the following patch:
diff --git a/otherlibs/win32unix/select.c b/otherlibs/win32unix/select.c
index 0e21db8..b592e3a 100644
--- a/otherlibs/win32unix/select.c
+++ b/otherlibs/win32unix/select.c
@@ -909,9 +909,12 @@ static int fdlist_to_fdset(value fdlist, fd_set *fdset)
{
value l, c;
FD_ZERO(fdset);
for (l = fdlist; l != Val_int(0); l = Field(l, 1)) {
c = Field(l, 0);
if (Descr_kind_val(c) == KIND_SOCKET) {
DEBUG_PRINT("Non socket value encountered");
This is similar to "#5563: harden Unix.select against file descriptors
above FD_SETSIZE" except that we can still handle these larger lists using
the more generic (but more complicated) path.
[1] https://msdn.microsoft.com/en-gb/library/windows/desktop/ms740141(v=vs.85).aspx
Steps to reproduce
On Windows (I'm using the cygwin based installer from https://fdopen.github.io/opam-repository-mingw/)
ocamlfind ocamlopt -package unix -linkpkg -o test.exe test.ml
./test.exe
For me the large Unix.select will time out and the program prints "ERROR" and exits with code 1. On OSX and on Windows with the patch above applied, it prints "OK" and exits with code 0.
Additional information
I encountered this when filtering all network connections from a VM through the Mirage TCP/IP stack and out into the Internet via regular sockets. Occasionally the I/O would stop (but other Lwt timer threads continue), and then after some timeout it would wake up again. I believe this is because the list of active socket connections was > 64 and all the activity was on the fds at the end of the list.
File attachments
The text was updated successfully, but these errors were encountered: