|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005327||OCaml||OCaml general||public||2011-08-03 15:11||2012-01-14 10:41|
|Target Version||Fixed in Version||3.13.0+dev|
|Summary||0005327: Unix.select blocks under Windows if same socket listed in first and third arguments|
|Description||If the same socket is listed in both the read and error lists (first and third arguments) of Unix.select in the Windows port, then Unix.select never returns, even if there is normal data to be read from the socket (see attached demonstration).|
|Additional Information||The attached file can be built with ocamlopt -o select.exe unix.cmxa select.ml and then simply run. If you raw telnet to localhost:400 and type any line of text, the program should simply echo it to the console and then terminate.|
Under Windows, the Unix.select call never returns. If you change the second [socket] to [] on line 12 then it works as expected. The code works under Linux without change. It also works with the older version of otherlibs/win32unix/select.c.
Previous testing with a larger file showed that if there's another socket in the read list which has data then the complete list of sockets returned by Unix.select is correct (i.e. it *does* return the sockets which were included in the error list, if they have data to be read - it's just that for some reason those sockets on their own don't cause Unix.select to return).
The original scenario for the loop had all client sockets in both the first and third lists but also the server socket in the first list - it meant that the server process only found out about pending data in the read socket when a new client connected!
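The attached select.ml is not reproduced on this page. As a rough illustration of the call pattern described above, here is a minimal sketch that puts one connected socket in both the read and error lists. It is an assumption-laden stand-in, not the attachment: it uses a loopback connection on an OS-chosen port instead of telnet on port 400, a 5 second timeout so it cannot hang, and the name `select_sees_data` is made up for this example. It assumes a modern OCaml (4.02 or later) linked against the unix library.

```ocaml
(* Sketch of the scenario in this report; not the attached select.ml. *)
let select_sees_data () =
  let server = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  (* Port 0 asks the OS for any free port; the report used port 400. *)
  Unix.bind server (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen server 1;
  let client = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.connect client (Unix.getsockname server);
  let conn, _ = Unix.accept server in
  (* Send ordinary (not out-of-band) data, standing in for the telnet line. *)
  let line = "hello\n" in
  ignore (Unix.write_substring client line 0 (String.length line));
  (* The same socket appears in the first (read) and third (except) lists.
     A 5 second timeout is used so that the sketch cannot hang forever. *)
  let readable, _, _ = Unix.select [conn] [] [conn] 5.0 in
  List.iter Unix.close [conn; client; server];
  readable <> []

let () =
  Printf.printf "select reported readable data: %b\n" (select_sees_data ())
```

On a platform where select behaves as the reporter expects, the read list should come back non-empty. On an affected Windows build the call would presumably only return when the timeout expires.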
|Tags||No tags attached.|
|Attached Files|| select.ml (617 bytes) 2011-08-03 15:11|
ocaml-3.12.0-PR5327.patch (7,921 bytes) 2011-12-22 16:58
I have a hunch that this problem would go away if any of PR#5325 and PR#5329 is solved. Let's continue the discussion in these two PRs.
|I think fixing 5329 will simply mask the presence of this bug, given that it doesn't seem to be related to 5325|
Well, the diagnosis was considerably easier than the fix! But it was worth investigating this a bit further rather than letting PR 5329 blur the issue...
The problem is that WSAEventSelect only allows *one* event to be associated with a given socket (explicitly described in MSDN halfway through the remarks: http://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx). The code given calls WSAEventSelect separately for readfds and for exceptfds, and the second call overrides the first. select therefore blocks because it is in effect only looking for out-of-band data. This can be demonstrated conclusively by changing unix_select to process readfds *after* exceptfds.
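The override can be pictured with a toy model in OCaml. This is not the Winsock API or the code in select.c: sockets are plain integers, and `event_select`, `watched` and `register_in_order` are hypothetical names. The only property modelled is that a socket holds at most one association and a later registration replaces the earlier one.

```ocaml
(* Toy model only: one association per socket, later calls replace it. *)
type interest = Read | Except

(* Hypothetical table standing in for the per-socket event association. *)
let associations : (int, interest list) Hashtbl.t = Hashtbl.create 8

(* Registering again for the same socket overrides the previous entry. *)
let event_select sock interests = Hashtbl.replace associations sock interests

(* What the socket is currently being watched for. *)
let watched sock =
  try Hashtbl.find associations sock with Not_found -> []

(* One registration per list, in the given order, as the note describes. *)
let register_in_order sock order =
  List.iter (fun i -> event_select sock [i]) order;
  watched sock
```

In this model, `register_in_order 1 [Read; Except]` leaves the socket watched for `Except` only, while `register_in_order 1 [Except; Read]` leaves it watched for `Read` only, which mirrors the effect of reordering the lists in unix_select.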
The attached patch fixes the problem by hacking the way the worker is allocated - it scans each individual worker to see if the socket itself is already in the lists, rather than just finding a sockets worker. This could be done much more efficiently (by taking advantage of the maps generated to determine if the socket needs to be searched for in unix_select). This minimal patch fixes the problem. I'm not particularly interested in optimising it further as PR 5329 will, for the common case, short circuit this anyway. My impression is that this implementation of select is more about compatibility than speed!
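The patch itself is C and is not shown on this page, so the following is only a guess at the shape of the allocation change, written as a toy OCaml model. The `worker` record, the `capacity` value and the function names are all hypothetical. The idea sketched is that a socket already held by some worker is sent back to that same worker, so its read and error interests end up together instead of in two separate registrations.

```ocaml
type mode = Read | Write | Except

(* Hypothetical worker: the sockets it watches, each with its modes. *)
type worker = { mutable queries : (int * mode list) list }

(* Hypothetical per-worker limit, chosen only for illustration. *)
let capacity = 64

(* Prefer the worker that already watches [sock]; otherwise take any
   worker with room; otherwise append a fresh worker to the pool. *)
let worker_for pool sock =
  match List.find_opt (fun w -> List.mem_assoc sock w.queries) !pool with
  | Some w -> w
  | None ->
    match List.find_opt (fun w -> List.length w.queries < capacity) !pool with
    | Some w -> w
    | None ->
      let w = { queries = [] } in
      pool := !pool @ [w];
      w

(* Record that [sock] should be watched for [mode], merging with any
   modes already recorded for the same socket. *)
let add_query pool sock mode =
  let w = worker_for pool sock in
  let modes = try List.assoc sock w.queries with Not_found -> [] in
  w.queries <- (sock, mode :: modes) :: List.remove_assoc sock w.queries
```

Starting from `let pool = ref []`, calling `add_query pool 7 Read` and then `add_query pool 7 Except` leaves a single worker holding socket 7 with both modes. The linear scan over every worker matches the note's remark that this could be done more efficiently.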
Armed with this patch, the attached ML example now runs as it should. It would benefit from more testing (once I submit a patch to PR 5329, none of my own code will walk this path...)
The patch also contains a minor correction to windbug.h - the __VA_ARGS__ handling was incorrect for GCC when no arguments are given which prevented compilation in debugging mode. I haven't tested the debugging mode on MSVC, but according to comments on http://msdn.microsoft.com/en-us/library/ms177415(v=vs.80).aspx the MSVC compiler recognises GCC's ## preprocessor operator correctly (and in fact has a hack so that it wouldn't need it anyway).
That's a nontrivial fix indeed! Thanks for investigating. I'm willing to apply the patch, but before that let me give Sylvain Le Gall (a.k.a. gildor and the original author of this code) an opportunity to look at it and comment.
|Patch applied in SVN trunk.|
|2011-08-03 15:11||dra||New Issue|
|2011-08-03 15:11||dra||File Added: select.ml|
|2011-12-21 12:08||xleroy||Relationship added||related to 0005325|
|2011-12-21 12:09||xleroy||Relationship added||related to 0005329|
|2011-12-21 12:16||xleroy||Note Added: 0006438|
|2011-12-21 12:16||xleroy||Status||new => feedback|
|2011-12-21 18:18||dra||Note Added: 0006471|
|2011-12-21 18:18||dra||Status||feedback => new|
|2011-12-22 16:58||dra||Note Added: 0006507|
|2011-12-22 16:58||dra||File Added: ocaml-3.12.0-PR5327.patch|
|2011-12-28 11:45||xleroy||Note Added: 0006548|
|2011-12-28 11:45||xleroy||Status||new => feedback|
|2012-01-14 10:41||xleroy||Note Added: 0006674|
|2012-01-14 10:41||xleroy||Status||feedback => resolved|
|2012-01-14 10:41||xleroy||Resolution||open => fixed|
|2012-01-14 10:41||xleroy||Fixed in Version||=> 3.13.0+dev|