Mantis Bug Tracker

View Issue Details
ID: 0005327
Project: OCaml
Category: OCaml general
View Status: public
Date Submitted: 2011-08-03 15:11
Last Update: 2013-08-31 12:44
Reporter: dra
Assigned To:
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 3.12.0
Target Version:
Fixed in Version: 3.13.0+dev
Summary: 0005327: Unix.select blocks under Windows if the same socket is listed in the first and third arguments
Description: If the same socket is listed in both the read and error lists (the first and third arguments) of Unix.select in the Windows port, then Unix.select never returns, even if there is normal data to be read from the socket (see the attached demonstration).
Additional Information: The attached file can be built with ocamlopt -o select.exe unix.cmxa select.ml and then run. If you connect to localhost:400 with a raw telnet session and type any line of text, the program should echo it to the console and then terminate.

Under Windows, the Unix.select call never returns. Changing the second [socket] to [] on line 12 makes it work as expected. The code works unchanged under Linux. It also works with the older version of otherlibs/win32unix/select.c.
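The call pattern that triggers the hang can be sketched as follows. This is not the attached select.ml (its contents are not reproduced here); it is a minimal illustration that assumes the unix library is linked, and the function names are hypothetical. wait_readable passes the same socket as the first and third arguments, as the report describes; wait_readable_no_except is the variant with [] as the third argument, which the reporter found to work under Windows.

```ocaml
(* Minimal sketch of the failing call pattern; link with the unix library.
   Function names are hypothetical, not taken from the attached select.ml. *)

(* Wait until [sock] is readable or reports an exceptional condition
   (such as out-of-band data). A negative [timeout] means wait forever.
   Returns (readable, exceptional) for [sock]. This is the shape that
   hangs on the affected Windows port: [sock] is in both the first and
   the third list. *)
let wait_readable ?(timeout = -1.0) sock =
  let readable, _writable, exceptional =
    Unix.select [sock] [] [sock] timeout
  in
  (List.mem sock readable, List.mem sock exceptional)

(* Workaround from the report: pass [] as the third argument, so only
   readability is requested. *)
let wait_readable_no_except ?(timeout = -1.0) sock =
  let readable, _writable, _exceptional = Unix.select [sock] [] [] timeout in
  List.mem sock readable
```

On Linux both functions return as soon as data arrives on the socket; according to the report, on the affected Windows build only the second one does.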

Earlier testing with a larger program showed that if another socket in the read list has data, the lists returned by Unix.select are correct: they *do* include the sockets that were also in the error list, if those sockets have data to be read. It is just that, for some reason, those sockets on their own never cause Unix.select to return.

In the original scenario, the loop passed all client sockets in both the first and third lists, plus the server socket in the first list. As a result, the server process only found out about pending data on a client socket when a new client connected!
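The loop from the original scenario can be sketched as a single select step. This is a hypothetical reconstruction (select_step, server and clients are illustrative names), assuming server is a listening socket and clients are accepted connections. Because every client appears in both the first and third lists, an affected Windows build in practice only wakes up when the server socket becomes readable, that is, when a new client connects.

```ocaml
(* One iteration of a select-based server loop, as in the original
   scenario: the listening socket is watched for readability only, while
   every client is watched for both readability and exceptional
   conditions. Returns whether a new connection is pending, the clients
   with data to read, and the clients with exceptional conditions. *)
let select_step ?(timeout = -1.0) server clients =
  let readable, _writable, exceptional =
    Unix.select (server :: clients) [] clients timeout
  in
  let new_connection = List.mem server readable in
  let ready = List.filter (fun c -> List.mem c readable) clients in
  (new_connection, ready, exceptional)
```

A caller would accept on the server socket when new_connection is true and read from each socket in ready.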
Tags: No tags attached.
Attached Files: select.ml (617 bytes) 2011-08-03 15:11
ocaml-3.12.0-PR5327.patch (7,921 bytes) 2011-12-22 16:58

Relationships
related to 0005325 (confirmed): Blocked Unix.recv in one thread blocks Unix.send in another thread under Windows
related to 0005329 (closed): Include classic Unix.select in the Unix module for Win32

Notes
(0006438)
xleroy (administrator)
2011-12-21 12:16

I have a hunch that this problem would go away if either PR#5325 or PR#5329 is solved. Let's continue the discussion in these two PRs.

(0006471)
dra (reporter)
2011-12-21 18:18

I think fixing PR 5329 would simply mask this bug, given that it does not seem to be related to PR 5325.

(0006507)
dra (reporter)
2011-12-22 16:58

Well, the diagnosis was considerably easier than the fix! But it was worth investigating this a bit further rather than letting PR 5329 blur the issue...

The problem is that WSAEventSelect allows only *one* event registration per socket: as the MSDN remarks state explicitly (about halfway through, http://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx), a new call for the same socket cancels the previous one. The existing code calls WSAEventSelect separately for readfds and for exceptfds, so the second call overrides the first. In effect, select is then waiting only for out-of-band data, which is why it blocks. This can be demonstrated conclusively by changing unix_select to process readfds *after* exceptfds.
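The override can be illustrated with a toy model. This is not Winsock code and does not mirror otherlibs/win32unix/select.c; it only assumes, per the MSDN remarks cited above, that each socket carries a single registration and that registering again replaces it. register_separately mimics the pre-patch shape of one registration per fd list; register_merged mimics the idea behind the fix, combining the requested events for a socket into one registration. All names are hypothetical, and the sketch assumes OCaml 4.08 or later (for the Int and Option modules).

```ocaml
(* Toy model of per-socket event registration (NOT the Winsock API).
   Assumption from the MSDN remarks: a socket has exactly one registration,
   and a new registration replaces the previous one. *)
type event = Read | Oob

module SockMap = Map.Make (Int) (* socket id -> registered events *)

(* Pre-patch shape: walk readfds, then exceptfds, registering each socket
   on its own. A socket in both lists ends up registered for [Oob] only. *)
let register_separately readfds exceptfds =
  let reg =
    List.fold_left (fun r s -> SockMap.add s [Read] r) SockMap.empty readfds
  in
  List.fold_left (fun r s -> SockMap.add s [Oob] r) reg exceptfds

(* Fixed shape: add to any existing registration for the same socket,
   so one registration covers every requested event. *)
let register_merged readfds exceptfds =
  let add ev r s =
    let evs = Option.value (SockMap.find_opt s r) ~default:[] in
    SockMap.add s (ev :: evs) r
  in
  let reg = List.fold_left (add Read) SockMap.empty readfds in
  List.fold_left (add Oob) reg exceptfds

(* Would an incoming [ev] on socket [s] wake the waiter? *)
let wakes reg s ev =
  match SockMap.find_opt s reg with
  | Some evs -> List.mem ev evs
  | None -> false
```

With socket 5 in both lists, register_separately [5] [5] produces a registration that wakes on Oob but not on Read, which is the hang described in the report; register_merged [5] [5] wakes on both.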

The attached patch fixes the problem by changing the way a worker is allocated: it scans each individual worker to see whether the socket itself is already in its lists, rather than just finding any worker that handles sockets. This could be done much more efficiently (by taking advantage of the maps that unix_select builds to decide whether the socket needs to be searched for at all), but this minimal patch fixes the problem. I'm not particularly interested in optimising it further, as PR 5329 will short-circuit this path in the common case anyway. My impression is that this implementation of select is more about compatibility than speed!

With this patch, the attached ML example now runs as it should. It would benefit from more testing, though: once I submit a patch to PR 5329, none of my own code will exercise this path.

The patch also contains a minor correction to windbug.h: the __VA_ARGS__ handling was incorrect for GCC when no variadic arguments are given, which prevented compilation in debugging mode. I haven't tested debugging mode on MSVC, but according to comments on http://msdn.microsoft.com/en-us/library/ms177415(v=vs.80).aspx the MSVC compiler handles GCC's ## preprocessor operator correctly (and in fact drops the trailing comma on its own, so it would not need it anyway).

(0006548)
xleroy (administrator)
2011-12-28 11:45

That's a nontrivial fix indeed! Thanks for investigating. I'm willing to apply the patch, but first let me give Sylvain Le Gall (a.k.a. gildor, the original author of this code) an opportunity to look at it and comment.

(0006674)
xleroy (administrator)
2012-01-14 10:41

Patch applied in SVN trunk.

Issue History
Date Modified     Username  Field               Change
2011-08-03 15:11  dra       New Issue
2011-08-03 15:11  dra       File Added          select.ml
2011-12-21 12:08  xleroy    Relationship added  related to 0005325
2011-12-21 12:09  xleroy    Relationship added  related to 0005329
2011-12-21 12:16  xleroy    Note Added          0006438
2011-12-21 12:16  xleroy    Status              new => feedback
2011-12-21 18:18  dra       Note Added          0006471
2011-12-21 18:18  dra       Status              feedback => new
2011-12-22 16:58  dra       Note Added          0006507
2011-12-22 16:58  dra       File Added          ocaml-3.12.0-PR5327.patch
2011-12-28 11:45  xleroy    Note Added          0006548
2011-12-28 11:45  xleroy    Status              new => feedback
2012-01-14 10:41  xleroy    Note Added          0006674
2012-01-14 10:41  xleroy    Status              feedback => resolved
2012-01-14 10:41  xleroy    Resolution          open => fixed
2012-01-14 10:41  xleroy    Fixed in Version    => 3.13.0+dev
2013-08-31 12:44  xleroy    Status              resolved => closed


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker