Mantis Bug Tracker

ID: 0005325
Project: OCaml
Category: OCaml general
View Status: public
Date Submitted: 2011-08-02 22:50
Last Update: 2014-07-17 16:15
Reporter: dra
Assigned To:
Priority: normal
Severity: major
Reproducibility: always
Status: confirmed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 3.12.0
Target Version: 4.02.1+dev
Fixed in Version:
Summary: 0005325: Blocked Unix.recv in one thread blocks Unix.send in another thread under Windows
Description: It appears not to be possible to write to a socket in one thread if another thread is blocked in a [recv] call on the same socket (and presumably vice versa).

This behaviour only seems to affect Windows - tried on both Windows 7 with 3.12.0 and Windows XP with 3.10.1 with the same effect.
Additional Information: The attached file should be compiled with

ocamlopt -o foo.exe -thread unix.cmxa threads.cmxa Foo.ml

It can then be invoked with, say:

foo www.google.com

and the line

GET / HTTP/1.0

followed by enter twice. On Linux, you'll see an HTTP redirect response. On Windows, the program will hang after the first carriage return, eventually crashing with an exception when Google (or whoever) closes the socket at the other end on a timeout.

The equivalent code in C definitely works, so it appears to be a problem with Unix module / OCaml runtime.
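
The attached Foo.ml is not reproduced on this page. As a rough, self-contained sketch of the scenario described above (one thread blocked in recv while another thread calls send on the same socket), the following uses a loopback echo server in place of an external host. The names echo_once and send_while_recv_blocked are hypothetical, and the code relies on Bytes and Unix.send_substring, so it assumes OCaml 4.02 or later rather than the 3.12.0 of the report.

```ocaml
(* Hypothetical stand-in for the attached Foo.ml; not the reporter's code. *)

(* Server side: accept one connection, echo one message back, close it. *)
let echo_once listener =
  let conn, _ = Unix.accept listener in
  let buf = Bytes.create 256 in
  let n = Unix.recv conn buf 0 (Bytes.length buf) [] in
  ignore (Unix.send conn buf 0 n []);
  Unix.close conn

let send_while_recv_blocked msg =
  (* Loopback listener on an OS-chosen port, so no external host is needed. *)
  let listener = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.bind listener (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen listener 1;
  let server = Thread.create echo_once listener in
  let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.connect sock (Unix.getsockname listener);
  (* Reader thread: blocks in recv until the echo arrives. *)
  let reply = ref "" in
  let reader =
    Thread.create
      (fun () ->
        let buf = Bytes.create 256 in
        let n = Unix.recv sock buf 0 (Bytes.length buf) [] in
        reply := Bytes.sub_string buf 0 n)
      ()
  in
  (* Give the reader time to enter recv before sending on the same socket. *)
  Thread.delay 0.2;
  ignore (Unix.send_substring sock msg 0 (String.length msg) []);
  Thread.join reader;
  Thread.join server;
  Unix.close sock;
  Unix.close listener;
  !reply

let () = print_endline (send_while_recv_blocked "ping")
```

Built with the unix and threads libraries as in the command above, this should print the echoed message wherever a send can proceed while another thread is blocked in recv. On an affected Windows build, the report suggests the send would stall instead.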
Tags: patch
Attached Files:
Foo.ml (1,723 bytes) 2011-08-02 22:50
foo.c (1,680 bytes) 2011-12-21 10:21
patch-delete.patch (2,483 bytes) 2011-12-22 09:35
patch-invert.patch (2,182 bytes) 2011-12-22 09:36

- Relationships
related to 0005327 (closed): Unix.select blocks under Windows if same socket listed in first and third arguments
related to 0005578 (resolved): Windows: Exception raised when reading from a socket

-  Notes
(0006410)
xleroy (administrator)
2011-12-20 13:40

I checked the Win32 implementation of send and recv, and the master lock is properly released, allowing both system calls to run in parallel. So, I suspect the mutual exclusion occurs in the Win32/Winsock system calls themselves. Concerning your comparison with C, did you keep in mind that OCaml creates sockets in blocking mode, which may not be the case in your C program? That could at least explain the difference.
(0006427)
dra (reporter)
2011-12-21 10:36

You're correct - I hadn't seen the setsockopt calls in select.c and accept.c. Attached is my hacked up C equivalent code (compiled with gcc -mno-cygwin -o foo.exe foo.c -lws2_32)

According to MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/ms740532(v=vs.85).aspx), SO_OPENTYPE is deprecated and WSASocket should be used instead with WSA_FLAG_OVERLAPPED not given in dwFlags. If you alter the call to WSASocket in my example to have 0 for the last parameter then the C code exhibits the same problem as the OCaml code - so it's Winsock, not OCaml specifically.

But - why is this Microsoft specific socket option being set in the first place? As far as I can see, all you're doing is creating sockets which can't use overlapping operations (which aren't used anyway) - can't the call simply be removed? Note that overlapped I/O and blocking/non-blocking in Microsoft parlance are *unrelated* concepts: see first para of http://msdn.microsoft.com/en-us/library/windows/desktop/ms740087(v=VS.85).aspx - ioctlsocket still controls blocking or non-blocking settings.
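
To illustrate the last point from the OCaml side: blocking mode is a per-socket setting, toggled with Unix.set_nonblock and Unix.clear_nonblock, which the Windows port is understood to implement through ioctlsocket. It is independent of how the socket was created. The sketch below is only an illustration under that assumption; the helper names are hypothetical and it assumes OCaml 4.02 or later.

```ocaml
(* Hypothetical helper: attempt one recv and report whether it would block. *)
let recv_would_block sock =
  let buf = Bytes.create 16 in
  match Unix.recv sock buf 0 (Bytes.length buf) [] with
  | _ -> false
  | exception Unix.Unix_error ((Unix.EAGAIN | Unix.EWOULDBLOCK), _, _) -> true

let nonblocking_recv_would_block () =
  (* Connected loopback pair with no data in flight. *)
  let listener = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.bind listener (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen listener 1;
  let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.connect sock (Unix.getsockname listener);
  let conn, _ = Unix.accept listener in
  (* Switch the socket to non-blocking mode: recv must now fail fast. *)
  Unix.set_nonblock sock;
  let result = recv_would_block sock in
  (* Restore blocking mode before cleaning up. *)
  Unix.clear_nonblock sock;
  List.iter Unix.close [ conn; sock; listener ];
  result

let () = Printf.printf "would block: %b\n" (nonblocking_recv_would_block ())
```

Nothing here touches overlapped I/O, which is a property fixed at socket creation time.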
(0006439)
xleroy (administrator)
2011-12-21 12:21

The "SO_OPENTYPE = SO_SYNCHRONOUS_NONALERT" hack probably goes back to the early days of the win32unix library, i.e. Windows NT 4 / Windows 98 (or maybe even 3.5 / 95). At that time, I think it was necessary so that sockets would be created in blocking mode. But many things have changed since then. We'd gladly consider a patch to modernize this aspect of win32unix (and maybe others too), provided you give it some testing on your applications first.
(0006440)
dra (reporter)
2011-12-21 12:41

I'm just putting a Windows 7 virtual PC together for some serious testing ... I agree that I think all these bugs are related to this. Will get back to you later today!
(0006470)
dra (reporter)
2011-12-21 18:14

OK - I've tested two possible ways of fixing it.

The first is simply to remove lines 31-41 & 49 in otherlibs/win32unix/select.c and 29, 34-42 & 48-52 in otherlibs/win32unix/accept.c

The second is to invert the logic of the getsockopt block instead - i.e. ensure that SO_OPENTYPE is zero rather than non-zero before calling socket (or accept)

I don't know which you'd prefer - the first option (deleting the problem code) was my instinct; the second one guards against another linked in C library messing around with Winsock options (I don't know whether this is something you'd want to care about in general or not, though).

Happy to provide a patch for either, if necessary.

Note that neither option fixes PR 5327, surprisingly.
(0006484)
dra (reporter)
2011-12-22 09:36

Patches against trunk for both options attached...
(0006547)
xleroy (administrator)
2011-12-28 11:40

Thanks for the patches. For the sake of simplicity, I went with the "delete" patch, which is now applied in SVN trunk, commit 11966.
(0007455)
xleroy (administrator)
2012-05-24 19:11

Because we lack time to fix PR#5578 in time for release 4.00, I reverted the patch on branch version/4.00 but left it on trunk.
(0009403)
doligez (administrator)
2013-06-04 17:52

Note that this was also inadvertently reverted in trunk when merging the 4.00.1->4.00.2+dev changes.

Re-patched in trunk/4.01 (rev 13743), but we still need to address PR#5578.
(0009659)
frisch (developer)
2013-07-01 10:31

Re-opening since the fix for 0005578 required undoing the fix for 0005325.

- Issue History
Date Modified Username Field Change
2011-08-02 22:50 dra New Issue
2011-08-02 22:50 dra File Added: Foo.ml
2011-12-20 13:40 xleroy Note Added: 0006410
2011-12-20 13:40 xleroy Status new => feedback
2011-12-21 10:21 dra File Added: foo.c
2011-12-21 10:36 dra Note Added: 0006427
2011-12-21 10:36 dra Status feedback => new
2011-12-21 12:08 xleroy Relationship added related to 0005327
2011-12-21 12:21 xleroy Note Added: 0006439
2011-12-21 12:21 xleroy Status new => feedback
2011-12-21 12:41 dra Note Added: 0006440
2011-12-21 12:41 dra Status feedback => new
2011-12-21 18:14 dra Note Added: 0006470
2011-12-22 09:35 dra File Added: patch-delete.patch
2011-12-22 09:36 dra File Added: patch-invert.patch
2011-12-22 09:36 dra Note Added: 0006484
2011-12-28 11:40 xleroy Note Added: 0006547
2011-12-28 11:40 xleroy Status new => resolved
2011-12-28 11:40 xleroy Resolution open => fixed
2011-12-28 11:40 xleroy Fixed in Version => 3.13.0+dev
2012-04-07 12:32 xleroy Relationship added related to 0005578
2012-05-24 19:11 xleroy Note Added: 0007455
2012-05-24 19:11 xleroy Fixed in Version 3.13.0+dev => 4.01.0+dev
2013-06-04 17:52 doligez Note Added: 0009403
2013-07-01 10:31 frisch Note Added: 0009659
2013-07-01 10:31 frisch Assigned To => frisch
2013-07-01 10:31 frisch Status resolved => confirmed
2013-07-01 10:31 frisch Fixed in Version 4.01.0+dev =>
2013-07-01 10:32 frisch Target Version => 4.02.0+dev
2013-07-01 19:18 frisch Assigned To frisch =>
2013-07-12 18:15 doligez Target Version 4.02.0+dev => 4.01.1+dev
2013-10-09 14:13 doligez Tag Attached: patch
2014-05-25 20:20 doligez Target Version 4.01.1+dev => 4.02.0+dev
2014-07-17 16:15 doligez Target Version 4.02.0+dev => 4.02.1+dev

