Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WSAStartup/WSACleanup high latency on Windows #4750

Closed
vicuna opened this issue Mar 18, 2009 · 22 comments
Closed

WSAStartup/WSACleanup high latency on Windows #4750

vicuna opened this issue Mar 18, 2009 · 22 comments
Labels

Comments

@vicuna
Copy link

vicuna commented Mar 18, 2009

Original bug ID: 4750
Reporter: gildor
Status: closed (set by @xavierleroy on 2009-12-07T10:42:27Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.11.0
Fixed in version: 3.11.2+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: db @alainfrisch

Bug description

It happens that WSACleanup called from otherlibs/win32unix/startup.c can take 3 seconds to end.

For simple program, like ocamlfind that use the Unix module, it makes them take at least 3 seconds to run. For this kind of program, don't calling WSAStartup/Cleanup is a huge performance gain.

I checked this call in other PL (python), and it get initialized only when needed (i.e. when you use the network). I know this is harder to implement for OCaml but I think it is worth doing it.

My first idea is to just call it when one of the network primitivie is used...

Any comment ?

(I can volunteer providing you a patch, if you are willing to integrate it)

Regards
Sylvain Le Gall

@vicuna
Copy link
Author

vicuna commented Mar 28, 2009

Comment author: @xavierleroy

A patch would be very welcome if you're willing to try and fix this issue yourself.

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: db

I have never observed such behavior. Maybe something broken/incorrectly configured in your system, such as the default route, DNS servers, etc.?

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: gildor

The network card is either a WIFI card or a wired NIC. Both was configured directly by IBM on my laptop. Network is correctly working in all case. If there was a problem of default route/DNS server, I hope that it will affect more things than just WSACleanup...

Anyway, I think this behavior is more or less visible depending on various NIC driver. Maybe WIFI cards generate higher latency than wired NIC.

My point of view on this issue is that if the network is not needed, there is no point starting it. That is exactly the same thing for Python which only start network when required.

I am willing to provide a patch but it will require to start network only when you create a socket on windows. It will also require to use a global variable to check when it is necessary to shutdown the network. I am not sure that is very good programming style.

If "db" is willing to test my problem, here is a classical use case:

  • pickup a project that use "ocamlfind" and compile a lot of thing with it
  • time the compilation of the project
  • comment WSACleanup/WSAStartup in otherlibs/win32unix/startup.c (add a fprintf(stderr,...)/flush(stderr) to be sure to use modified version)
  • compile just win32unix and install it
  • time again the compilation of the project

On my computer, there is a huge difference between the two compilation times. "ocamlfind" use Unix module which trigger the WSAStartup but ocamlfind doesn't use any network related call. I have the same kind of latency with "unison" which doesn't use network directly.

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: db

My own project is compiled with win32unix. And I see no delay on shutdown (on many different computers with different NICs).

If you have a complete test case I will try to compile and run it on my hardware and then tell you results.

Best regards,
Dmitry Bely

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: gildor

A very simple test case : is to time the command "ocamlfind list".

On a Linux system i got:
real 0m0.008s
user 0m0.004s
sys 0m0.004s

with a lot of packages

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: gildor

But I fully agree that it should depends on the NIC and its driver (unfortunately you cannot do a lot of things regarding quality of the driver)

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: gildor

By the way, I am running Windows XP Pro SP3.

NIC are:

  • Intel PRO/1000
  • Intel PRO/Wireless 3945ABG

I have deactivated the IEEE1394 (virtual) network card.

I am using FSecure AV 2009.

Maybe other configuration doesn't suffer the same kind of problem.

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: db

Here is my times for "ocamlfind list":

57866 14:11:19,4627916 ocamlfind.exe 3780 Process Start SUCCESS Parent PID: 3684
104329 14:11:54,5796707 ocamlfind.exe 1500 Process Exit SUCCESS Exit Status: 0, User Time: 0.0156250, Kernel Time: 0.2187500

NIC: Intel(R) PRO/1000 MT
Driver: Intel, ver. 8.4.2.1

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: db

Sorry, copied wrong lines from the log. Correct results:

21161 14:28:36,2069422 ocamlfind.exe 3932 Process Start SUCCESS Parent PID: 3532
49001 14:28:37,5040897 ocamlfind.exe 3932 Process Exit SUCCESS Exit Status: 0, User Time: 0.0156250, Kernel Time: 0.2187500

another run:

143515 14:30:46,6545333 ocamlfind.exe 3176 Process Start SUCCESS Parent PID: 352
161944 14:30:47,6194999 ocamlfind.exe 3176 Process Exit SUCCESS Exit Status: 0, User Time: 0.0312500, Kernel Time: 0.2500000

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: gildor

Kernel time is quite high for a simple command. I mean user time is OK and near the Linux value but kernel time is almost 10 times bigger.

All in all, the command seems to take 1 second. This is very long for a simple "ocamlfind list".

Maybe I am not reading well the output of your time command.

On my Windows system with WSACleanup/Startup removed:
real: 0.157
user: 0.031
sys: 0.046

(Thinkpad X60s, Core 2 Duo on battery power).

With the modification, the command takes almost 6 time less. We have however the same user time...

More information:
NIC: Intel(R) PRO/1000 PL
Driver: Intel 9.6.31.1 (12/01/2007)

NIC: Intel PRO/Wireless 3945ABG
Driver: Intel 10.5.1.68 (27/09/2006)

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: db

How did you measure your times? I did it with Process Monitor:

http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx

Please show Process Monitor times with and without WSAStartup/WSACleanup.

P.S. My PC is slower than yours: Pentium D 3GHz desktop, 3Gb RAM, Windows XP SP3

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: db

With win32unix/startup.c modified as

/* (void) WSAStartup(MAKEWORD(2, 0), &wsaData); /
(...)
/
(void) WSACleanup(); /
fprintf(stderr, "/
(void) WSACleanup(); */");
flush(stderr);

"ocamlfind list" hangs just after printing. I tried findlib 1.2.4 from http://projects.camlcity.org/projects/findlib.html. Are you sure that it does not use any network functions?

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: gildor

AFAIK, there is no reason to use network related function in findlib.

What about using "fflush(stderr)" (note the double f). I am too used to OCaml-flush function.

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: db

Indeed. Here is the updated timing:

67613 20:12:47,2100794 ocamlfind.exe 2952 Process Start SUCCESS Parent PID: 600
106511 20:12:48,9925663 ocamlfind.exe 2952 Process Exit SUCCESS Exit Status: 0, User Time: 0.0468750, Kernel Time: 0.1718750

I.e. kernel time is 0.21 -> 0.17. I don't think it's anything near "six time less". As for >1 sec running time, it's just Process Monitor overhead. The real time is User Time+Kernel Time.

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: db

The bottom line: I think you should find the reason for such slow WSACleanup() in your system first. Anyway, most modern software uses network and that 3-sec delay on shutdown is hardly acceptable. Maybe it's caused by FSecure, maybe by buggy NIC driver, maybe something else. And only when it turns out that work around someone else's bug is the only choice, fixing Ocaml library would make sense. IMHO before that we should follow the principle "if it ain't broke don't fix it".

@vicuna
Copy link
Author

vicuna commented Apr 6, 2009

Comment author: gildor

Don't you think that spending 0.171s for kernel time is a bit big compared to 0.046 for user time ?

On my "buggy" system, kernel time is only 0.046 (windows without WSAStartup) or 0.04 (Linux) and in both case user time and kernel time are close.

On your system kernel time is 4 time bigger, even without calling WSAStartup... If the machine is slower the ratio user time/kernel time should still be comparable...

Anyway, the problem is that it is not so easy to consider if it is a problem or not, if this is due to my configuration or not SINCE it doesn't fail and it is working in most case. So you are using a killing argument for this bug: "it is just a performance loss, who cares". I cannot answered you because you are right but consider the fact that 0.21 -> 0.17 is a 20% performance win (approx. 10% if you also take user time into account). If you consider, that a 10-20% win doesn't interest you, I am pretty sure that I cannot make you change your mind...

For now, I am busy working on Solaris 10 with OCaml but I will do further test on different computer (XP Pro/Vista et al) when I will have time.

@vicuna
Copy link
Author

vicuna commented Apr 28, 2009

Comment author: gildor

I have done extra work on this bug, to track it more precisely.

I have benchmarked an application with and without unix.cmxa (a do nothing application in fact). There is a factor 10 in term of time between them.

However, the use of WSAStartup/WSACleanup is not the real problem. If Unix don't initialize the network stack, it just makes fail another function that take time.

So I have search in unix.ml what cause the slowdown. To my suprise, this is not a directly network related function but stdin/stdout/stderr initialization using win_alloc_handle_or_socket. More precisely, this is the code:

if (getsockopt((SOCKET) h, SOL_SOCKET, SO_TYPE, (char *)&opt, &optlen) == 0)
Descr_kind_val(res) = KIND_SOCKET;

If this code is commented, there is a great increase in speed.

db: do you think this slow call is due to some misconfiguration on my computer ?

@vicuna
Copy link
Author

vicuna commented Apr 28, 2009

Comment author: gildor

OK, new solution I can propose. Replace
if (getsockopt((SOCKET) h, SOL_SOCKET, SO_TYPE, (char *)&opt, &optlen) == 0)

by

if (GetFileType(h) == FILE_TYPE_UNKNOWN && GetLastError() == NO_ERROR)

The drop in term of performance is really smaller and I think it covers most case...

@vicuna
Copy link
Author

vicuna commented Apr 30, 2009

Comment author: db

WSAStartup() should always be called before getsockopt(). And getsockopt() is directly related to the network (is the part of winsock library). So yes, I still think that your computer is misconfigured :-) Did you try to disable IPV6?

As for your solution - it looks strange. For sockets GetFileType() returns FILE_TYPE_PIPE, not FILE_TYPE_UNKNOWN ( http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx ). So here you probably cannot go without some winsock function that requires WSAStartup()...

@vicuna
Copy link
Author

vicuna commented Apr 30, 2009

Comment author: gildor

Sorry for not being clear, I re-enable WSAStartup and WSACleanup, the "getsockopt" change I propose is the only change for the whole OCaml source.

I have removed IPV6 and disabled Antivirus. I even have disabled all network interfaces and then re-enabled it. Always the same problem.

So, when getsockopt is called WSAStartup has already been called.

I have misread GetFileType documentation, so this is still not the best solution. But this is really where the slowdown comes from.

@vicuna
Copy link
Author

vicuna commented Jul 19, 2009

Comment author: gildor

I do another round to find the smallest possible, less invasive, correction.

Here is a whole new single line correction:

in otherlibs/win32unix/channels.c in function win_handle_fd replace

value res = win_alloc_handle_or_socket((HANDLE) _get_osfhandle(crt_fd));

by

value res = win_alloc_handle((HANDLE) _get_osfhandle(crt_fd));

The function win_handle_fd is not exported and is only used in unix.ml to convert file descriptor 0, 1 and 2. I don't see any reason why this 3 descriptor could be sockets.

This is the only change required to have a fast Unix module, in my case.

@vicuna
Copy link
Author

vicuna commented Dec 7, 2009

Comment author: @xavierleroy

I like the last proposed fix, in particular because win_alloc_handle_or_socket() now becomes useless. (I agree that standard descriptors 0, 1, 2 should never be sockets.) Fix applied in 3.11 branch, will go in 3.11.2.

@vicuna vicuna closed this as completed Dec 7, 2009
@vicuna vicuna added the bug label Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant