Mantis Bug Tracker

ID: 0007638
Project: OCaml
Category: platform support (windows, cross-compilation, etc)
View Status: public
Date Submitted: 2017-09-25 17:42
Last Update: 2017-09-28 11:44
Reporter: xleroy
Assigned To:
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Mingw64
OS: Windows 64
OS Version: 10
Product Version: 4.06.0 +dev/beta1/beta2/rc1
Target Version: 4.06.0 +dev/beta1/beta2/rc1
Fixed in Version: 4.06.0 +dev/beta1/beta2/rc1
Summary: 0007638: Mingw64 + threads + system exception raised through longjmp() = crash

Description

Consider:

let crashme v =
  ignore (Sys.getenv v)

let _ =
  let th = Thread.create crashme "no such variable" in
  Thread.join th

Compile this program to bytecode using the Mingw64 port of OCaml and the trunk current at the time of this PR, i.e. 4.06.0+dev. On a Windows 10 machine (ocaml-mingw-64-b from Inria's CI pool, to be exact), the program crashes reproducibly.

Running it under a debugger shows a segfault in the call to longjmp() from caml_raise(), corresponding to Sys.getenv raising Not_found.
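
For readers unfamiliar with the mechanism, here is a simplified sketch in plain C of how a C primitive can raise an exception by jumping back to a handler installed with setjmp(). It is not the actual runtime code: the names current_handler, raise_exception, getenv_or_raise and run_protected are hypothetical, and the real interpreter keeps considerably more state.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

enum { EXN_NONE = 0, EXN_NOT_FOUND = 1 };

/* Hypothetical: innermost handler currently installed. */
static jmp_buf *current_handler = NULL;

/* Unwind the C stack back to the innermost handler. */
static void raise_exception(int exn)
{
  longjmp(*current_handler, exn);
}

/* A primitive that fails the way Sys.getenv does on an unset variable. */
static const char *getenv_or_raise(const char *name)
{
  const char *value = getenv(name);
  if (value == NULL) raise_exception(EXN_NOT_FOUND);
  return value;
}

/* Install a handler, run the primitive, and report which exception
   (if any) reached the handler. */
int run_protected(const char *name)
{
  jmp_buf handler;
  jmp_buf *saved = current_handler;
  int result;

  current_handler = &handler;
  switch (setjmp(handler)) {
  case 0:                       /* direct return: run the primitive */
    getenv_or_raise(name);
    result = EXN_NONE;
    break;
  default:                      /* we got here through longjmp() */
    result = EXN_NOT_FOUND;
    break;
  }
  current_handler = saved;      /* pop the handler */
  return result;
}
```

Calling run_protected("no such variable") takes the longjmp() path and returns EXN_NOT_FOUND. In the failing configuration, it is the equivalent of that jump that segfaults.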

A similar issue shows up with lib-threads/socketsbuf.ml from the OCaml test suite.

The program works fine when compiled to native code.

This might be an instance of the setjmp/longjmp problem reported here: https://sourceforge.net/p/mingw-w64/bugs/406/

Indeed, the problem goes away if, as suggested in the problem report above, the bytecode interpreter is modified to use __builtin_setjmp and __builtin_longjmp instead of setjmp/longjmp. Note however that those GCC builtins are undocumented.
Tags: No tags attached.

Notes
(0018340)
frisch (developer)
2017-09-26 13:39

Xavier: did you try other ports (especially msvc64) and/or older versions of OCaml (4.05)?
(0018342)
xleroy (administrator)
2017-09-26 14:19

> did you try other ports (especially msvc64) and/or older versions of OCaml (4.05)?

Not yet. A git bisect is in progress. More data points are always welcome.
(0018351)
frisch (developer)
2017-09-27 10:06
edited on: 2017-09-27 10:25

My current data points:

  - 4.06, mingw64 port: FAIL as reported, with or without -custom (segfault after 13-15s). OK with ocamlopt (quick termination with "Thread 1 killed on uncaught exception Not_found").

  - 4.06, msvc64 port (using VS2015): OK.

  - 4.06, mingw port (32-bit): OK.

  - 4.05, mingw64 port: FAIL. (So: not a recent regression, in particular not related to the new Unicode stuff.)


Also, replacing the call to `Sys.getenv` with a direct `raise Not_found` does not trigger the problem.

(0018363)
xleroy (administrator)
2017-09-27 16:17

Thanks a lot Alain for the data points.

This is consistent with the hypothesis that Mingw64 has a bug in the way it uses setjmp/longjmp from Microsoft's CRT.

(That MSVC has no problems can be explained in several ways: use of a different CRT, use of the same CRT but in a different manner, special compilation of setjmp and longjmp, etc.)
(0018364)
xleroy (administrator)
2017-09-27 16:23

That leaves us with the question of finding a workaround.

Ideally, Mingw64 would fix the issue and we wouldn't have anything to do, but I'm afraid this will take time, and in the meantime the Mingw64 port of OCaml is seriously broken.

__builtin_setjmp / __builtin_longjmp could be used if we don't mind the fact that these are undocumented GCC features intended to help with the implementation of setjmp / longjmp, if I understood correctly.
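
A minimal sketch of what such a switch could look like, assuming that testing the __MINGW64__ macro is an adequate way to detect the affected toolchain (an actual patch may well use a different test and different names). The GCC builtins take a buffer of five words, and __builtin_longjmp only accepts the value 1. The names jump_buffer, SAVE_CONTEXT, RESTORE_CONTEXT, raise_not_found, lookup and protected_lookup are hypothetical.

```c
#include <setjmp.h>

/* Assumption: __MINGW64__ identifies the affected Mingw64 toolchain. */
#if defined(__MINGW64__) && defined(__GNUC__)
typedef void *jump_buffer[5];             /* builtins need five words */
#define SAVE_CONTEXT(buf)    __builtin_setjmp(buf)
#define RESTORE_CONTEXT(buf) __builtin_longjmp(buf, 1) /* must be 1 */
#else
typedef jmp_buf jump_buffer;              /* standard C everywhere else */
#define SAVE_CONTEXT(buf)    setjmp(buf)
#define RESTORE_CONTEXT(buf) longjmp(buf, 1)
#endif

/* Hypothetical: a single global handler, for illustration only. */
static jump_buffer handler;

/* Jump back to the handler. Kept in its own function because
   __builtin_longjmp may not be called from the function that called
   __builtin_setjmp on the same buffer. */
static void raise_not_found(void)
{
  RESTORE_CONTEXT(handler);
}

/* Stand-in for a primitive that may raise. */
static int lookup(int fail)
{
  if (fail) raise_not_found();
  return 0;
}

/* Returns 0 on normal completion, 1 if the "exception" was raised. */
int protected_lookup(int fail)
{
  if (SAVE_CONTEXT(handler) != 0) return 1;
  return lookup(fail);
}
```

With this layout the rest of the code only mentions SAVE_CONTEXT and RESTORE_CONTEXT, so the choice between the CRT functions and the builtins stays confined to one conditional block.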

I thought of using "frame-based structured exception handling" (the C++-style exception mechanism that Microsoft added to C) as a replacement for this particular use of setjmp / longjmp in the bytecode runtime system. It would do the job, but it is not implemented by the Mingw64 compiler, only by the MSVC compiler.
(0018366)
frisch (developer)
2017-09-27 16:38

Also related: http://www.agardner.me/golang/windows/cgo/64-bit/setjmp/longjmp/2016/02/29/go-windows-setjmp-x86.html

It seems several people have hit unexplained problems with setjmp/longjmp under mingw64 and switched successfully to the __builtin_* variants. So I'd say: without more information or ideas for another workaround, let's follow the crowd. Windows works in mysterious ways.

(Oh, and sourceforge.net is in "static offline mode" now...)
(0018373)
xleroy (administrator)
2017-09-27 17:33

Pull request at https://github.com/ocaml/ocaml/pull/1376

Issue History

Date Modified     Username  Field             Change
2017-09-25 17:42  xleroy    New Issue
2017-09-25 17:45  xleroy    Description       Updated
2017-09-26 13:39  frisch    Note Added        0018340
2017-09-26 14:19  xleroy    Note Added        0018342
2017-09-27 10:06  frisch    Note Added        0018351
2017-09-27 10:11  frisch    Note Edited       0018351
2017-09-27 10:14  frisch    Note Edited       0018351
2017-09-27 10:14  frisch    Note Edited       0018351
2017-09-27 10:19  frisch    Note Edited       0018351
2017-09-27 10:25  frisch    Note Edited       0018351
2017-09-27 16:17  xleroy    Note Added        0018363
2017-09-27 16:17  xleroy    Status            new => confirmed
2017-09-27 16:23  xleroy    Note Added        0018364
2017-09-27 16:38  frisch    Note Added        0018366
2017-09-27 17:33  xleroy    Note Added        0018373
2017-09-28 11:44  xleroy    Status            confirmed => resolved
2017-09-28 11:44  xleroy    Resolution        open => fixed
2017-09-28 11:44  xleroy    Fixed in Version  => 4.06.0 +dev/beta1/beta2/rc1
2017-09-28 11:44  xleroy    Target Version    => 4.06.0 +dev/beta1/beta2/rc1

