ocamlrun built with Intel compilers segfaults #7000

Closed
vicuna opened this issue Sep 29, 2015 · 12 comments

Comments

vicuna commented Sep 29, 2015

Original bug ID: 7000
Reporter: kehoste
Assigned to: @lefessan
Status: closed (set by @lefessan on 2017-02-27T20:56:37Z)
Resolution: unable to duplicate
Priority: normal
Severity: major
Platform: Linux
OS: Scientific Linux 6
OS Version: 6.7
Version: 4.02.3
Target version: later
Category: runtime system and C interface
Related to: #3917
Monitored by: @jmeber

Bug description

When compiling/installing OCaml 4.02.3 (and older versions) using the Intel compilers (v15.0.3, but also other versions), I'm running into a segmentation fault when ocamlrun is used during the install procedure:

cd stdlib; make COMPILER=../boot/ocamlc all
make[2]: Entering directory `/tmp/vsc40023/easybuild_build/OCaml/4.02.3/intel-2015b/ocaml-4.02.3/stdlib'
../boot/ocamlrun ../boot/ocamlc -strict-sequence -w +33..39 -g -warn-error A -bin-annot -nostdlib -safe-string `./Compflags camlinternalFormatBasics.cmi` -c camlinternalFormatBasics.mli
make[2]: *** [camlinternalFormatBasics.cmi] Segmentation fault

GDB gives me this when I execute the same command manually:

$ gdb ../boot/ocamlrun
...
(gdb) run ../boot/ocamlc -strict-sequence -w +33..39 -g -warn-error A -bin-annot -nostdlib -safe-string ./Compflags camlinternalFormatBasics.cmi -c camlinternalFormatBasics.mli
Starting program: /tmp/vsc40023/easybuild_build/OCaml/4.02.3/intel-2015b/ocaml-4.02.3/boot/ocamlrun ../boot/ocamlc -strict-sequence -w +33..39 -g -warn-error A -bin-annot -nostdlib -safe-string ./Compflags camlinternalFormatBasics.cmi -c camlinternalFormatBasics.mli
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000432da9 in caml_interprete (prog=0x2aaaac329010, prog_size=0) at interp.c:717
717 accu = Field(accu, *pc); pc++; Next;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.166.el6_7.1.x86_64 ncurses-libs-5.7-4.20090207.el6.x86_64 snoopy-1.7.10-1.el6.x86_64
(gdb) bt
#0 0x0000000000432da9 in caml_interprete (prog=0x2aaaac329010, prog_size=0) at interp.c:717
#1 0x0000000000435cf0 in caml_main (argv=0x2aaaabf39e08) at startup.c:441
#2 0x0000000000431bb8 in main (argc=-1410097656, argv=0x0) at main.c:54
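
For readers unfamiliar with the interpreter, here is a minimal sketch of what the faulting statement at interp.c:717 does. The macro names mirror those used by the runtime, but the definitions below are a simplified model written for illustration, and `checked_field` is a hypothetical helper that does not exist in OCaml. The point is only that `Field(accu, *pc)` treats `accu` as a pointer to an array of values, so the load faults whenever `accu` does not hold a valid block address.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the runtime's value representation (illustrative
   only): a value is a machine word holding either a tagged integer
   (low bit set) or a pointer to a block of fields (low bit clear). */
typedef intptr_t value;

#define Is_long(v)   (((v) & 1) != 0)
#define Val_long(n)  ((value)(((uintptr_t)(n) << 1) + 1))
#define Field(v, i)  (((value *)(v))[i])

/* Hypothetical checked accessor, not part of OCaml. Returns 0 and stores
   the field in *out on success, or -1 when v is a tagged integer and
   must not be dereferenced. */
static int checked_field(value v, size_t i, value *out)
{
    if (Is_long(v))
        return -1;
    *out = Field(v, i);
    return 0;
}
```

The interpreter performs no such check, which is normal for a bytecode loop; a crash on this line therefore says that `accu` or `pc` was already corrupt, not where the corruption happened.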

This problem has also been reported on the Intel forums, see https://software.intel.com/en-us/forums/intel-c-compiler/topic/560603

Steps to reproduce

Configure with ./configure -cc "icc -O3 -xHost", then build with "make -j 1 world.opt".

vicuna commented Nov 15, 2015

Comment author: @xavierleroy

It is unwise to compile OCaml's runtime system with -O3 optimization, as it contains behaviors that are formally undefined in the ISO C standards and which can lead to over-optimization. For GCC and Clang we now use -O2 with appropriate flags to tame some optimizations down.

Do you still see the problem with "icc -O1"? Or even "icc -O0"? That would help narrow the issue down.
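
As an illustration of the kind of formally undefined behavior meant here, consider signed integer overflow. The sketch below is not taken from the OCaml runtime; the function names are hypothetical. It contrasts an overflow test that depends on wrap-around with two formulations that are well defined in ISO C.

```c
#include <limits.h>

/* Depends on signed overflow wrapping around, which ISO C leaves
   undefined. An optimizer may assume a + b never overflows and reduce
   the test to "return 0" unless wrapping is requested explicitly
   (for example with -fwrapv on GCC and Clang). */
static int add_overflows_ub(long a, long b)
{
    return b > 0 && a + b < a;
}

/* Well-defined alternative: compare against the limits before adding. */
static int add_overflows(long a, long b)
{
    if (b > 0)
        return a > LONG_MAX - b;
    if (b < 0)
        return a < LONG_MIN - b;
    return 0;
}

/* Well-defined wrapping addition: unsigned arithmetic is modular. */
static unsigned long wrapping_add(unsigned long a, unsigned long b)
{
    return a + b;
}
```

Code written in the first style only behaves as intended when the compiler is told to treat signed overflow as wrapping, which is why the optimization level and flags matter.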

vicuna commented Jan 23, 2016

Comment author: kehoste

Just checked: the problem persists when using "icc -O1 -xHost", "icc -O0 -xHost" or even "icc -O0".

vicuna commented Jan 27, 2016

Comment author: dobenour

ICC does not have the "-fwrapv" option. It does have "-fno-strict-overflow" but that does not necessarily do what we want -- see http://postgresql.nabble.com/RFC-overflow-checks-optimized-away-td5741233.html
(ICC miscompiled PostgreSQL even with -fno-strict-overflow). Intel does support -no-ansi-alias as an alternative to -fno-strict-aliasing.

I think that configure should check for support for -fwrapv and -fno-strict-aliasing and fail if they are not supported (except when using MSVC). GCC and Clang do support -fwrapv, and MSVC always uses wrapping behavior for overflow, so this should not be a major problem.
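
To show what -fno-strict-aliasing (or -no-ansi-alias on ICC) is protecting against, here is a small sketch. It is not code from the OCaml runtime, and `bits_by_memcpy` is a hypothetical name. It assumes `double` is a 64-bit IEEE 754 type, as on x86-64.

```c
#include <stdint.h>
#include <string.h>

/* Non-conforming variant, shown only as a comment: it reads a double
   through a uint64_t lvalue, which breaks the ISO C aliasing rules and
   is only reliable when strict aliasing is disabled.

       uint64_t bits_by_cast(double d) { return *(uint64_t *)&d; }
*/

/* Conforming alternative: copy the object representation. Compilers
   typically turn this memcpy into a single register move. */
static uint64_t bits_by_memcpy(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* assumes sizeof(double) == sizeof u */
    return u;
}
```

Code that relies on the cast form needs the aliasing flag on every compiler it is built with, which is the motivation for having configure check for it.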

vicuna commented Jan 27, 2016

Comment author: kehoste

I tried using "icc -O0 -fno-strict-overflow", issue persists...

Is there anything else I should try?

Is the message basically that building OCaml with Intel compilers isn't going to work out?

vicuna commented Jan 28, 2016

Comment author: dobenour

I am not part of the OCaml team, but that appears to be the case: the OCaml runtime does not conform to the C standard, but requires extensions (strict wrapping on overflow) that ICC does not support. OCaml's RTS is not the only large C project that requires -fwrapv; so do CPython and (I believe) the Linux kernel.

Perhaps you could report this as an ICC bug.

vicuna commented Nov 11, 2016

Comment author: @xavierleroy

I'm surprised that "icc -O0" would miscompile OCaml because it assumes no integer overflow. I mean, this assumption is used only by optimizations that should not be there at level O0. Yet I don't feel like going through the hoops needed to get a free version of ICC just to debug what's going on here. If anyone is willing to, please do and report here. Otherwise I move we just say "don't use ICC" in the installation instructions.

vicuna commented Dec 8, 2016

Comment author: @mshinwell

@kehoste Can you tell us how much of a problem it would cause for you if we said that the Intel compilers are not a supported compilation platform for OCaml?

vicuna commented Dec 8, 2016

Comment author: kehoste

@mshinwell I think it would be very unfortunate...

We (HPC-UGent) compile pretty much all of the software we install with the Intel compilers, which includes over 1000 (scientific) software packages, and we rarely see something that builds fine with GCC and not with the Intel compilers. The only other example I know of is wxWidgets.

So to me not being able to compile OCaml with the Intel compilers is a bug that should be fixed...

It's unclear to me how much effort this would require though. It may be just a matter of finding the magic compiler option that makes ICC process the source code like GCC does.

Is it already clear what the exact problem is?

vicuna commented Dec 8, 2016

Comment author: @mshinwell

I agree it is theoretically a bug, but I'm trying to understand the balance between the potential pain caused to users and the effort that has to be expended by the OCaml core team to try to find out what's going on (and also, going forward, trying to keep it working). I don't think we know yet what causes the problem.

Is the inconvenience something like the fact that the "gcc" command on your system actually runs icc?

vicuna commented Dec 8, 2016

Comment author: kehoste

No, we have control over what we build with GCC or ICC, we pick the compiler we use.

It's just that since we install pretty much everything else with ICC, we need to make an exception for OCaml, which is particularly annoying when OCaml is a dependency for other stuff; we try to stick to a single (set of) compiler(s) for building/installing a particular software stack.

Having OCaml be the exception w.r.t. the compiler we have to use is annoying.

vicuna commented Feb 27, 2017

Comment author: @lefessan

I installed ICC (Intel® Parallel Studio XE Cluster Edition for Linux* 2017, version 2, or more precisely 17.0.2 20170213) this morning, and compiled OCaml 4.02.3 without finding any problem with it, either in bytecode or in native code.

If you still have the bug, you should provide more information, so that we can reproduce it (Intel's components installed and used, ./configure arguments to OCaml, full log).

vicuna commented Feb 27, 2017

Comment author: @lefessan

I retried, following your steps more closely. I tested with:

  • icc
  • icc -O0 -xHost
  • icc -O1 -xHost
  • icc -O3 -xHost

All of them worked fine for me, although the last one is probably dangerous to use anyway.

I'm closing this issue, you can re-open it if you can reproduce the bug with the same icc version as mine.
