Ocamlc got segfault in Alpine ppc64le #7562

Closed
vicuna opened this issue Jun 22, 2017 · 16 comments

@vicuna

vicuna commented Jun 22, 2017

Original bug ID: 7562
Reporter: rgdoliveira
Status: acknowledged (set by @xavierleroy on 2017-06-22T18:03:14Z)
Resolution: open
Priority: normal
Severity: crash
Platform: ppc64le
OS: Alpine Linux
OS Version: 3.6.2
Version: 4.04.1
Category: back end (clambda to assembly)
Related to: #7697
Monitored by: @nojb @gasche @dbuenzli

Bug description

I'm building ocaml in Alpine Linux ppc64le and it builds fine. But when I try to use ocamlc, I'm getting a segfault.

Gdb backtrace shows:

#0 0x00003fffb7fad710 in do_relocs (dso=0x3fffb7ff26a0 , rel=0x200ab4b8, rel_size=2495088,
stride=3) at ldso/dynlink.c:379
#1 0x00003fffb7fae1ec in reloc_all (p=0x3fffb7ff26a0 ) at ldso/dynlink.c:1195
#2 0x00003fffb7fafc94 in __dls3 (sp=) at ldso/dynlink.c:1638
#3 0x00003fffb7faf3d4 in __dls2 (base=, sp=0x3ffffffffba0) at ldso/dynlink.c:1424
#4 0x00003fffb7facd2c in _dlstart_c (sp=, dynv=)
at ldso/dlstart.c:147
#5 0x00003fffb7fb1104 in _dlstart () from /lib/ld-musl-powerpc64le.so.1

I know that OCaml was recently ported to the ppc64le architecture and works fine with glibc, but there seems to be an issue with musl.

Steps to reproduce

The steps below need to be done inside an Alpine ppc64le system:

  • Clone aports repository
    $ git clone https://github.com/alpinelinux/aports.git

  • Build ocaml package
    $ cd aports/community/ocaml
    $ abuild -r

  • Install the built package:
    $ sudo apk add <ocaml_apk>

  • Run ocamlc to get the segmentation fault.

@vicuna
Author

vicuna commented Jun 22, 2017

Comment author: @gasche

The OCaml version seems to be 4.04.1 ( https://github.com/alpinelinux/aports/blob/master/community/ocaml/APKBUILD ) with some downstream patches ( https://github.com/alpinelinux/aports/tree/master/community/ocaml ), most of them build-system related. The only ones affecting code generation are a patch that marks the stack as non-executable and a ppc64 fix to the CONTEXT_* macros in signal_osdeps.h:

https://github.com/alpinelinux/aports/blob/master/community/ocaml/010_all_execstacks.patch
https://github.com/alpinelinux/aports/blob/master/community/ocaml/fix-mcontext-fields.patch

Since 4.04.0, "ocamlc" points to the native-compiled ocamlc.opt instead of the bytecode-compiled ocamlc.byte. Out of curiosity, does running ocamlc.byte (also installed in PATH) work correctly?

@vicuna
Author

vicuna commented Jun 22, 2017

Comment author: rgdoliveira

I just tried ocamlc.byte and I was able to compile a simple .ml file and run the generated file (no segfault).

@vicuna
Author

vicuna commented Jun 22, 2017

Comment author: @xavierleroy

Thanks for trying ocamlc.byte. This confirms my suspicion that the problem is with dynamic loading in OCaml programs compiled to native code, which is the case of ocamlc in this Alpine setup.

The bad news is that we have extremely limited access to ppc64le hardware: just one virtual machine provided by RedHat in Brno, running Fedora (I think). So, I'm at a loss on how to debug this issue.

@vicuna
Author

vicuna commented Jun 23, 2017

Comment author: rgdoliveira

xleroy,

I have a VM running Alpine ppc64le and I can give you access to it, if that helps you with debugging.

Can you talk with me on freenode? My username is 'rdutra'.

@vicuna
Author

vicuna commented Jun 27, 2017

Comment author: @mshinwell

I can also try to look at this; I think I can get access to a suitable machine now. @xLeroy please let me know if you have time / want to do it.

@vicuna
Author

vicuna commented Jul 6, 2017

Comment author: rgdoliveira

I applied a downstream patch (workaround) to the Alpine build of ocaml and it fixed the segfault. Basically, I compiled the OCaml native executables with the -no-pie flag (https://github.com/alpinelinux/aports/blob/1feea49eaec12328e73541436bd1612228cd7e9a/community/ocaml/fix-segfault-in-ppc64le.patch).

@vicuna
Author

vicuna commented Sep 30, 2017

Comment author: @xavierleroy

@shinwell: I lost access to a ppc64le machine, so you are most welcome to try and understand this issue while I try to build a qemu-based VM. (virt-builder should make this easy, except that the version that comes with Ubuntu 16.04 LTS doesn't work.)

@vicuna
Author

vicuna commented Sep 30, 2017

Comment author: @xavierleroy

The fact that '-no-pie' helps suggests a misunderstanding between ocamlopt and the dynamic loader about register usage or the like. I'm afraid that even with -no-pie, later attempts to do dynamic loading would fail.

@vicuna
Author

vicuna commented Jan 5, 2018

Comment author: @dbuenzli

FWIW this is not specific to ppc64le. The same occurs on alpine 'armv6' and is easy to reproduce in docker.

docker run -it arm32v6/alpine sh
apk add --update bash tar make m4 curl git gcc musl-dev
curl -OL http://caml.inria.fr/pub/distrib/ocaml-4.06/ocaml-4.06.0.tar.gz
tar -xf ocaml-4.06.0.tar.gz && cd ocaml-4.06.0
./configure -host armv6l-linux-gnueabihf
make world.opt
make install

All the OCaml '.opt' executables segfault, as does any executable produced by ocamlopt.byte, except if '-cclib -no-pie' is provided on the cli in the final link step.

The configure script makes it a bit difficult to target precisely the phase where you want to add flags (and using -cc 'gcc -no-pie' seems to break jbuilder, which nowadays seems to be needed to bootstrap opam), so I went with a dirty:

sed -i 's/common_cflags="-O2/common_cflags="-no-pie -O2/g' configure

This adds -no-pie everywhere and I suspect that's not a very good thing. I have not tested dynlink as I was interested in building a statically linked executable. One additional problem with the latter is that it seems that the -Wl,-E that is added at link time prevents an added -cclib -static from doing its job (which I circumvented by doing an -output-obj and performing the final link step manually with 'gcc -no-pie -static').

Info about gcc and ld:

gcc -v

Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv6-alpine-linux-musleabihf/6.4.0/lto-wrapper
Target: armv6-alpine-linux-musleabihf
Configured with: /home/buildozer/aports/main/gcc/src/gcc-6.4.0/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --build=armv6-alpine-linux-musleabihf --host=armv6-alpine-linux-musleabihf --target=armv6-alpine-linux-musleabihf --with-pkgversion='Alpine 6.4.0' --enable-checking=release --disable-fixed-point --disable-libstdcxx-pch --disable-multilib --disable-nls --disable-werror --disable-symvers --enable-__cxa_atexit --enable-default-pie --enable-cloog-backend --enable-languages=c,c++,objc,java,fortran,ada --with-arch=armv6zk --with-tune=arm1176jzf-s --with-fpu=vfp --with-float=hard --with-abi=aapcs-linux --disable-libquadmath --disable-libssp --disable-libmpx --disable-libmudflap --disable-libsanitizer --enable-shared --enable-threads --enable-tls --disable-libitm --with-system-zlib --with-linker-hash-style=gnu

ld -v

GNU ld (GNU Binutils) 2.28
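
The `--enable-default-pie` entry in the `gcc -v` output above is what makes this compiler link in PIE mode unless told otherwise. A minimal sketch of a check for it, assuming a POSIX shell; the function name `gcc_defaults_to_pie` is hypothetical, and it only inspects the configuration string, so it would miss a default set by other means (for example a patched specs file):

```shell
# Hypothetical helper: reads `gcc -v` output on stdin and prints "yes"
# if the compiler was configured with --enable-default-pie, "no" otherwise.
gcc_defaults_to_pie() {
    # -e keeps grep from treating the leading dashes as an option.
    if grep -q -e '--enable-default-pie'; then
        echo yes
    else
        echo no
    fi
}

# Usage (gcc -v writes to stderr, hence the redirection):
#   gcc -v 2>&1 | gcc_defaults_to_pie
```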

@vicuna
Author

vicuna commented Jan 5, 2018

Comment author: @gasche

The -cclib -static part seems related to #7697.

@vicuna
Author

vicuna commented Jan 5, 2018

Comment author: @dbuenzli

In case it helps, exactly the same problem I mentioned (with the identical resolution via -no-pie) occurs with:

docker run -it i386/alpine sh
apk add --update bash tar make m4 curl git gcc musl-dev
curl -OL http://caml.inria.fr/pub/distrib/ocaml-4.06/ocaml-4.06.0.tar.gz
tar -xf ocaml-4.06.0.tar.gz && cd ocaml-4.06.0
./configure -host i386-linux
make world.opt
make install

This one should be more pleasant to diagnose, compilation-time wise.

Also note that this doesn't occur with the amd64/alpine and aarch64/alpine docker images, i.e. no '-no-pie' is needed in these, the compilers work out of the box (despite gcc still being compiled with '--enable-default-pie' in those).

Could this point to some kind of 32-bit issue (though the initial report mentions ppc64le)?

@XVilka
Contributor

XVilka commented Mar 24, 2020

Shouldn't this one be closed if it is not reproducible with the padding fix?

@gasche
Member

gasche commented Mar 24, 2020

@XVilka can you confirm that the issue is gone with the padding fix? (The fix should be in 4.09.1 and 4.10.0, but not 4.09.0)

@XVilka
Contributor

XVilka commented Mar 24, 2020

@gasche you are right, the issue is still reproducible with 4.09.1 and 4.10.0.
While 4.09.1 built fine but segfaulted when running ocamlc, 4.10.0 didn't even finish building on:

docker run -it i386/alpine sh
apk add --update bash tar make m4 curl git gcc musl-dev binutils
ln -s /usr/bin/as /usr/bin/i586-alpine-linux-musl
curl -OL http://caml.inria.fr/pub/distrib/ocaml-4.10/ocaml-4.10.0.tar.gz
tar -xf ocaml-4.10.0.tar.gz && cd ocaml-4.10.0
./configure -host i586-alpine-linux-musl
make world.opt


@xavierleroy
Contributor

I had a second look at these crashes with Alpine Linux.

They are hard to debug because the crash occurs very early in the execution of ocamlopt-generated binaries, well before any OCaml code is entered, even before main() is called, right inside the program loader.

The root cause seems to be non-PIE object files being linked in PIE mode, which seems to be the default in Alpine. The problem can be reproduced with just C files, no OCaml involved:

~ # cat hello.c
#include <stdio.h>
int main() { printf("Hello, world!\n"); return 0; }
~ # gcc -fno-pie -c hello.c  # produce a non-PIE object file
~ # gcc -o hello hello.o     # link it in default PIE mode
~ # ./hello
Segmentation fault
~ # gcc -no-pie -o hello hello.o  # link it in fixed-address mode
~ # ./hello
Hello, world!

I wish the linker would detect the mismatch and emit a diagnostic, rather than silently producing executables that crash.
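
Since the linker gives no diagnostic, one way to see which mode an executable was actually linked in is the ELF header type: `DYN` for a PIE (or a shared object), `EXEC` for a fixed-address executable. A minimal sketch, assuming `readelf` from binutils is installed; the function name `elf_link_mode` is hypothetical:

```shell
# Hypothetical helper: reads `readelf -h` output on stdin and classifies
# the file by the "Type:" field of its ELF header.
elf_link_mode() {
    awk '/Type:/ {
        if ($2 == "DYN") print "pie-or-shared";
        else if ($2 == "EXEC") print "fixed-address";
        else print "other";          # e.g. REL for a .o file
        found = 1; exit
    }
    END { if (!found) print "unknown" }'
}

# Usage:
#   readelf -h ./hello | elf_link_mode
```

This only reports how the final link was done; it does not tell whether the object files that went into it were compiled as position-independent code.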

The fix is indeed to use -no-pie for linking object files produced by ocamlopt. To a first approximation, no hacking of the configure file is needed; you can run

CC="gcc -no-pie" ./configure

in the build script for the OCaml package. This worked fine for me on i586 and on ppc64le.
This is slightly brutal because it builds ocamlrun (the bytecode interpreter) in no-PIE mode as well, even though it would work fine in PIE mode.

I believe that -no-pie is currently needed for i586, ARM, and PPC, not needed for x86-64 and s390x, and I don't know yet for ARM64/AArch64.
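
For a packaging script, the list above could be encoded roughly as follows. This is only a sketch: the function name `needs_no_pie` is hypothetical, and the `uname -m` spellings used in the patterns are assumptions rather than something verified on each platform.

```shell
# Hypothetical helper: prints "yes", "no", or "unknown" for a machine name
# such as the one printed by `uname -m`.
needs_no_pie() {
    case "$1" in
        # Must come before arm*, otherwise arm64 would match the ARM case.
        aarch64|arm64)           echo unknown ;;
        i?86|arm*|ppc*|powerpc*) echo yes ;;
        x86_64|amd64|s390x)      echo no ;;
        *)                       echo unknown ;;
    esac
}

# Usage in a build script:
#   case "$(needs_no_pie "$(uname -m)")" in
#       yes) CC="gcc -no-pie" ./configure ;;
#       *)   ./configure ;;
#   esac
```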

Eventually the configure script of OCaml will add the -no-pie flag where and when appropriate.

> While 4.09.1 built fine but segfaulted when running ocamlc, 4.10.0 didn't even finish building on:

4.09.1 builds fine because the whole build is done by bytecode executables, but it still produces crashing native-code executables. You can see the crashes by running the test suite. The build of 4.10 uses some native-code executables, which is why you see the crash during the build.

@gasche
Member

gasche commented Apr 16, 2020

I was about to ask which native-code executables were suddenly used in 4.10, but I guess that I'm the one to blame-or-thank here: this would be the BEST_FOO logic (#8840), which uses the .opt version of each tool when available.

xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Apr 16, 2020
…ault

Some Linux and BSD platforms now generate position-independent
executables (PIE) by default.  However, generating a PIE from
object files that are not PIC (position-independent code) causes
either link-time errors or the production of executable files that
crash when run.

This commit turns PIE off (-no-pie C compiler option) on platforms
where ocamlopt does not generate PIC by default: currently all
platforms except amd64 (x86-64) and s390x (Z systems).

Closes: ocaml#7562
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Apr 17, 2020
… code

Add link-time option -no-pie when
- PIE is the default on the target system
- ocamlopt does not generate PIC by default (i.e. not amd64, not s390x)
- link-time errors or run-time errors occur when linking non-PIC objects
  in PIE mode.  (Observed on Alpine Linux.)

Closes: ocaml#7562
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Apr 17, 2020
Alpine Linux produces position-independent executables (PIEs) by default.
If non-PIC object files are given to the linker, it silently produces
a wrong executable that crashes when run.  This is the case for
ocamlopt-generated code, which by default is not PIC
except on amd64 (x86_64) and s390x (Z systems).

Closes: ocaml#7562
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Apr 17, 2020
…390x

Alpine Linux and perhaps other musl-based Linux distributions produce
position-independent executables (PIEs) by default.  If non-PIC object
files are given to the linker, it silently produces a wrong executable
that crashes when run.  This is the case for ocamlopt-generated code,
which by default is not PIC except on amd64 (x86_64) and s390x (Z systems).

Closes: ocaml#7562