New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCaml 4.04, 4.05-rc1 incompatible with snapshot binutils, when built with -fPIC, results in crashes #7585
Comments
Comment author: Jiong Wang I have received an bug report on this. Quick look at the build log, it looks to me -fPIC was not specified when building the object unix.o? The R_AARCH64_ADR_PREL_PG_HI21 relocation should have not been generated by compiler if -fPIC was specified. |
Comment author: xnox Both unix.o and unix.pic.o are built with -fPIC
Both objects appear to contain something like that:
Is this then a compiler bug? binutils bug? unix.c bug? |
Comment author: xnox binutils 2.28.90.20170718-0ubuntu1 |
Comment author: Jiong Wang Hi xnox, Those are PC relative relocations against local section symbol which is fine under PIC, as they will be fully resolved during static linking. (And you could see what linker is complaining is with different symbol names) What Binutils 2.29 AArch64 linker tightened is actually using pc relative relocation against global symbol defined in the same object with the relocated instructions as it is possible they will be overridden that the instruction sequences won't be position independent. The linker would expect those symbol are marked as .hidden explicitly if they won't be overridden. After searching ocaml source code tree, I suspect asmcomp/arm64/emit.mlp let emit_load_symbol_addr dst s = (there are several other places using adrp, the issue may caused by those parts as well) I feel for PIC, you want to always use the sequences in the "else" part. I am not familiar with ocaml, I may be wrong. |
Comment author: infinity0 In the interests of maybe pushing things along a bit, I have written a straw-man patch. It gets rid of the relocation errors, and the build succeeds, but the tests segfault as soon as they start running. I'm totally unfamiliar with arm64 so I have no idea what the patch is actually doing, but the general idea was to take the existing PIC/non-PIC code examples in asmrun/arm64.S (the {ADDR,LOAD,STORE}GLOBAL macros), and apply this to the parts of the code in emit.mlp that did stuff using the "adrp" instruction. If someone more familiar with arm64 assembly could take a look and maybe spot tweaks that could fix the segfaults, that would be great! If not, we might have to temporarily disable ocamlopt support in Debian (and probably Ubuntu) arm64. |
Comment author: Jiong Wang Hi Infinity0, One thing looks suspicious is
it is emit_symbol_offset, so it will be in the format "symbol+offset", this will The other thing is I assume those assembly auto-generated are supposed to be I feel it will work and can reduce object size. |
Comment author: infinity0 "the patch" = arm64_ocaml_pic_straw-man-2.patch I've done as you said and cut down the patch to only emit extra .hidden lines, in 4 places.There are 5 uses of "adrp" in that file; if we add it to the "else" clause of emit_load_symbol_addr then we get "undefined reference", so the patch omits this. unix.a(unix.o): In function (It seems it's necessary to do a clean-rebuild every time I edit emit.mlp, which is extremely frustrating. "make opt" by itself seems not to work correctly - for example if I add .hidden into the "else" clause, I don't get these "undefined reference" error, until after I do "make clean" then build it again.) Anyway, the patch makes the relocation errors go away, and there are no segfaults, but the failing test remains (same as xnox reported, tests/lib-dynlink-native/main). The diff between the expected and actual output is this: --- reference 2017-07-22 18:39:47.788532745 +0000 This looks similar to the "undefined reference" errors I described above, so then I tried amending the patch to wrap the .hidden lines like this: ( if Compilenv.symbol_in_current_unit s then I could only do this 3/4 of the time, the other one is a emit_label and I couldn't find the corresponding "in_current_unit" function for that. But unfortunately this does not fix things, and the failure remains. |
Comment author: Jiong Wang Hi Infinity0, I have setup the Ocaml environment on AArch64, and am able to reproduce all For your first patch arm64_ocaml_pic_straw-man-1.patch:I don't know why it doesn't work. The reason looks like GOT based indirect So, I modified your patch in two places:
After these two change, the builds are OK and make tests are OK. As this patch is using GOT based indirect addressing for all pic code, there For your second patch arm64_ocaml_pic_straw-man-2.patch:I feel this is the correct approach. The reason for the lib-dynlink-native test errors is because some symbols Marking all symbol as ".hidden" is overkilling. Actually for a global symbol defined in executable, it won't be overridden, so I think what we really want is your arm64_ocaml_pic_straw-man-2.patch, but
into something like
However, it seems we can't know this in Ocaml after some quick in the code base. So, I have tries another approach:
This approach builds OK and make tests OK as well without the overhead of a few direct address access are turned into indirect access. But I noticed the generated ocamlopt.opt is actually smaller than the original code built by old binutils. I guess it's result of using ".hidden" on most of the symbols. |
Comment author: infinity0 Hey, thank you that worked! I can confirm your amendment to -2.patch works and all the tests pass. (I didn't check -1.patch yet). I'll upload this to Debian for now and we'll see how it does compiling other ocaml packages. Actually, the reason why it works is a bit subtle and I don't understand it fully yet: On a clean 4.05.0 Debian build without .hidden: | ocaml$ readelf -Ws ./ocamlopt.opt | grep 'camlPervasives$' (The symbol appears twice, once in .dynsym and once in .symtab, the latter is lost after stripping.) After applying .hidden (which fixes the relocation warnings), but before applying the "- name = prefix ||" amendment: | ocaml$ readelf -Ws ./ocamlopt.opt | grep 'camlPervasives$' After applying the amendment: | ocaml$ readelf -Ws ./ocamlopt.opt | grep 'camlPervasives$' These symbols ('caml[[:alnum:]]*$') are all GLOBAL on my x86-64 system with ocaml 4.05.0, and they are also all GLOBAL with Debian ocaml-nox_4.05.0-4_arm64.deb [1] which was compiled against binutils 2.28 [2]: [1] http://snapshot.debian.org/archive/debian/20170722T040820Z/pool/main/o/ocaml/ocaml-nox_4.05.0-4_arm64.deb So it seems our original extra .hidden annotations, somehow causes some other symbols (that are not marked HIDDEN) to change from GLOBAL to LOCAL, and your later amendment reverts this. This is very surprising to me. Also I think it is not correct to say that "symbol with "__" will absolutely only referenced within the same unit". For example: ocaml$ readelf -Ws ./testsuite/tests/lib-dynlink-native/plugin.so | grep camlPervasives However the tests pass fine, so perhaps the statement is true for all the cases where the code path goes through symbol_in_current_unit. Anyway, my comments are only for informational purposes to understand the patch better. For now, it seems the patch works so I am satisfied :) I'll proceed with updating OCaml in Debian with it. Hopefully an OCaml developer can confirm soon that the patch is fully correct. To clarify, it is arm64_ocaml_pic_straw-man-2.patch plus this: --- a/asmcomp/compilenv.ml let symbol_in_current_unit name =
|
Comment author: @gasche Would one of you two mind submitting the final version of the patch as github pull request on https://github.com/ocaml/ocaml/, with a link to the current issue? This provides a better interface for review, and it makes it easier to ping the people that should look at this issue. |
Comment author: infinity0 Filed at #1268. I'd be grateful if Jiong Wang could take a look, in case I made any mistakes in explaining it. |
Comment author: @xavierleroy Thanks to @xnox and @infinity0 and @jiong Wang for the report, the discussion, and the Github pull request. Before proceedings further, I'd like to be absolutely sure that this is not a bug in the development version of binutils. When something works with all released version of a software package and suddenly breaks with the development version, it is more often than not a bug in the software package itself, not in its users. So, please, convince me that this is not a binutils bug. |
Comment author: Jiong Wang Hi Xleroy, The binutils commit that caused this trouble is this one:
For pc-relative relocation, it is naturally position independent if the However, when fixing some linker bug related with COPY relocation:
I found my above assumption actually is not correctly. The reason is an So, if in .so, we are allowing pc-relative relocation against global symbol I later found Binutils actually have an macro SYMBOL_REFERENCES_LOCAL which There was similar issue reported by GLIBC community like this one
It later proves to be AArch64 GLIBC issue where GLIBC generic framework has For this Ocaml case, I understand the purpose of using pc-relative addressing, But I feel it is only safe to use pc-relative addressing when that symbol in As global symbol defined in executable won't be overridden, so there is only |
Comment author: @mshinwell Let's take this to the github pull request. |
Comment author: Richard Jones Ignore the previous comment, I think I've got it working now. I'll update the pull request comment thread. |
Comment author: @xavierleroy Fixed by this pull request: #1330. Will be in 4.06. |
Comment author: @gasche Apparently this fix is instrumental in letting people build and use Coq on their Android phone. Congratulations :-) |
Comment author: @gasche ( https://sympa.inria.fr/sympa/arc/coq-club/2017-10/msg00082.html ) |
Original bug ID: 7585
Reporter: xnox
Assigned to: @mshinwell
Status: resolved (set by @mshinwell on 2017-08-07T15:31:32Z)
Resolution: duplicate
Priority: high
Severity: block
Platform: arm64
OS: Ubuntu
OS Version: 17.10/devel
Version: 4.05.0
Target version: 4.06.0 +dev/beta1/beta2/rc1
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: back end (clambda to assembly)
Has duplicate: #7602
Monitored by: @gasche @glondu "Richard Jones"
Bug description
OCaml 4.04, 4.05-rc1 incompatible with snapshot binutils, when built with -fPIC, results in crashes
When configuring and building ocaml with -fPIC option to configure, it is miscompiled and results in test-suite crashes, and subsequent miscompilation of other ocaml packages.
This is on arm64 only.
See attached buildlog
Steps to reproduce
Additional information
binutils on arm64 has been improved to be more strict with PIC code, which may be resulting in these errors / bugs. Please see the below mailing list thread.
https://sourceware.org/ml/binutils/2017-06/msg00226.html
File attachments
The text was updated successfully, but these errors were encountered: