New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ocamlopt generates incorrect code on arm #7642
Comments
Comment author: @mshinwell I will look at this now. |
Comment author: @mshinwell @glondu: I couldn't reproduce this on an armhf system. Please would you provide:
Can you confirm whether OCaml 4.04 operates correctly on this system? Thanks in advance. |
Comment author: @glondu This happens on "armel" (not "armhf"). 1: attached 2: I don't know the hardware / virtualization environment: what command should I run to figure it out? (Note: I am not root on this machine). This bug occurs in an up-to-date Debian unstable chroot running on a Debian 9.1 system. 3: gcc -v output: I will provide the other requested information soon. |
Comment author: @glondu 4: attached 5: attached 6: 7: OCaml 4.02.3 operates correctly on this system (4.04 has not been tried). |
Comment author: @glondu Hardware is Marvell Armada 370/XP CPU @ 1.6GHz on a Marvell MV78460 SoC Development Board (ARM v7). |
Comment author: @mshinwell It looks like the native code version of ocamlopt has been miscompiled and is giving the wrong factor for the magic division-by-constant algorithm. This is presumably either due to a code generation error in the OCaml arm backend or possibly miscompilation by GCC in the nativeint stubs. Is it possible to get access to the machine in question? (I can provide a reference from a well-known Debian developer if required.) |
Comment author: @mshinwell One other thing: is it possible to repeat the test on the same hardware but running a stable version of Debian? (If so please provide the gcc version as well as the answers.) |
Comment author: @glondu Regarding access: if possible, the procedure is at https://dsa.debian.org/doc/guest-account/ . However it says that "People requesting access should already have a track record of working on Debian for some time." Is it your case? Which "well-known Debian developer" are you thinking of? Regarding testing on stable: I will try in a stretch chroot. |
Comment author: @mshinwell Yeah, let's try the test with a more likely known-good version of GCC first (i.e. on Debian stable). It would be nice to rule that out. |
Comment author: @glondu On Debian stable, the bug disappears! gcc -v output: |
Comment author: @mshinwell So it seems reasonably likely that this is either a GCC bug or some accidental use of undefined behaviour in the runtime which is causing a new, more clever version of GCC to delete more code. Could you just try the following:
Then on the Debian unstable compiler tree:
Now use the new ocamlopt.opt to compile the original test case. What does it print? |
Comment author: @mshinwell Sorry, that should have said: run "make opt.opt", not "make opt". |
Comment author: @glondu Actually, the bug is caused by a Debian-specific patch that forces PIC (attached). Contrarily to what I said before, ocamlopt.opt compiled with this patch produces incorrect code on unstable as well as on stable. |
Comment author: infinity0 I can confirm that removing my patch causes the basic test case above ("10 / 10 = ") to work again. However that just raises more questions. All my patch does is make ARM default to generating PIC code, which shouldn't affect its correctness. without my patch: $ ./ocaml/ocamlopt.opt -I ocaml/stdlib/ -fno-PIC test.ml && ./a.out 10 with my patch: $ ./ocaml/ocamlopt.opt -I ocaml/stdlib/ -fno-PIC test.ml && ./a.out 10 ocamlopt(.byte) is fine in all cases, it is compiled using ocamlc and not ocamlopt. So there is probably still a bug in ocamlopt somewhere, that when used to compile itself, produces a compiler that miscompiles the basic test case given above. |
Comment author: infinity0 The original reason for the arm-PIC patch is documented here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837359#80 Basically movw/movt instruction pairs causes problems when GCC defaults to -fPIE as is the case on Debian. OCaml generates these for > ARMv6. The arm-PIC patch works around this by generating PIC code by default, but it looks like it's causing problems here. I've attached an alternate patch armhf-movw-movt.patch which has been reported to also fix the OCaml testsuite on armhf with a -fPIE GCC, but I don't know if it's the best thing to do, and we haven't yet tested it further (by compiling other packages with it). There is also one further movw/movt pair further up in the file, in |
Comment author: @xavierleroy There is perhaps a lack of synchronization between Clflags.dlcode and Clflags.pic_code. Alternatively, the branch that generates movw/movt should be guarded by "not !Clflags.dlcode && not !Clflags.pic_code". Yet, I see no connection between this use of movw/movt to load symbol addresses and the magic multiplicative constants for division by a constant. Can someone elaborate? Also, I am surprised -- and irritated actually -- by the amount of patching done in Debian without even notifying the original "upstream" authors. |
Comment author: infinity0
That branch already is the else-branch of a "if !Clflags.pic_code" sibling branch, so I think the extra "&& not" would be redundant. I think there may be two separate bugs here - (1) the movw/movt bug and (2) the fact that when ocamlopt.byte compiles ocamlopt with -fPIC, the resulting ocamlopt.opt generates bad code. That is, we uncovered (2) because of the arm-PIC patch for (1), but I don't know what the deeper cause of (2) is. A slightly older version of the arm-PIC patch worked without problems on ocaml 4.02.3, here it is compiling coq successfully. [1] The -10th Debian revision was used, which already applies the patch [2]. Between 4.02.3 and 4.05.0 there was a minor change, the pic_code variable was dropped from arm/arch.ml and the one from Clflags was used directly instead. So the patch was updated to reflect that. I'm sorry for the amount of patches, mostly it is a volume issue - with all the other packages that maintainers have to take care of, sometimes we forget to upstream things like this one, especially on "less important" architectures like ARM. I did send various other patches upstream before, to OCaml. As for the arm-PIC patch specifically, I felt it was safer to switch a flag than to mess with the code generation. Nothing broke for that version (4.02.3) so I then forgot about it. [1] https://buildd.debian.org/status/fetch.php?pkg=coq&arch=armel&ver=8.6-4%2Bb1&stamp=1500572004&raw=0 |
Comment author: @xavierleroy Also: from the discussion it's unclear to me whether the problem is specific to "armel" or whether "armhf" is affected as well. My ARM test machines (two RPis, a CubieTruck, and a Scaleway VM) are all "armhf", and "armel" is so obsolete these days that I'd suggest removing ocamlopt support for it entirely. |
Comment author: bunk There are two problems: (1) link failure originating from emit_load_symbol_addr with -nodynlink on arch > ARMv6 when gcc defaults to PIE This is the problem on the Debian armhf port (EABI_HF, ARMv7). Ports with a lower baseline (e.g. Debian armel, Raspbian armhf) are not affected by this problem. This problem is present in both OCaml 4.02.3 and 4.05.0. This problem was workarounded in Debian by enabling pic_code when building OCaml on armel/armhf after the Debian gcc stated to default to PIE a year ago. (2) 10 / 10 = 0 on the Debian armel port when OCaml is built as pic_code with a Debian-specific change to workaroud (1) This problem has so far only been observed on the Debian armel port (EABI, ARMv4). This problem is a regression in OCaml 4.02.3 -> 4.05.0. The more interesting general problem is (1), so lets's focus on that: The link errors in the tests are triggered by: Removing this line makes the tests link and pass, but how is in general -nodynlink supposed to work on distributions where gcc defaults to PIE (including -pie, which passes -pie to ld)? |
Comment author: bunk To avoid misunderstandings, I am not claiming that -nodynlink would be completely broken when gcc defaults to PIE. But -nodynlink currently results in the movw/movt branch being used in emit_load_symbol_addr, and this cannot be used in a PIE executable. Perhaps a configure check based on something like "gcc -dM -E - < /dev/null | grep PIE" is the way forward for a proper solution? |
Comment author: @xavierleroy Yes, of course, PIE mode calls for different code generation schemas than non-PIE mode. I recently worked on the 64-bit ports of OCaml to improve support for PIE (#1323); more work remains for the 32-bit ports. What irritates me is that distributions switch to PIE mode by default without even notifying upstream compiler developers. |
Comment author: infinity0 It's possible to setup an armel chroot inside an armhf or arm64 system. That is in fact what I've been doing - using the Debian arm64 porterbox but with armhf and armel chroots, since the actual armel and armhf Debian porterboxes are very slow. Both bugs reproduce in these chroots OK for me. If you'd like to try this yourself on a Debian system, you can follow the instructions at [1] but instead use Using armhf-movw-movt.patch instead of 0010-arm-default-PIC.patch seems like an adequate solution for the time being - both of these fix (1) on armhf - i.e. ocaml test suite runs successfully - and the replacement patch avoids (2) on armel - my armel coq build is still running but it has already gotten past the previous failure, and the basic "10 / 10" test case works. So I'll upload that to Debian unstable at the next opportunity, unless someone sees a reason for me not to. I can see the argument that armel could just be dropped from Debian, and it would make maintainers' lives easier TBH, but that would be a decision for someone else. How would you like to be notified about future flag changes - is it possible for me to send caml-list@inria.fr without subscribing to it, or shall I post somewhere else, etc? For completeness, we also compile with other flags that change infrequently, at the moment they are: CFLAGS=-g -O2 -fdebug-prefix-map=$PWD=. -fstack-protector-strong -Wformat -Werror=format-security (The PIE change was a source-level change by the GCC maintainer rather than to CFLAGS etc, it was a non-standard decision and I don't know it sufficiently-well to go into the details right now.) |
Comment author: bunk This was not a change in the Debian package of OCaml that would pass -pie to gcc. This is a change in the gcc in Debian/Ubuntu that makes gcc compile and link all code as PIE unless -no-pie is explicitly passed to gcc. This gcc change was made in Debian unstable one year ago, and is already included in Debian 9 released in June. amd64 (upstream) and armel/armhf (Debian-specific as discussed above) enable pic_code in Debian 9, -nodynlink might be broken on arm64 and i386 if that's breakage not covered by the testsuite. |
Comment author: @gasche
I think Xavier's suggestion was not to remove the armel port from Debian (or the OCaml packages there), but to configure OCaml (under Debian+armel) with bytecode compilation only, and no native compilation. (I believe that several Debian-supported architectures are already configured in this way.) |
Comment author: infinity0 OK, I'll consider that. Hopefully the change will not involve much extra work in terms of Debian infrastructure. What do you advise for the armhf issue? There are three options for Debian:
With (2) there is the option of keeping armel ocamlopt support, which was my preferred choice in the previous post. |
Comment author: @xavierleroy Here is what you should do for armhf to support PIE:
What my patch does is just to fix the failing test in testsuite/tests/asmcomp by removing the "dlcode := false" line mentioned earlier. This line looks like an experiment that I committed by mistake. With this fix, OCaml 4.05 compiles and passes its tests on armhf with PIE mode manually turned on (i.e. with configure -cc "gcc -pie -fpie"). Some additional explanations: Q: How can the "asmcomp" test fail with wrong code being produced, yet no change is required in the main compiler sources? Q: Why not activate PIC mode by default? Q: What about PIE? Q: Back to the armhf port, what happens with ocamlopt -nodynlink ? |
Comment author: infinity0 Thanks for the patch and explanations, I will go ahead and apply that, and disable armel ocamlopt on Debian. I agree and already knew I should have reported that original bug upstream - that is what the "Forwarded: TODO" line means in the patch in Debian jargon. Everyone forgets about things, there is no need to continue biting my head off. I apologised already and tried to explain the context. As a reminder, (2) still exists, and it would not have been discovered (yet) if my patch hadn't been used. Now, I know that ocamlopt.opt is not compiled with -fPIC by default, but it means there is some issue with the arm codegen on armel, which may or may not also affect armhf PIC code in ways that we haven't discovered yet. |
Comment author: @xavierleroy You're right, one more problem to go (strange division on armel). I'm trying to build an "armel" environment to reproduce it, without success so far. I tried the "sbuild" trick on my ARMv8 Debian Jessie machine but it fails with As concerns real hardware, early Raspbian for Raspberry Pi model 1 was armel, but it's all armhf now. Debian's page https://wiki.debian.org/ArmEabiPort lists dead hardware only, and manages to never mention Raspberry Pi, as if the page was written in 2011. My conclusion is that armel is dead and not worth the maintenance hassle. I'll change my mind if someone tells me how to boot the latest Debian armel on one of my Raspberry Pi (model 1, 2 or 3). |
Comment author: @xavierleroy Eventually the "sbuild" trick worked, on a ARMv7 Ubuntu 16.04 host machine. That's a crazy hybrid (Debian armel unstable running within Ubuntu armhf LTS) but it was enough to reproduce the division problem. The repro uses pristine 4.05.0 sources plus the 0010-arm-default-PIC.patch. The problem shows up when the repro case is compiled with ocamlopt.opt but not with ocamlopt (.byte). This points to a miscompilation of ocamlopt by itself. Great. To be continued. |
Comment author: @xavierleroy Below is a small repro case, excerpted from the ocamlopt sources. It prints the wrong results when compiled with ocamlopt or ocamlopt.opt. The source of the problem is the way ocamlopt treats multiplication for ARM versions < ARMv6, see asmcomp/arm/selection.ml line 59. Reloading post register allocation invalidates the hypothesis that "rm" as pseudo-result is the same physical register as "rm" as argument. The easiest fix is probably to say that ARM version < ARMv6 are not supported by ocamlopt. let ucompare x y = Nativeint.(compare (add x min_int) (add y min_int)) let udivmod n d = Nativeint.( let _ = |
Comment author: infinity0 Thanks for taking a closer look! I asked some other Debian people and apparently not all ARMv8 chips support the older 32-bit instructions, so this might explain your earlier "exec format error". I've disabled ocamlopt on Debian armel, I think this is sufficient for now - and good to hear that the underlying cause does not affect armhf. |
Comment author: @xavierleroy Fix submitted here: #1411 |
Original bug ID: 7642
Reporter: @glondu
Assigned to: @xavierleroy
Status: resolved (set by @xavierleroy on 2017-10-07T09:46:55Z)
Resolution: fixed
Priority: urgent
Severity: block
Version: 4.05.0
Target version: 4.06.0 +dev/beta1/beta2/rc1
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: back end (clambda to assembly)
Monitored by: @gasche @yallop
Bug description
In Debian, with OCaml 4.05.0, (at least) menhir, extlib, coq, ocamlrss mysteriously fail to build on armel. While investigating the failure for ocamlrss, I discovered that the following program:
let a = int_of_string Sys.argv.(1);;
Printf.eprintf "%d / 10 = %d\n%!" a (a/10);;
when compiled with ocamlopt, prints "10 / 10 = 0". With ocamlc, the result is correct.
File attachments
The text was updated successfully, but these errors were encountered: