Original bug ID: 5180
Reporter: meurer
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2012-03-24T14:01:43Z)
Resolution: fixed
Priority: normal
Severity: tweak
Version: 3.12.0
Category: ~DO NOT USE (was: OCaml general)
Monitored by: @hcarty @alainfrisch
Bug description
The AMD64 code generator uses movsd for SSE2 register-to-register moves, which introduces false dependencies. Between registers, movsd preserves the upper 64 bits of the destination register, so the move cannot complete until the previous write to that register has. Similarly, movlpd is used for SSE2 memory/register moves, which also introduces false dependencies, since the high double of the destination must be preserved. The attached patch replaces movsd between SSE2 registers with movapd, and movlpd with movsd. This improves the performance of floating-point programs by 5-10% (e.g. the "almabench.ml" test drops from 4.5s to 4.0s on a Core 2 Duo).
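To make the proposed selection rule concrete, here is a minimal OCaml sketch. It is not ocamlopt's actual emitter. The `operand` type and the `float_move` function are hypothetical names introduced only for illustration. Given a source and destination for a double-precision move, the sketch returns the AT&T-syntax instruction the patch would select:

```ocaml
(* Hypothetical operand representation, for illustration only;
   ocamlopt's real emitter uses its own location types. *)
type operand =
  | Xmm of int      (* an SSE2 register, %xmm0 .. %xmm15 *)
  | Mem of string   (* a pre-formatted memory operand, e.g. "8(%rsp)" *)

let string_of_operand = function
  | Xmm n -> Printf.sprintf "%%xmm%d" n
  | Mem s -> s

(* Select the instruction for moving one double from [src] to [dst].
   Returns [None] when the move is a no-op. *)
let float_move src dst =
  let fmt insn =
    Printf.sprintf "%s %s, %s" insn (string_of_operand src) (string_of_operand dst)
  in
  match src, dst with
  | Xmm a, Xmm b when a = b -> None
  | Xmm _, Xmm _ ->
      (* movapd copies all 128 bits, so it does not read the old contents
         of [dst]; a register-to-register movsd would merge with them. *)
      Some (fmt "movapd")
  | Mem _, Xmm _ ->
      (* A movsd load zeroes the upper half of [dst], whereas movlpd keeps
         it and therefore depends on the previous value of [dst]. *)
      Some (fmt "movsd")
  | Xmm _, Mem _ ->
      (* Stores write only 64 bits of memory; movsd is used here too. *)
      Some (fmt "movsd")
  | Mem _, Mem _ -> invalid_arg "float_move: memory-to-memory move"
```

For example, `float_move (Xmm 1) (Xmm 2)` yields `Some "movapd %xmm1, %xmm2"`, and `float_move (Mem "8(%rsp)") (Xmm 0)` yields `Some "movsd 8(%rsp), %xmm0"`. Operands follow AT&T order (source first). movapd is chosen for register copies because only the low double is meaningful to the code generator, so copying the whole register is harmless and avoids the merge.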
The second patch also replaces all uses of movlpd in the runtime support files (amd64.S and amd64nt.asm). There, movlpd was only used in the GC interface, but replacing it with movsd does no harm.
Thanks for a very useful suggestion. I confirm speedups between 6% and 13% on my floating-point benchmarks, on an i5 processor. The patch is applied in the 3.12 bugfix branch and will go in the 3.12.1 release.
For the record: ocamlopt's current choice of instructions was copied from what gcc used to do when the AMD64 architecture appeared. As clearly explained at http://wikis.sun.com/display/BluePrints/Instruction+Selection, that choice might have made sense then, but yours is certainly better for today's processors.