ocamlopt generates partial SSE2 register reads/writes on AMD64 #5180

vicuna · 2010-11-24T14:32:21Z

Original bug ID: 5180
Reporter: meurer
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2012-03-24T14:01:43Z)
Resolution: fixed
Priority: normal
Severity: tweak
Version: 3.12.0
Category: ~DO NOT USE (was: OCaml general)
Monitored by: @hcarty @alainfrisch

Bug description

The AMD64 code generator uses movsd for SSE2 register-to-register moves, which introduces false dependencies, since movsd between registers preserves the upper half of the destination register. Similarly, movlpd is used for SSE2 moves between memory and registers, which also introduces false dependencies, since the high double of the register must be preserved. The attached patch replaces movsd between SSE2 registers with movapd, and movlpd with movsd, improving the performance of floating-point programs by 5-10% (e.g. the "almabench.ml" test drops from 4.5s to 4.0s on a Core 2 Duo).
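To make the proposed selection rule concrete, here is a minimal OCaml sketch of it. It is not the attached patch and not ocamlopt's emitter: the `operand` type and the functions `float_move_mnemonic` and `emit_float_move` are hypothetical names invented for illustration, and the output is plain AT&T-syntax text.

```ocaml
(* Hypothetical operand model; NOT the types used in ocamlopt's emitter. *)
type operand =
  | Xmm of int     (* SSE2 register %xmmN *)
  | Mem of string  (* memory operand, already formatted as text *)

(* Pick the mnemonic for moving one double-precision float.
   Old choice: movsd for reg->reg, movlpd for mem<->reg.
   New choice: movapd for reg->reg, movsd for mem<->reg. *)
let float_move_mnemonic ~src ~dst =
  match src, dst with
  | Xmm _, Xmm _ -> "movapd" (* copies all 128 bits: no dependency on the old dst *)
  | Mem _, Xmm _ -> "movsd"  (* load zeroes the upper 64 bits of dst *)
  | Xmm _, Mem _ -> "movsd"  (* stores the low 64 bits only *)
  | Mem _, Mem _ ->
    invalid_arg "float_move_mnemonic: no memory-to-memory move"

let string_of_operand = function
  | Xmm n -> Printf.sprintf "%%xmm%d" n
  | Mem s -> s

(* AT&T syntax: source first, destination second. *)
let emit_float_move ~src ~dst =
  Printf.sprintf "%s\t%s, %s"
    (float_move_mnemonic ~src ~dst)
    (string_of_operand src)
    (string_of_operand dst)
```

For example, `emit_float_move ~src:(Xmm 0) ~dst:(Xmm 1)` produces a movapd instruction, while a load such as `emit_float_move ~src:(Mem "8(%rsp)") ~dst:(Xmm 2)` produces movsd. In both cases the destination register is fully overwritten, so the processor does not have to wait for its previous contents.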

File attachments

vicuna · 2010-11-24T16:10:29Z

Comment author: meurer

The second patch also updates all movlpd uses in the runtime support files (amd64.S and amd64nt.asm). movlpd was only used in the GC interface, but it does no harm to replace it with movsd as well.

vicuna · 2010-11-27T17:23:08Z

Comment author: @xavierleroy

Thanks for a very useful suggestion. I confirm speedups between 6% and 13% on my floating-point benchmarks, on an i5 processor. The patch is applied in the 3.12 bugfix branch and will go in the 3.12.1 release.

For the record: ocamlopt's current choice of instructions was copied from what gcc did when the AMD64 architecture first appeared. As clearly explained at
http://wikis.sun.com/display/BluePrints/Instruction+Selection
that choice might have made sense then, but yours is certainly better for today's processors.

vicuna closed this as completed Mar 24, 2012

vicuna assigned xavierleroy Mar 14, 2019

vicuna added the bug label Mar 20, 2019
