ocamlopt generates partial SSE2 register reads/writes on AMD64 #5180

Closed
vicuna opened this issue Nov 24, 2010 · 2 comments


vicuna commented Nov 24, 2010

Original bug ID: 5180
Reporter: meurer
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2012-03-24T14:01:43Z)
Resolution: fixed
Priority: normal
Severity: tweak
Version: 3.12.0
Category: ~DO NOT USE (was: OCaml general)
Monitored by: @hcarty @alainfrisch

Bug description

The AMD64 code generator uses movsd for SSE2 register-to-register moves. This introduces false dependencies, because movsd between registers preserves the upper half of the target register. Similarly, movlpd is used for SSE2 memory/register moves, which also introduces false dependencies, because the high double of the target must be preserved. The attached patch replaces movsd between SSE2 registers with movapd, and movlpd with movsd, improving the performance of floating-point programs by 5-10% (e.g. the "almabench.ml" test drops from 4.5s to 4.0s on a Core 2 Duo).
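
To make the register semantics described above concrete, here is a minimal C sketch using SSE2 intrinsics from `<emmintrin.h>`. It is not taken from the patch and only models what each instruction does to the destination register. The type `xmm_view` and all function names are hypothetical, and a compiler is free to emit different instructions for these intrinsics than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; SSE2 is baseline on x86-64 */

/* Hypothetical helper type: a 128-bit SSE2 register seen as two doubles. */
typedef struct { double lo, hi; } xmm_view;

/* Copy both halves of a register out to ordinary doubles. */
xmm_view view(__m128d r) {
    double d[2];
    _mm_storeu_pd(d, r);            /* d[0] = low half, d[1] = high half */
    xmm_view v = { d[0], d[1] };
    return v;
}

/* Models "movsd xmm, xmm": only the low half is written, so the result
   still depends on the old high half of dst (the false dependency). */
xmm_view reg_move_movsd(__m128d dst, __m128d src) {
    return view(_mm_move_sd(dst, src));
}

/* Models "movapd xmm, xmm": the whole register is overwritten, so the
   old contents of dst do not matter. */
xmm_view reg_move_movapd(__m128d dst, __m128d src) {
    (void)dst;                      /* intentionally unused */
    return view(src);
}

/* Models "movlpd xmm, m64": loads the low half, preserves dst's high half. */
xmm_view load_movlpd(__m128d dst, const double *p) {
    return view(_mm_loadl_pd(dst, p));
}

/* Models "movsd xmm, m64": loads the low half and zeroes the high half,
   so no old register contents are needed. */
xmm_view load_movsd(const double *p) {
    return view(_mm_load_sd(p));
}
```

In this model, the two variants the patch moves away from take the old destination as an input, while the replacements do not. That extra input is what forces the processor to wait for whichever earlier instruction last wrote the destination register.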

File attachments


vicuna commented Nov 24, 2010

Comment author: meurer

The second patch also updates all movlpd uses in the runtime support files (amd64.S and amd64nt.asm). movlpd was only used in the GC interface, but it does no harm to replace it with movsd as well.


vicuna commented Nov 27, 2010

Comment author: @xavierleroy

Thanks for a very useful suggestion. I confirm speedups between 6% and 13% on my floating-point benchmarks, on an i5 processor. The patch is applied in the 3.12 bugfix branch and will go in the 3.12.1 release.

For the record: ocamlopt's current choice of instructions was copied from what gcc did when the AMD64 architecture first appeared. As clearly explained on
http://wikis.sun.com/display/BluePrints/Instruction+Selection
that choice might have made sense then, but yours is certainly better for today's processors.
