Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[patch] detect and optimise endiannes conversion with bswap #5774

Closed
vicuna opened this issue Oct 5, 2012 · 10 comments
Closed

[patch] detect and optimise endiannes conversion with bswap #5774

vicuna opened this issue Oct 5, 2012 · 10 comments

Comments

@vicuna
Copy link

vicuna commented Oct 5, 2012

Original bug ID: 5774
Reporter: @chambart
Assigned to: @chambart
Status: closed (set by @xavierleroy on 2015-07-24T08:38:56Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 4.00.0
Fixed in version: 4.01.0
Category: back end (clambda to assembly)
Tags: patch
Monitored by: @chambart

Bug description

This patch adds a kind of simple symbolic bitwise interpretation to detect some common patterns.
I have implemented a simple detection of bswap kind of instructions and optimised it on amd64 backend.

It will detect things like that:

let swap16 i =
( (i land 0xFF) lsl 8 )
lor ((i land 0xFF00) lsr 8)

let swap16_2 i =
((i lsl 8) land 0xFF00)
lor ((i land 0xFF00) lsr 8)

let swap32 i =
Int32.logor
(Int32.logor
(Int32.shift_left (Int32.logand i 0xFFl) (83))
(Int32.shift_left (Int32.logand i 0xFF00l) (8
1)))
(Int32.logor
(Int32.shift_right_logical (Int32.logand i 0xFF0000l) (81))
(Int32.shift_right_logical (Int32.logand i 0xFF000000l) (8
3)))

let swap64 i =
Int64.logor
(Int64.logor
(Int64.logor
(Int64.shift_left (Int64.logand i 0xFFL) (87))
(Int64.shift_left (Int64.logand i 0xFF00L) (8
5)))
(Int64.logor
(Int64.shift_left (Int64.logand i 0xFF0000L) (83))
(Int64.shift_left (Int64.logand i 0xFF000000L) (8
1))))
(Int64.logor
(Int64.logor
(Int64.shift_right_logical (Int64.logand i 0xFF_00000000L) (81))
(Int64.shift_right_logical (Int64.logand i 0xFF00_00000000L) (8
3)))
(Int64.logor
(Int64.shift_right_logical (Int64.logand i 0xFF0000_00000000L) (85))
(Int64.shift_right_logical (Int64.logand i 0xFF000000_00000000L) (8
7))))

Additional information

Using it I think that there could be other optimisable instructions like sign extension, some unnecessary tagging/untaging...

The implementation is quadratic in the depth of a boolean expression, but it is possible to make it linear. I didn't notice any difference in compilation time.

The detection method here is made by a brutal pattern matching. It take quite some time to compile the pattern matching. A real detection whould use something smarter to detect more cases.

File attachments

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @chambart

Added a simplified version with only a %bswap primitive and no pattern detection.

@vicuna
Copy link
Author

vicuna commented Dec 17, 2012

Comment author: @alainfrisch

I believe commit 13106 breaks 32-bit ports (primitive _caml_int64_direct_bswap is not defined).

@vicuna
Copy link
Author

vicuna commented Dec 17, 2012

Comment author: @alainfrisch

I can fix the build by adding those lines after the definition of caml_int64_direct_bwap:

#else
CAMLprim value caml_int64_direct_bswap(value v)
{ caml_failwith("calling int64_direct_bswap on a 32-bit architecture"); }
#endif

This is probably fine, since this primitive is only used by the amd64 port, but it would be nice if the patch author could confirm it.

@vicuna
Copy link
Author

vicuna commented Dec 17, 2012

Comment author: @alainfrisch

Actually, the same without CAMLprim, otherwise caml_int64_direct_bswap appears twice in the generated primitives files, and thus in Runtimedef.builtin_primitives (and so it it recorded twice in Symtable, leading to an empty entry in the PRIM table of the generated bytecode programs, which confuses the bytecode interpreter... just spend three hours on that).

@vicuna
Copy link
Author

vicuna commented Dec 17, 2012

Comment author: @chambart

In fact it is not used at all in the bytecode compiler, so it should not be marked as a CAMLprim.

Sorry for the trouble

@vicuna
Copy link
Author

vicuna commented Dec 17, 2012

Comment author: @chambart

In fact it is not used at all in the bytecode compiler, so it should not be marked as a CAMLprim

@vicuna
Copy link
Author

vicuna commented Dec 17, 2012

Comment author: @chambart

The other functions that should not be annotated CAMLprim:
caml_bswap16_direct
caml_int32_direct_bswap
caml_int64_direct_bswap
caml_nativeint_direct_bswap

@vicuna
Copy link
Author

vicuna commented Dec 17, 2012

Comment author: @alainfrisch

Thanks! Fixed in commit 13134, 13135, 13136. Btw, does anyone know how to cleanly remove a CAMLprim? The problem is that the committed bootstrap ocamlc requires this primitive, so we cannot simply remove the primitive from ocamlrun. I've simply disabled the test for 'unknown C primitive' temporarily in dynlink.c, but I wonder whether there is a cleaner method.

@vicuna
Copy link
Author

vicuna commented Dec 19, 2012

Comment author: @alainfrisch

Ok, so now we have some optimized built-in ways to do endianess conversion. Since this is part of the compiler, shouldn't the function be exposed as (external) functions, e.g. in Pervasives? Otherwise, do we assume that those external declarations would be used in third-party projects? (Not that I really care, but since I've lost some time because of this issue, I'm wondering what's the current status of the proposal.)

@vicuna
Copy link
Author

vicuna commented Dec 19, 2012

Comment author: @lefessan

They are exposed in ocplib-endian (https://github.com/OCamlPro/ocplib-endian). Having these primitives is not enough, you have to write your code in such a way that it does not allocate for most functions. This is what ocplib-endian provides.

OPAM makes it easy now to externalize functionalities that would otherwise have had to be in the stdlib/otherlibs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants