Original bug ID: 7441
Reporter: markghayden
Status: acknowledged (set by @xavierleroy on 2017-01-14T15:24:19Z)
Resolution: open
Priority: normal
Severity: feature
Platform: AMD
OS: MacOS
OS Version: 10.12.1
Target version: later
Category: middle end (typedtree to clambda)
Duplicate of: #7442
Has duplicate: #7440
Bug description
It appears the Array module is not usable for creating optimal code, even for simple array summation.
```ocaml
let stdlib_sumf v =
  Array.fold_left (+.) 0.0 v
;;
```
This allocates two boxed floats (32 bytes on a 64-bit platform) per iteration.
Experiments were run on the 4.05 trunk with -O3 and -unbox-closures. To sum a float array with Array.fold_left without this allocation, it appears necessary to hand-write a version of Array.fold_left with type constraints that specialize it to floating-point arrays, or to use some similar method.
The same applies to summing an array of integers. With Array.fold_left, no allocation occurs, but the assembly code generated for the loop includes checks on the type of the array and includes code (never executed) for allocating a floating-point value. As with floats, creating a specialized version of Array.fold_left removes the checks on the array type.
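As an illustration of the kind of hand-specialized summation described above, here is a minimal sketch. It is not the attached test file: the name `sumf_specialized` is hypothetical, and the claim that the accumulator stays unboxed is an expectation about ocamlopt's handling of non-escaping local float references, to be confirmed by inspecting the assembly produced with -S.

```ocaml
(* Hypothetical hand-specialized sum; not taken from the attached file. *)
let sumf_specialized (v : float array) : float =
  (* The annotation tells the compiler this is a float array, so the
     element read needs no runtime check on the array kind. *)
  let acc = ref 0.0 in
  for i = 0 to Array.length v - 1 do
    (* [acc] is a local ref that never escapes, so it can usually be
       kept as an unboxed float across iterations. *)
    acc := !acc +. v.(i)
  done;
  !acc

let () =
  let v = [| 1.0; 2.0; 3.5 |] in
  (* Same result as the generic stdlib fold. *)
  assert (sumf_specialized v = Array.fold_left (+.) 0.0 v)
```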
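For the integer case, a specialized loop might look like the sketch below. The name `sumi_specialized` is hypothetical and this is not the code from the attached file; the point is only that the `int array` annotation lets the compiler pick the non-float array access at compile time instead of testing the array kind inside the loop.

```ocaml
(* Hypothetical integer-specialized sum; not taken from the attached file. *)
let sumi_specialized (v : int array) : int =
  let acc = ref 0 in
  for i = 0 to Array.length v - 1 do
    (* With a known [int array], no float-array branch is needed here. *)
    acc := !acc + v.(i)
  done;
  !acc

let () =
  let v = [| 1; 2; 3; 4 |] in
  (* Same result as the generic stdlib fold. *)
  assert (sumi_specialized v = Array.fold_left (+) 0 v)
```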
Steps to reproduce
Use the attached file. The output below lists each test case and the number of bytes allocated while summing an array of 10,000 floats. All cases except inline2 allocate 32 bytes (2 floats) per iteration. For the integer case, review the generated assembly code.
Output from running program.
```
make -w -k -j4
make: Entering directory `/Users/mhayden/proj/ocaml/flambda'
/Users/mhayden/.opam/macos.dev/bin/ocamlopt -O3 -unbox-closures -c -S a.ml
/Users/mhayden/.opam/macos.dev/bin/ocamlopt -O3 -unbox-closures -o a a.cmx
./a
stdlib 320096
inline0 320096
inline1 320096
inline2 112
```
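The byte counts above are consistent with 10,000 iterations at 32 bytes each (320,000 bytes) plus a small constant. For readers without the attachment, numbers of this kind could be collected with a harness along the following lines. This is a sketch under assumptions, not the attached a.ml: the helper name `bytes_allocated` is hypothetical, and the measurement itself allocates a little, so a small constant overhead should be expected even for an allocation-free loop.

```ocaml
(* Hypothetical measurement helper; the attached a.ml may do this differently. *)
let bytes_allocated (f : unit -> 'a) : float =
  let before = Gc.allocated_bytes () in
  (* opaque_identity keeps the optimizer from discarding the computation. *)
  ignore (Sys.opaque_identity (f ()));
  let after = Gc.allocated_bytes () in
  after -. before

let () =
  let v = Array.make 10_000 1.0 in
  let n = bytes_allocated (fun () -> Array.fold_left (+.) 0.0 v) in
  (* Prints the test-case label followed by the bytes allocated. *)
  Printf.printf "stdlib %.0f\n" n
```

The exact figures depend on the compiler version and flags, so this harness is only meant for comparing variants built the same way.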
File attachments