|Anonymous | Login | Signup for a new account||2018-06-23 18:15 CEST|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006426||OCaml||back end (clambda to assembly)||public||2014-05-16 17:43||2014-06-03 14:10|
|Priority||low||Severity||feature||Reproducibility||have not tried|
|Target Version||Fixed in Version|
|Summary||0006426: Improve compilation of common idioms on strings|
|Description||- It is quite common to build a string by n-ary concatenation: "foo" ^ bar ^ ":" ^ baz. Currently, this allocates and copies intermediate strings. It should be quite easy to detect this pattern and generate some code which allocates a buffer of the correct size and blit each string to it directly. This could be extended to the case where some of the concatenated strings are obtained by String.sub, Bytes.to_string, etc. This operations could be fused with the blit.|
- Less important, but it's also quite common to check prefixes/suffixes with code such as (String.length s <= 4 && String.sub s 0 4 = "foo:"). We could expose a function String.sub_equal to avoid the copy in this case, and maybe have the compiler generate a call to it automatically from the code above.
- I haven't checked, but if the common test s = "" is compiled down to a generic string comparison, it would be worth changing that to String.length s = 0.
|Attached Files||empty_strcmp.diff [^] (1,499 bytes) 2014-06-03 14:06 [Show Content]|
edited on: 2014-05-16 18:25
Not opposed to clever optimizations of strings, but note that to avoid extra allocations you should use `String.concat` instead of `(^)`:
"foo" ^ bar ^ ":" ^ baz
String.concat "" ["foo"; bar; ":"; baz]
and only allocates once.
Well, String.concat itself will allocate a little bit (a couple of references and closures) and will do useless calls to concatenate the empty separator. This could be counter-productive for small numbers of (small) strings.
We could also expose series of optimized functions:
val concat3: string -> string -> string
val concat4: string -> string -> string
Anyway, it seems it is a place where the compiler could easily do the optimization for the user, so that he doesn't have to think whether it's better to use ^ or a different function.
|Note: as long as strings are not really immutable, one must be careful to preserve the semantics if one optimizes f () ^ g () ^ h (), since the evaluation of each function could modify the result of other functions. In practice, it would be fine to restrict the optimization to the case where concatenated strings are pure.|
A very quick micro-benchmark (Linux 64) to compare three ways of checking string emptiness:
let f1 x = (x = "") let f2 x = (match x with "" -> true | _ -> false) let f3 x = String.length x = 0
This shows that even the optimized compilation of string pattern matching might be optimized for the empty string.
|The attached patch empty_strcmp.diff optimizes the comparison against the empty string. It works also when such a comparison arises after inlining (even when the original comparison is the generic comparison). The comparison is reduced to comparing the string length to 0. This still induces two memory accesses due to the way the string length is encoded. It would be possible to perform a single memory access on the first word of the string block.|
|2014-05-16 17:43||frisch||New Issue|
|2014-05-16 17:48||gasche||Note Added: 0011496|
|2014-05-16 18:23||frisch||Note Added: 0011501|
|2014-05-16 18:25||gasche||Note Edited: 0011496||View Revisions|
|2014-05-16 18:26||frisch||Note Added: 0011502|
|2014-05-16 22:18||frisch||Summary||Improve compilation of common idions on strings => Improve compilation of common idioms on strings|
|2014-05-16 22:18||frisch||Description Updated||View Revisions|
|2014-06-02 20:27||doligez||Assigned To||=> doligez|
|2014-06-02 20:27||doligez||Status||new => acknowledged|
|2014-06-03 13:22||frisch||Note Added: 0011633|
|2014-06-03 14:06||frisch||File Added: empty_strcmp.diff|
|2014-06-03 14:09||frisch||Note Added: 0011634|
|2014-06-03 14:10||frisch||Tag Attached: patch|
|2017-02-23 16:35||doligez||Category||OCaml backend (code generation) => Back end (clambda to assembly)|
|2017-02-23 16:44||doligez||Category||Back end (clambda to assembly) => back end (clambda to assembly)|
|Copyright © 2000 - 2011 MantisBT Group|