Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006426OCamlOCaml backend (code generation)public2014-05-16 17:432014-06-03 14:10
Reporterfrisch 
Assigned Todoligez 
PrioritylowSeverityfeatureReproducibilityhave not tried
StatusacknowledgedResolutionopen 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0006426: Improve compilation of common idioms on strings
Description- It is quite common to build a string by n-ary concatenation: "foo" ^ bar ^ ":" ^ baz. Currently, this allocates and copies intermediate strings. It should be quite easy to detect this pattern and generate some code which allocates a buffer of the correct size and blit each string to it directly. This could be extended to the case where some of the concatenated strings are obtained by String.sub, Bytes.to_string, etc. This operations could be fused with the blit.

- Less important, but it's also quite common to check prefixes/suffixes with code such as (String.length s <= 4 && String.sub s 0 4 = "foo:"). We could expose a function String.sub_equal to avoid the copy in this case, and maybe have the compiler generate a call to it automatically from the code above.

- I haven't checked, but if the common test s = "" is compiled down to a generic string comparison, it would be worth changing that to String.length s = 0.
Tagspatch
Attached Filesdiff file icon empty_strcmp.diff [^] (1,499 bytes) 2014-06-03 14:06 [Show Content]

- Relationships

-  Notes
(0011496)
gasche (developer)
2014-05-16 17:48
edited on: 2014-05-16 18:25

Not opposed to clever optimizations of strings, but note that to avoid extra allocations you should use `String.concat` instead of `(^)`:
  "foo" ^ bar ^ ":" ^ baz
becomes
  String.concat "" ["foo"; bar; ":"; baz]
and only allocates once.

(0011501)
frisch (developer)
2014-05-16 18:23

Well, String.concat itself will allocate a little bit (a couple of references and closures) and will do useless calls to concatenate the empty separator. This could be counter-productive for small numbers of (small) strings.

We could also expose series of optimized functions:


 val concat3: string -> string -> string
 val concat4: string -> string -> string
 ...

Anyway, it seems it is a place where the compiler could easily do the optimization for the user, so that he doesn't have to think whether it's better to use ^ or a different function.
(0011502)
frisch (developer)
2014-05-16 18:26

Note: as long as strings are not really immutable, one must be careful to preserve the semantics if one optimizes f () ^ g () ^ h (), since the evaluation of each function could modify the result of other functions. In practice, it would be fine to restrict the optimization to the case where concatenated strings are pure.
(0011633)
frisch (developer)
2014-06-03 13:22

A very quick micro-benchmark (Linux 64) to compare three ways of checking string emptiness:

let f1 x = (x = "")
let f2 x = (match x with "" -> true | _ -> false)
let f3 x = String.length x = 0


f1: 2.92s
f2: 1.18s
f3: 0.59s

This shows that even the optimized compilation of string pattern matching might be optimized for the empty string.
(0011634)
frisch (developer)
2014-06-03 14:09

The attached patch empty_strcmp.diff optimizes the comparison against the empty string. It works also when such a comparison arises after inlining (even when the original comparison is the generic comparison). The comparison is reduced to comparing the string length to 0. This still induces two memory accesses due to the way the string length is encoded. It would be possible to perform a single memory access on the first word of the string block.

- Issue History
Date Modified Username Field Change
2014-05-16 17:43 frisch New Issue
2014-05-16 17:48 gasche Note Added: 0011496
2014-05-16 18:23 frisch Note Added: 0011501
2014-05-16 18:25 gasche Note Edited: 0011496 View Revisions
2014-05-16 18:26 frisch Note Added: 0011502
2014-05-16 22:18 frisch Summary Improve compilation of common idions on strings => Improve compilation of common idioms on strings
2014-05-16 22:18 frisch Description Updated View Revisions
2014-06-02 20:27 doligez Assigned To => doligez
2014-06-02 20:27 doligez Status new => acknowledged
2014-06-03 13:22 frisch Note Added: 0011633
2014-06-03 14:06 frisch File Added: empty_strcmp.diff
2014-06-03 14:09 frisch Note Added: 0011634
2014-06-03 14:10 frisch Tag Attached: patch


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker