|Anonymous | Login | Signup for a new account||2019-02-20 01:40 CET|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006645||OCaml||standard library||public||2014-11-04 15:14||2016-12-07 11:49|
|Priority||low||Severity||tweak||Reproducibility||have not tried|
|Target Version||4.02.2+dev / +rc1||Fixed in Version||4.03.0+dev / +beta1|
|Summary||0006645: Guarantee that Set.add doesn't allocate and returns the original set if the added element is already in the set|
|Description||I suggest to replace the implementation of Set.add with:|
let rec add x = function
Empty -> Node(Empty, x, Empty, 1)
| Node(l, v, r, _) as t ->
let c = Ord.compare x v in
if c = 0 then t else
if c < 0 then
let ll = add x l in
if l == ll then t else bal ll v r
let rr = add x r in
if r == rr then t else bal l v rr
This improves performance by not allocating when the added element is already in the set, but equally importantly, it guarantees that the result is physically equal to the original set in that case. This makes it possible to implement the common pattern "add if not already present and do something different in the two cases" with a single traversal.
|Just note that strictly speaking it doesn't guarantee anything because the specification of == is very weak.|
|So, do you think we shouldn't write down on the function documentation that its result is physically equal to its argument when the element is already in the set? This property could be very useful to avoid double lookup.|
> Just note that strictly speaking it doesn't guarantee anything because the specification of == is very weak.
It is indeed the case today that storing a value in a data structure and loading it back doesn't necessarily return something physically equal to the original value. This happens when a float is stored in an array or in a unboxed-float record.
But here, we only rely on the fact that:
| (...) as t -> .... t ....
returns something physically equal to its argument (plus usual data flow rules: if the "then" branch is chosen, the result of "if-then-else" is physically equal to the one of the "then" branch; or: the value seen be a function's caller is physically equal to the value computed by the function's body).
Do you envision cases where the suggested implementation of Set.add would indeed not satisfy the stated invariant? (e.g. with js_of_ocaml for instance)
Apart from this question of whether to mention the would-be invariant in the documentation, do you agree with the proposal?
|I agree with the proposal and I agree with fully documenting the invariant. I was just being pedant.|
|Thanks! I've applied this to trunk (rev 15966), documenting the fact that the result is physically equal to the argument (but not the fact that the function doesn't allocate).|
|Preservation of sharing is important for some applications (e.g. static analysis -- one of the reasons that Astree scales is that it is very careful with sharing). However, this is a slippery slope: you did it for Set.add, but what about other Set operations? (union, etc) What about Map operations? Preserving sharing in Map.merge is especially sensitive for static analysis applications.|
Actually, this is only a first step. LexiFi's version of set.ml indeed has similar optimizations for other operations. For instance, our version of Set.union tries to benefit from physical equality and preserve it:
=============== let rec union s1 s2 = if s1 == s2 then s1 (* LEXIFI *) else match (s1, s2) with (Empty, t2) -> t2 | (t1, Empty) -> t1 | (Node(l1, v1, r1, h1), Node(l2, v2, r2, h2)) -> if h1 >= h2 then if h2 = 1 then add v2 s1 else begin (* BEGIN LEXIFI *) let (l2, _, r2) = split v1 s2 in let ll = union l1 l2 in let rr = union r1 r2 in if ll == l1 && rr == r1 then s1 else join ll v1 rr (* END LEXIFI *) end else if h1 = 1 then add v1 s2 else begin (* BEGIN LEXIFI *) let (l1, _, r1) = split v2 s1 in let ll = union l1 l2 in let rr = union r1 r2 in if ll == l2 && rr == r2 then s2 else join ll v2 rr (* END LEXIFI *) end ===============
The reason I've started with Set.add is that it's easier to specify what the optimization does and there is a direct use case (avoid Set.mem for the typical memoization pattern). With Set.union as above, the performance depends on how the two sets have been built (whether they share internal nodes), and it's difficult to reason on that, so the optimization is more like a "best effort" than a guaranteed invariant.
What's your opinion on adding such optimizations e.g. for Set.union?
For Map.merge, we also have a specialized version:
val union: (key -> 'a -> 'a -> 'a) -> 'a t -> 'a t -> 'a t
which is more efficient than Map.merge (although it currently doesn't do anything special w.r.t. physical equality yet):
=============== let rec union f s1 s2 = match (s1, s2) with | (Empty, s) | (s, Empty) -> s | (Node (l1, v1, d1, r1, h1), Node (l2, v2, d2, r2, h2)) -> if h1 >= h2 then let (l2, d2, r2) = split v1 s2 in let l = union f l1 l2 and r = union f r1 r2 in let d = match d2 with | None -> d1 | Some d2 -> f v1 d1 d2 in join l v1 d r else let (l1, d1, r1) = split v2 s1 in let l = union f l1 l2 and r = union f r1 r2 in let d = match d1 with | None -> d2 | Some d1 -> f v2 d1 d2 in join l v2 d r ===============
|2014-11-04 15:14||frisch||New Issue|
|2014-11-04 15:15||frisch||Summary||Guarantee that Set.add doesn't allocate and returns the original set if the added elements is in it => Guarantee that Set.add doesn't allocate and returns the original set if the added element is already in the set|
|2014-12-22 18:08||doligez||Status||new => acknowledged|
|2015-01-13 21:22||doligez||Note Added: 0013083|
|2015-01-13 21:22||doligez||Target Version||=> 4.02.2+dev / +rc1|
|2015-01-13 21:22||doligez||Tag Attached: patch|
|2015-01-13 21:30||frisch||Note Added: 0013085|
|2015-03-12 14:57||frisch||Note Added: 0013463|
|2015-03-25 20:03||doligez||Note Added: 0013552|
|2015-03-26 10:30||frisch||Note Added: 0013558|
|2015-03-26 10:30||frisch||Status||acknowledged => resolved|
|2015-03-26 10:30||frisch||Fixed in Version||=> 4.03.0+dev / +beta1|
|2015-03-26 10:30||frisch||Resolution||open => fixed|
|2015-03-26 10:30||frisch||Assigned To||=> frisch|
|2015-03-26 11:55||xleroy||Note Added: 0013560|
|2015-03-26 12:06||frisch||Note Added: 0013561|
|2016-12-07 11:49||xleroy||Status||resolved => closed|
|2017-02-23 16:43||doligez||Category||OCaml standard library => standard library|
|Copyright © 2000 - 2011 MantisBT Group|