Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006645OCamlstandard librarypublic2014-11-04 15:142016-12-07 11:49
Reporterfrisch 
Assigned Tofrisch 
PrioritylowSeveritytweakReproducibilityhave not tried
StatusclosedResolutionfixed 
PlatformOSOS Version
Product Version 
Target Version4.02.2+dev / +rc1Fixed in Version4.03.0+dev / +beta1 
Summary0006645: Guarantee that Set.add doesn't allocate and returns the original set if the added element is already in the set
DescriptionI suggest to replace the implementation of Set.add with:

    let rec add x = function
        Empty -> Node(Empty, x, Empty, 1)
      | Node(l, v, r, _) as t ->
          let c = Ord.compare x v in
          if c = 0 then t else
          if c < 0 then
            let ll = add x l in
            if l == ll then t else bal ll v r
          else
            let rr = add x r in
            if r == rr then t else bal l v rr


This improves performance by not allocating when the added element is already in the set, but equally importantly, it guarantees that the result is physically equal to the original set in that case. This makes it possible to implement the common pattern "add if not already present and do something different in the two cases" with a single traversal.
Tagspatch
Attached Files

- Relationships

-  Notes
(0013083)
doligez (administrator)
2015-01-13 21:22

Just note that strictly speaking it doesn't guarantee anything because the specification of == is very weak.
(0013085)
frisch (developer)
2015-01-13 21:30

So, do you think we shouldn't write down on the function documentation that its result is physically equal to its argument when the element is already in the set? This property could be very useful to avoid double lookup.
(0013463)
frisch (developer)
2015-03-12 14:57

> Just note that strictly speaking it doesn't guarantee anything because the specification of == is very weak.

It is indeed the case today that storing a value in a data structure and loading it back doesn't necessarily return something physically equal to the original value. This happens when a float is stored in an array or in a unboxed-float record.

But here, we only rely on the fact that:

  function
    | (...) as t -> .... t ....

returns something physically equal to its argument (plus usual data flow rules: if the "then" branch is chosen, the result of "if-then-else" is physically equal to the one of the "then" branch; or: the value seen be a function's caller is physically equal to the value computed by the function's body).

Do you envision cases where the suggested implementation of Set.add would indeed not satisfy the stated invariant? (e.g. with js_of_ocaml for instance)

Apart from this question of whether to mention the would-be invariant in the documentation, do you agree with the proposal?
(0013552)
doligez (administrator)
2015-03-25 20:03

I agree with the proposal and I agree with fully documenting the invariant. I was just being pedant.
(0013558)
frisch (developer)
2015-03-26 10:30

Thanks! I've applied this to trunk (rev 15966), documenting the fact that the result is physically equal to the argument (but not the fact that the function doesn't allocate).
(0013560)
xleroy (administrator)
2015-03-26 11:55

Preservation of sharing is important for some applications (e.g. static analysis -- one of the reasons that Astree scales is that it is very careful with sharing). However, this is a slippery slope: you did it for Set.add, but what about other Set operations? (union, etc) What about Map operations? Preserving sharing in Map.merge is especially sensitive for static analysis applications.
(0013561)
frisch (developer)
2015-03-26 12:06

Actually, this is only a first step. LexiFi's version of set.ml indeed has similar optimizations for other operations. For instance, our version of Set.union tries to benefit from physical equality and preserve it:

===============
let rec union s1 s2 =
      if s1 == s2 then s1 (* LEXIFI *)
      else match (s1, s2) with
        (Empty, t2) -> t2
      | (t1, Empty) -> t1
      | (Node(l1, v1, r1, h1), Node(l2, v2, r2, h2)) ->
          if h1 >= h2 then
            if h2 = 1 then add v2 s1 else begin
              (* BEGIN LEXIFI *)
              let (l2, _, r2) = split v1 s2 in
              let ll = union l1 l2 in
              let rr = union r1 r2 in
              if ll == l1 && rr == r1 then s1
              else join ll v1 rr
              (* END LEXIFI *)
            end
          else
            if h1 = 1 then add v1 s2 else begin
              (* BEGIN LEXIFI *)
              let (l1, _, r1) = split v2 s1 in
              let ll = union l1 l2 in
              let rr = union r1 r2 in
              if ll == l2 && rr == r2 then s2
              else join ll v2 rr
              (* END LEXIFI *)
            end
===============


The reason I've started with Set.add is that it's easier to specify what the optimization does and there is a direct use case (avoid Set.mem for the typical memoization pattern). With Set.union as above, the performance depends on how the two sets have been built (whether they share internal nodes), and it's difficult to reason on that, so the optimization is more like a "best effort" than a guaranteed invariant.

What's your opinion on adding such optimizations e.g. for Set.union?


For Map.merge, we also have a specialized version:

    val union: (key -> 'a -> 'a -> 'a) -> 'a t -> 'a t -> 'a t

which is more efficient than Map.merge (although it currently doesn't do anything special w.r.t. physical equality yet):

===============
    let rec union f s1 s2 =
      match (s1, s2) with
      | (Empty, s) | (s, Empty) -> s
      | (Node (l1, v1, d1, r1, h1), Node (l2, v2, d2, r2, h2)) ->
          if h1 >= h2 then
            let (l2, d2, r2) = split v1 s2 in
            let l = union f l1 l2 and r = union f r1 r2 in
            let d =
              match d2 with
              | None -> d1
              | Some d2 -> f v1 d1 d2
            in
            join l v1 d r
          else
            let (l1, d1, r1) = split v2 s1 in
            let l = union f l1 l2 and r = union f r1 r2 in
            let d =
              match d1 with
              | None -> d2
              | Some d1 -> f v2 d1 d2
            in
            join l v2 d r
===============

- Issue History
Date Modified Username Field Change
2014-11-04 15:14 frisch New Issue
2014-11-04 15:15 frisch Summary Guarantee that Set.add doesn't allocate and returns the original set if the added elements is in it => Guarantee that Set.add doesn't allocate and returns the original set if the added element is already in the set
2014-12-22 18:08 doligez Status new => acknowledged
2015-01-13 21:22 doligez Note Added: 0013083
2015-01-13 21:22 doligez Target Version => 4.02.2+dev / +rc1
2015-01-13 21:22 doligez Tag Attached: patch
2015-01-13 21:30 frisch Note Added: 0013085
2015-03-12 14:57 frisch Note Added: 0013463
2015-03-25 20:03 doligez Note Added: 0013552
2015-03-26 10:30 frisch Note Added: 0013558
2015-03-26 10:30 frisch Status acknowledged => resolved
2015-03-26 10:30 frisch Fixed in Version => 4.03.0+dev / +beta1
2015-03-26 10:30 frisch Resolution open => fixed
2015-03-26 10:30 frisch Assigned To => frisch
2015-03-26 11:55 xleroy Note Added: 0013560
2015-03-26 12:06 frisch Note Added: 0013561
2016-12-07 11:49 xleroy Status resolved => closed
2017-02-23 16:43 doligez Category OCaml standard library => standard library


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker