Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006450OCamlstandard librarypublic2014-06-03 17:502018-08-21 19:02
Assigned Tonojebar 
PrioritynormalSeverityfeatureReproducibilityhave not tried
PlatformOSOS Version
Product Version 
Target Version4.08.0+devFixed in Version 
Summary0006450: Set.disjoint
DescriptionChecking disjointness of sets is a common operation and it can be implemented more efficiently from inside the Set module:

    let rec disjoint s1 s2 =
      match (s1, s2) with
        (Empty, _) | (_, Empty) -> true
      | (Node(l1, v1, r1, _), t2) ->
          if s1 == s2 then false
          else match split v1 t2 with
            (l2, false, r2) -> disjoint l1 l2 && disjoint r1 r2
          | (l2, true, r2) -> false

(it could even probably be more clever, using respective depth of the subtree.)

Any opposition to including this function?
TagsNo tags attached.
Attached Files

- Relationships

-  Notes
shinwell (developer)
2014-06-04 09:54

If the answer to PR 6279 is that these functions should go in external libraries, I don't see how the inclusion of this (far less useful) function could be justified.
frisch (developer)
2014-06-04 14:07

0006279 is about a function which would not really benefit from being included in the functor. Well, actually, I just realized that the split function is exposed in Set, so the disjoint function could indeed be implemented outside the functor as well. But keeping it in Set would make it possible to apply further optimizations to it.
chambart (developer)
2014-06-04 23:36

Note that this one cannot be efficiently implemented outside. The split function is effectively exported, but there is no function to find the root of the tree (here v1 in the pattern matching). And we wouldn't want such a root function, since it wouldn't have a clear semantics.
frisch (developer)
2014-06-05 06:28

Ah yes, indeed.
shinwell (developer)
2014-06-06 10:51

I get the feeling that, increasingly, the criterion for inclusion in the standard library should be whether or not the feature is needed now in the compiler distribution itself. If the feature is not needed there, then it goes outside, as per Damien's comment.

If that is not the criterion, then the next most reasonable criterion would seem to be whether the feature assists in the provision of a comprehensive standard library. In that scenario, I cannot see how the inclusion of [Set.disjoint] but not [] could be justified.

I cannot see why a good criterion for inclusion in the standard library would be one based on efficiency of some core set of functions, leaving users to write the remaining functions on top. That would seem to lead to a fragmented interface, and bugs.
frisch (developer)
2014-06-06 20:23

There is no consensus in the community about a good general purpose alternative to the standard library, and I believe the "official" stdlib will remain widely used, especially for libraries. If an extension fits well in the stdlib, I don't see why we should prevent ourselves from including it. Of course, we need to apply some reason and not include gazillions of tiny one-liner helper functions which only complicate the interface. I've no strong opinion about This does not seem to be a very common operation on sets and it does not benefit particularly from the choice of the underlying concrete data structure; and it's a one-liner. I would have a more positive opinion if the function did something non-trivial, such as guaranteeing a linear complexity for the case where the mapped function is monotonic on the elements in the set (this does not seem very difficult).

Checking disjointness of sets, or combining maps, on the other hand, seem more common and immediately benefit from being included in the stdlib.

> I cannot see why a good criterion for inclusion in the standard library would > be one based on efficiency of some core set of functions, leaving users to
> write the remaining functions on top. That would seem to lead to a fragmented
> interface, and bugs.

So what do you propose? Rejecting simple and useful addition, esp. when they cannot be implemented externally without switching to an entirely different library, means that we encourage users to stop using the official stdlib. This is not a good approach while there is no good alternative to it.
ybarnoy (reporter)
2014-06-11 15:34

There are advantages to having no dependencies. My old research project's approach was not to use any dependencies that were not in-tree. We therefore used the stdlib with our own enhancements. This makes the code very portable, especially for windows systems where opam doesn't work.

Therefore, I see nothing wrong with enhancing the stdlib with useful functionality, even if it's never going to be at the level of Batteries or Core. Heck, had stdlib had low precedence function application (@@) earlier, we wouldn't have littered our code with our own symbol. I'm still amazed that there's no standard function composition operator in stdlib.

Stdlib is the gold standard by which the other 2 libraries live, even if Core likes to flaunt its divergence from stdlib, and it's worth maintaining, even if it's not cutting edge IMO.
frisch (developer)
2016-12-06 19:35
edited on: 2016-12-06 19:36

For the records, we have been using the following version for some time:

    (* Same as split, but compute the left and right subtrees
       only if the pivot element is not in the set.  The right subtree
       is computed on demand. *)

    type split_bis =
      | Found
      | NotFound of t * (unit -> t)

    let rec split_bis x = function
        Empty ->
          NotFound (Empty, (fun () -> Empty))
      | Node(l, v, r, _) ->
          let c = x v in
          if c = 0 then Found
          else if c < 0 then
            match split_bis x l with
            | Found -> Found
            | NotFound (ll, rl) -> NotFound (ll, (fun () -> join (rl ()) v r))
            match split_bis x r with
            | Found -> Found
            | NotFound (lr, rr) -> NotFound (join l v lr, rr)

    let rec disjoint s1 s2 =
      match (s1, s2) with
        (Empty, _) | (_, Empty) -> true
      | (Node(l1, v1, r1, _), t2) ->
          if s1 == s2 then false
          else match split_bis v1 t2 with
            NotFound(l2, r2) -> disjoint l1 l2 && disjoint r1 (r2 ())
          | Found -> false

nojebar (developer)
2018-08-12 11:10 [^]
nojebar (developer)
2018-08-21 19:01

PR merged.

- Issue History
Date Modified Username Field Change
2014-06-03 17:50 frisch New Issue
2014-06-04 09:54 shinwell Note Added: 0011648
2014-06-04 09:54 shinwell Status new => acknowledged
2014-06-04 14:07 frisch Note Added: 0011651
2014-06-04 23:36 chambart Note Added: 0011665
2014-06-05 06:28 frisch Note Added: 0011668
2014-06-06 10:51 shinwell Note Added: 0011692
2014-06-06 20:23 frisch Note Added: 0011699
2014-06-11 15:34 ybarnoy Note Added: 0011735
2016-12-06 19:35 frisch Note Added: 0016653
2016-12-06 19:36 frisch Note Edited: 0016653 View Revisions
2017-02-23 16:43 doligez Category OCaml standard library => standard library
2018-08-12 11:10 nojebar Note Added: 0019303
2018-08-21 19:01 nojebar Note Added: 0019306
2018-08-21 19:01 nojebar Status acknowledged => resolved
2018-08-21 19:01 nojebar Resolution open => fixed
2018-08-21 19:01 nojebar Assigned To => nojebar
2018-08-21 19:02 nojebar Target Version => 4.08.0+dev

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker