Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set.Make.split not very useful #6111

Closed
vicuna opened this issue Jul 31, 2013 · 10 comments
Closed

Set.Make.split not very useful #6111

vicuna opened this issue Jul 31, 2013 · 10 comments

Comments

@vicuna
Copy link

vicuna commented Jul 31, 2013

Original bug ID: 6111
Reporter: berenger
Status: closed (set by @xavierleroy on 2013-08-28T13:55:32Z)
Resolution: not a bug
Priority: normal
Severity: feature
Version: 4.00.1
Category: standard library
Monitored by: @gasche @hcarty

Bug description

Hello,

Set.Make.split : elt -> t -> t * bool * t

As a scientific programmer, I would most times
want to split like this:
split_lt = (elements < x, elements >= x)
or
split_lte = (elements <= x, elements > x)

Would it be possible to introduce split_lt and split_lte?

Or maybe: split_lt_gte and split_lte_gt?

Thanks,
Francois.

PS: remember that x may be different from the
elements that compare = to x but that is
actually in the set being split

@vicuna
Copy link
Author

vicuna commented Aug 1, 2013

Comment author: berenger

Here is my naive implementation in file set.ml:

let rec split_lt_gte x = function
  | Empty -> (Empty, Empty)
  | Node(l, v, r, _) ->
    let c = Ord.compare x v in
    if c = 0 then (l, add v r)
    else if c < 0 then
      let (ll, rl) = split_lt_gte x l in
      (ll, join rl v r)
    else
      let (lr, rr) = split_lt_gte x r in
      (join l v lr, rr)

let rec split_lte_gt x = function
  | Empty -> (Empty, Empty)
  | Node(l, v, r, _) ->
    let c = Ord.compare x v in
    if c = 0 then (add v l, r)
    else if c < 0 then
      let (ll, rl) = split_lte_gt x l in
      (ll, join rl v r)
    else
      let (lr, rr) = split_lte_gt x r in
      (join l v lr, rr)

I say naive because I see in set.ml functions such that add_min_element
and add_max_element that should maybe be used.
However, as I am not so acquainted with set.ml, I leave this
to the INRIA hackers.

Regards,
F.

@vicuna
Copy link
Author

vicuna commented Aug 1, 2013

Comment author: berenger

Maybe the smaller change is to create:

Set.Make.split_opt : elt -> t -> t * elt option * t

Then users could easily create the other functions
on top of this.

@vicuna
Copy link
Author

vicuna commented Aug 1, 2013

Comment author: @xavierleroy

Hello Francois,

I can see where those new functions would be useful, having recently done something similar (but more general) in Coq, as part of CompCert.

I am not sure this is general enough, though. Consider sets of pairs, lexicographically ordered, and assume you want to extract (efficiently) the set of pairs whose first component is equal to "x".

If it's a set of int * int pairs, your API works fine:

let (lt_x, ge_x) = split_lt (x, min_int) s in
let (eq_x, gt_x) = split_le (x, max_int) ge_x in
eq_x

But if it's a set of string * string pairs, it doesn't work, as strings have no least element, no greater element, and no successor operation.

The approach I used in CompCert is described here:
http://compcert.inria.fr/doc/html/FSetAVLplus.html

It selects set elements according to two boolean predicates that must be compatible with the set order, in a way made precise in the Coq development. But I don't think this can be exposed as is in OCaml's Set API, because it can break the BST invariant.

@vicuna
Copy link
Author

vicuna commented Aug 2, 2013

Comment author: berenger

I don't know Coq unfortunately.
Your problem reminds me of interval trees from computational geometry,
but in a more general/abstract setting.

https://github.com/UnixJunkie/interval-tree/blob/master/interval_tree.ml

Would there be any problem with the following function for set.ml:

let rec split_opt x = function
  | Empty -> (Empty, None, Empty)
  | Node(l, v, r, _) ->
      let c = Ord.compare x v in
      if c = 0 then (l, Some v, r)
      else if c < 0 then
        let (ll, maybe, rl) = split_opt x l in
        (ll, maybe, join rl v r)
      else
        let (lr, maybe, rr) = split_opt x r in
        (join l v lr, maybe, rr)

@vicuna
Copy link
Author

vicuna commented Aug 2, 2013

Comment author: berenger

If we have a split_opt, then we can have:

let split_lt_ge x s =
  let l, maybe, r = split_opt x s in
  match maybe with
    | Some eq_x -> l, add eq_x r
    | None -> l, r

let split_le_gt x s =
  let l, maybe, r = split_opt x s in
  match maybe with
    | Some eq_x -> add eq_x l, r
    | None -> l, r

I personally need only the splitting functions.
But I guess some users may want to directly use split_opt
in some cases.

@vicuna
Copy link
Author

vicuna commented Aug 2, 2013

Comment author: @xavierleroy

The forthcoming 4.01 release has a new function "find" in module Set, which can be used to implement the splitting functions you need:

let split_opt x s =
let (l, maybe, r) = S.split x s in
(l, if maybe then Some(S.find x s) else None, r)

let split_lt_ge x s =
let (l, maybe, r) = S.split x s in
(l, if maybe then S.add (S.find x) r else r)

The cost of calling S.find (log n) is negligible compared with that of S.split (n).

The even more general variants of "split" that I mentioned cannot be encoded efficiently this way, though.

@vicuna
Copy link
Author

vicuna commented Aug 4, 2013

Comment author: berenger

The cost of S.split is really O(n)?
I thought only S.partition had that bad a cost.

http://caml.inria.fr/pub/docs/manual-ocaml/libref/Set.S.html

Sorry if I am wrong,
F.

@vicuna
Copy link
Author

vicuna commented Aug 26, 2013

Comment author: berenger

The functions split_opt, split_lt and split_le
now have corresponding pending pull request in batteries included.

@vicuna
Copy link
Author

vicuna commented Aug 27, 2013

Comment author: berenger

I think this issue can be closed: the functions
are in batteries now.

@vicuna
Copy link
Author

vicuna commented Aug 28, 2013

Comment author: @xavierleroy

Closing this PR as suggested. And, yes, S.split is logarithmic time, not linear; my mistake.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant