Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add efficient functions to create sets from lists or arrays #4986

Closed
vicuna opened this issue Feb 26, 2010 · 5 comments
Closed

Add efficient functions to create sets from lists or arrays #4986

vicuna opened this issue Feb 26, 2010 · 5 comments

Comments

@vicuna
Copy link

vicuna commented Feb 26, 2010

Original bug ID: 4986
Reporter: @alainfrisch
Status: resolved (set by @alainfrisch on 2013-07-09T11:02:47Z)
Resolution: open
Priority: normal
Severity: feature
Fixed in version: 4.02.0+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: mehdi @hcarty

Bug description

Creating a set from a list or an array is quite common. It could be useful
to add such functions in the Set module.

A trivial implementation is:

let of_array a =
Array.fold_right add a empty

A more efficient implementation is to sort the array first, and then build the balanced tree directly:

let of_sorted_array a =
  let rec sub i0 = function
    | 0 -> Empty
    | 1 -> Node (Empty, a.(i0), Empty, 1)
    | 2 -> Node (Node(Empty, a.(i0), Empty, 1), a.(i0 + 1), Empty, 2)
    | 3 -> Node (Node(Empty, a.(i0), Empty, 1), a.(i0 + 1), Node(Empty, a.(i0 + 2), Empty, 1), 2)
    | n ->
        let nl = n / 2 in
        let l = sub i0 nl in
        let r = sub (i0 + nl + 1) (n - nl - 1) in
        create l a.(i0 + nl) r
  in
  sub 0 (Array.length a)

(The cases for n=1,2,3 are just extra optimizations.)

When creating sets from arrays of random integers of size 100000,
the optimized version is almost twice as fast as the naive one (ocamlopt, x86).

File attachments

@vicuna
Copy link
Author

vicuna commented Jul 11, 2012

Comment author: @damiendoligez

Note: the code looks wrong, nl should probably be something like (i0 + n) / 2.

@vicuna
Copy link
Author

vicuna commented Jul 11, 2012

Comment author: @alainfrisch

Why is it wrong? nl represents the length of the subarray, i0 the index of its first element.

@vicuna
Copy link
Author

vicuna commented Sep 17, 2012

Comment author: @damiendoligez

I don't know what I was thinking when I wrote that. The code looks fine.

@vicuna
Copy link
Author

vicuna commented Dec 4, 2012

Comment author: @alainfrisch

A patch is attached, with also a new List.sort_dedup functions (like List.sort, but removes duplicates). Performance compared to an implementation based on folding Set.add is usually improved in a significant way, especially for long
lists. The only case of degraded performances is for long lists with a lot of duplicates.

Examples:

  • list of length 10000 made of random strings of length 5 (x1000): 6.61s -> 4.62s
  • list of length 10 made of random strings of length 5 (x1000000): 0.96s -> 0.76s
  • list of length 1000 made of random integers between 0 and 9 (x10000): 0.45s -> 0.49s
  • list of length 1000 made of random integers between 0 and 9999 (x10000): 1.89s -> 1.28s

@vicuna
Copy link
Author

vicuna commented Jul 9, 2013

Comment author: @alainfrisch

Commit 13876 on trunk adds List.sort_uniq and Set.of_list.

@vicuna vicuna closed this as completed Jul 9, 2013
@vicuna vicuna mentioned this issue Mar 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant