Anonymous | Login | Signup for a new account 2015-10-14 06:06 CEST
 Main | My View | View Issues | Change Log | Roadmap

View Issue Details  Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0004986OCamlOCaml generalpublic2010-02-26 16:132015-04-08 08:58
Reporterfrisch
Assigned To
PrioritynormalSeverityfeatureReproducibilityhave not tried
StatusresolvedResolutionopen
PlatformOSOS Version
Product Version
Target VersionFixed in Version4.02.0+dev
Summary0004986: Add efficient functions to create sets from lists or arrays
DescriptionCreating a set from a list or an array is quite common. It could be useful
to add such functions in the Set module.

A trivial implementation is:

let of_array a =

A more efficient implementation is to sort the array first, and then build the balanced tree directly:

let of_sorted_array a =
let rec sub i0 = function
| 0 -> Empty
| 1 -> Node (Empty, a.(i0), Empty, 1)
| 2 -> Node (Node(Empty, a.(i0), Empty, 1), a.(i0 + 1), Empty, 2)
| 3 -> Node (Node(Empty, a.(i0), Empty, 1), a.(i0 + 1), Node(Empty, a.(i0 + 2), Empty, 1), 2)
| n ->
let nl = n / 2 in
let l = sub i0 nl in
let r = sub (i0 + nl + 1) (n - nl - 1) in
create l a.(i0 + nl) r
in
sub 0 (Array.length a)

(The cases for n=1,2,3 are just extra optimizations.)

When creating sets from arrays of random integers of size 100000,
the optimized version is almost twice as fast as the naive one (ocamlopt, x86).
TagsNo tags attached.
Attached Files patch_set_of_list.diff [^] (6,271 bytes) 2012-12-04 15:15

 Relationships

 Notes doligez (administrator) 2012-07-11 11:58 Note: the code looks wrong, nl should probably be something like (i0 + n) / 2. frisch (developer) 2012-07-11 12:01 Why is it wrong? nl represents the length of the subarray, i0 the index of its first element. doligez (administrator) 2012-09-17 17:48 I don't know what I was thinking when I wrote that. The code looks fine. frisch (developer) 2012-12-04 15:23 A patch is attached, with also a new List.sort_dedup functions (like List.sort, but removes duplicates). Performance compared to an implementation based on folding Set.add is usually improved in a significant way, especially for long lists. The only case of degraded performances is for long lists with a lot of duplicates. Examples: - list of length 10000 made of random strings of length 5 (x1000): 6.61s -> 4.62s - list of length 10 made of random strings of length 5 (x1000000): 0.96s -> 0.76s - list of length 1000 made of random integers between 0 and 9 (x10000): 0.45s -> 0.49s - list of length 1000 made of random integers between 0 and 9999 (x10000): 1.89s -> 1.28s frisch (developer) 2013-07-09 13:02 Commit 13876 on trunk adds List.sort_uniq and Set.of_list.

 Issue History Date Modified Username Field Change 2010-02-26 16:13 frisch New Issue 2011-06-01 14:31 doligez Status new => acknowledged 2012-07-11 11:58 doligez Note Added: 0007705 2012-07-11 11:58 doligez Target Version => 4.01.0+dev 2012-07-11 12:01 frisch Note Added: 0007706 2012-07-31 13:36 doligez Target Version 4.01.0+dev => 4.00.1+dev 2012-09-17 17:48 doligez Note Added: 0008099 2012-09-17 17:48 doligez Severity minor => feature 2012-09-17 17:48 doligez Target Version 4.00.1+dev => 4.01.0+dev 2012-12-04 15:15 frisch File Added: patch_set_of_list.diff 2012-12-04 15:23 frisch Note Added: 0008559 2013-07-09 13:02 frisch Note Added: 0009726 2013-07-09 13:02 frisch Status acknowledged => resolved 2013-07-09 13:02 frisch Fixed in Version => 4.02.0+dev 2013-07-09 13:02 frisch Target Version 4.01.0+dev =>