Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0004986OCamlOCaml generalpublic2010-02-26 16:132013-07-09 13:02
Reporterfrisch 
Assigned To 
PrioritynormalSeverityfeatureReproducibilityhave not tried
StatusresolvedResolutionopen 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version4.02.0+dev 
Summary0004986: Add efficient functions to create sets from lists or arrays
DescriptionCreating a set from a list or an array is quite common. It could be useful
to add such functions in the Set module.

A trivial implementation is:

let of_array a =
  Array.fold_right add a empty

A more efficient implementation is to sort the array first, and then build the balanced tree directly:

    let of_sorted_array a =
      let rec sub i0 = function
        | 0 -> Empty
        | 1 -> Node (Empty, a.(i0), Empty, 1)
        | 2 -> Node (Node(Empty, a.(i0), Empty, 1), a.(i0 + 1), Empty, 2)
        | 3 -> Node (Node(Empty, a.(i0), Empty, 1), a.(i0 + 1), Node(Empty, a.(i0 + 2), Empty, 1), 2)
        | n ->
            let nl = n / 2 in
            let l = sub i0 nl in
            let r = sub (i0 + nl + 1) (n - nl - 1) in
            create l a.(i0 + nl) r
      in
      sub 0 (Array.length a)


(The cases for n=1,2,3 are just extra optimizations.)

When creating sets from arrays of random integers of size 100000,
the optimized version is almost twice as fast as the naive one (ocamlopt, x86).
TagsNo tags attached.
Attached Filesdiff file icon patch_set_of_list.diff [^] (6,271 bytes) 2012-12-04 15:15 [Show Content]

- Relationships

-  Notes
(0007705)
doligez (administrator)
2012-07-11 11:58

Note: the code looks wrong, nl should probably be something like (i0 + n) / 2.
(0007706)
frisch (developer)
2012-07-11 12:01

Why is it wrong? nl represents the length of the subarray, i0 the index of its first element.
(0008099)
doligez (administrator)
2012-09-17 17:48

I don't know what I was thinking when I wrote that. The code looks fine.
(0008559)
frisch (developer)
2012-12-04 15:23

A patch is attached, with also a new List.sort_dedup functions (like List.sort, but removes duplicates). Performance compared to an implementation based on folding Set.add is usually improved in a significant way, especially for long
lists. The only case of degraded performances is for long lists with a lot of duplicates.

Examples:
- list of length 10000 made of random strings of length 5 (x1000): 6.61s -> 4.62s
- list of length 10 made of random strings of length 5 (x1000000): 0.96s -> 0.76s
- list of length 1000 made of random integers between 0 and 9 (x10000): 0.45s -> 0.49s
- list of length 1000 made of random integers between 0 and 9999 (x10000): 1.89s -> 1.28s
(0009726)
frisch (developer)
2013-07-09 13:02

Commit 13876 on trunk adds List.sort_uniq and Set.of_list.

- Issue History
Date Modified Username Field Change
2010-02-26 16:13 frisch New Issue
2011-06-01 14:31 doligez Status new => acknowledged
2012-07-11 11:58 doligez Note Added: 0007705
2012-07-11 11:58 doligez Target Version => 4.01.0+dev
2012-07-11 12:01 frisch Note Added: 0007706
2012-07-31 13:36 doligez Target Version 4.01.0+dev => 4.00.1+dev
2012-09-17 17:48 doligez Note Added: 0008099
2012-09-17 17:48 doligez Severity minor => feature
2012-09-17 17:48 doligez Target Version 4.00.1+dev => 4.01.0+dev
2012-12-04 15:15 frisch File Added: patch_set_of_list.diff
2012-12-04 15:23 frisch Note Added: 0008559
2013-07-09 13:02 frisch Note Added: 0009726
2013-07-09 13:02 frisch Status acknowledged => resolved
2013-07-09 13:02 frisch Fixed in Version => 4.02.0+dev
2013-07-09 13:02 frisch Target Version 4.01.0+dev =>


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker