Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List.sort_uniq does not specify which element is kept #7045

Closed
vicuna opened this issue Nov 17, 2015 · 9 comments
Closed

List.sort_uniq does not specify which element is kept #7045

vicuna opened this issue Nov 17, 2015 · 9 comments

Comments

@vicuna
Copy link

vicuna commented Nov 17, 2015

Original bug ID: 7045
Reporter: skaller
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2017-02-16T14:16:47Z)
Resolution: won't fix
Priority: normal
Severity: minor
Version: 4.02.0
Category: standard library

Bug description

This function of List, as documented:

val sort_uniq : ('a -> 'a -> int) -> 'a list -> 'a list

Same as List.sort, but also remove duplicates.

is useless since it is indeterminate which duplicate entry is removed.
The specification and the implementation should use stable sort and
retain the first entry. Or the last. It doesn't matter
which so long as it is specified.

Steps to reproduce

Read the docs.

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: @alainfrisch

"useless" is a strong word: in many cases, you don't care about which of equal elements is preserved (because they are equal). It's not really worse than List.sort itself which does not specify the ordering between equal elements.

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: skaller

I agree its a strong word. If the comparison is for whole entries by structural comparison then of course it doesn't matter. But if one is sorting an association list it often does matter. The function is clearly intended for such uses since there is a user specified comparator.

The way the documentation is worded suggests the function has less utility than if the removal of duplicates were deterministic. The merge function deterministically selects between values from two lists.

Of course .. I can implement my own function but I was hoping that the existing one already did the right thing and the issue was in the documentation.

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: @alainfrisch

The function benefits from being based on a merge sort: duplicates are removed on the fly during the merge operation, thus benefiting from smaller list sizes. The stable_sort version would probably need to sort the whole list copied to an array before removing duplicate. It could be convenient to expose a stable_sort_uniq version, but it would probably be much slower for lists with a lot of duplicates.

The function is clearly intended for such uses since there is a user specified comparator.

"clearly" is again a strong word; actually, I don't think it is the case. All sort functions take a user specified comparator; even if the standard structural comparison has the expected semantics, you might want to pass a faster specialized version. Typically, List.sort_uniq could be applied on a list of integers.

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: skaller

Ok. Just FYI I'm handling an association list string * typecode representing a record type. So the lists are trivially small and performance isn't an issue. Considering the various possible ways to do row polymorphism the semantics do matter though.

It's not a show stopper. It's not that hard to write the function I require given stable_sort. In fact for my purposes a bubble sort would be fine :)

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: @alainfrisch

You might also consider using the Map module instead of association lists.

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: skaller

Yes, but that would be a lot harder to work with. For example with lists I can defer deciding what to do with duplicate fields. Later I can enforce semantics with appropriate operations, without changing the representation. Map can be used to keep duplicates too, but you'd have to map to a list, if not keeping duplicates you'd map to a single value.

Sometime weaker typing is better.

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: @alainfrisch

I'm changing the summary to something more explicit.

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: @alainfrisch

It's probably better to implement this in "user-land" so as to specify explicitly which element is kept (first/last). My feeling is that cases where this choice needs to be specified are rather rare.

@vicuna
Copy link
Author

vicuna commented Nov 17, 2015

Comment author: skaller

ok. Thanks for having a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants