New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
List.sort_uniq does not specify which element is kept #7045
Comments
Comment author: @alainfrisch "useless" is a strong word: in many cases, you don't care about which of equal elements is preserved (because they are equal). It's not really worse than List.sort itself which does not specify the ordering between equal elements. |
Comment author: skaller I agree its a strong word. If the comparison is for whole entries by structural comparison then of course it doesn't matter. But if one is sorting an association list it often does matter. The function is clearly intended for such uses since there is a user specified comparator. The way the documentation is worded suggests the function has less utility than if the removal of duplicates were deterministic. The merge function deterministically selects between values from two lists. Of course .. I can implement my own function but I was hoping that the existing one already did the right thing and the issue was in the documentation. |
Comment author: @alainfrisch The function benefits from being based on a merge sort: duplicates are removed on the fly during the merge operation, thus benefiting from smaller list sizes. The stable_sort version would probably need to sort the whole list copied to an array before removing duplicate. It could be convenient to expose a stable_sort_uniq version, but it would probably be much slower for lists with a lot of duplicates.
"clearly" is again a strong word; actually, I don't think it is the case. All sort functions take a user specified comparator; even if the standard structural comparison has the expected semantics, you might want to pass a faster specialized version. Typically, List.sort_uniq could be applied on a list of integers. |
Comment author: skaller Ok. Just FYI I'm handling an association list string * typecode representing a record type. So the lists are trivially small and performance isn't an issue. Considering the various possible ways to do row polymorphism the semantics do matter though. It's not a show stopper. It's not that hard to write the function I require given stable_sort. In fact for my purposes a bubble sort would be fine :) |
Comment author: @alainfrisch You might also consider using the Map module instead of association lists. |
Comment author: skaller Yes, but that would be a lot harder to work with. For example with lists I can defer deciding what to do with duplicate fields. Later I can enforce semantics with appropriate operations, without changing the representation. Map can be used to keep duplicates too, but you'd have to map to a list, if not keeping duplicates you'd map to a single value. Sometime weaker typing is better. |
Comment author: @alainfrisch I'm changing the summary to something more explicit. |
Comment author: @alainfrisch It's probably better to implement this in "user-land" so as to specify explicitly which element is kept (first/last). My feeling is that cases where this choice needs to be specified are rather rare. |
Comment author: skaller ok. Thanks for having a look! |
Original bug ID: 7045
Reporter: skaller
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2017-02-16T14:16:47Z)
Resolution: won't fix
Priority: normal
Severity: minor
Version: 4.02.0
Category: standard library
Bug description
This function of List, as documented:
val sort_uniq : ('a -> 'a -> int) -> 'a list -> 'a list
Same as List.sort, but also remove duplicates.
is useless since it is indeterminate which duplicate entry is removed.
The specification and the implementation should use stable sort and
retain the first entry. Or the last. It doesn't matter
which so long as it is specified.
Steps to reproduce
Read the docs.
The text was updated successfully, but these errors were encountered: