Re: Sort.array easily degenerates

From: Markus Mottl (mottl@miss.wu-wien.ac.at)
Date: Wed Mar 10 1999 - 01:28:16 MET


From: Markus Mottl <mottl@miss.wu-wien.ac.at>
Message-Id: <199903100028.BAA20920@miss.wu-wien.ac.at>
Subject: Re: Sort.array easily degenerates
To: Xavier.Leroy@inria.fr (Xavier Leroy)
Date: Wed, 10 Mar 1999 01:28:16 +0100 (MET)
In-Reply-To: <19990309114442.07231@pauillac.inria.fr> from "Xavier Leroy" at Mar 9, 99 11:44:42 am

> The Sort.array implementation is Quicksort with insertion sort for
> small partitions, as suggested in Sedgewick. I should know better
> than take some code out of an algorithms textbook and expect that it
> will work well...
>
> At any rate, any one is welcome to send me a better implementation.

I have also compared it to the Sedgewick-version and wondered, what was
wrong with the implementation - it seems that the version in the book
doesn't hold what it promises...

Someone suggested via mail to me that "sort" as can be found in the STL
is very efficient. I took a look at it and it makes indeed a very good
impression. There is an excellent paper about it on the following page:

  http://www.cs.rpi.edu/~musser/gp/timing.html
  Name of paper: Introspective Sorting and Searching Algorithms
  
  download paper from:
  http://www.cs.rpi.edu/~musser/gp/introsort.ps

It's a kind of hybrid version of various sorting algorithms. It does not
only guarantee a worst-case bound of N*log(N), but it is also as fast as
quicksort in the average case. The constant factor compared to quicksort
is just a little bit larger so it seems to be a true alternative.

The implementation requires heap-algorithms. If someone has time, he could
try to implement the sort algorithm with a suitable heap-implementation
from Okasaki's purely functional data structures - some of them are very
efficient. Take a look at the paper and on the page

  http://miss.wu-wien.ac.at/~mottl/ocaml_sources/intro.html

and download "pure_fun.tar.gz". In chapter 3 you will find "LeftistHeap"
and in chapter 5 "SplayHeap". Both are quite efficient (SplayHeap
seems to be faster (garbage collection parameters can change the
behaviour significantly), but is a bit more complicated). With some minor
changes/additions it should be possible to use them for heap-sorting.

As it seems, a collection of such algorithms and data structures would
really come handy in the OCAML-standard-library...

Another question is, whether to also support "stable_sort" as in the
STL. It guarantees that elements which are already sorted will stay in
the same order. This is important with "order"-functions that consider
only a part of the data representation to be sorted.

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:20 MET