Re: Sort.array easily degenerates

From: Xavier Leroy (Xavier.Leroy@inria.fr)
Date: Wed Mar 10 1999 - 14:58:02 MET


Date: Wed, 10 Mar 1999 14:58:02 +0100
From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: doligez@pa.dec.com, OCAML <caml-list@inria.fr>
Subject: Re: Sort.array easily degenerates
In-Reply-To: <199903092303.AA14481@six.pa.dec.com>; from doligez@pa.dec.com on Tue, Mar 09, 1999 at 03:03:05PM -0800

I have revised my Quicksort implementation based on that found in
glibc 2, which contains some clever optimizations. Interested parties
can see the code at

  http://camlcvs.inria.fr/cgi-bin/cvsweb.out/ocaml/stdlib/sort.ml

The behavior on extreme situations (input already sorted) is now much
better.

> There's no way to implement Sedgewick's quicksort with the interface
> given in sort.mli. You'd need two comparison functions, one for ">="
> and one for "<=". That explains the degenerate case when all the
> elements are equal.

It's true that ">=" vs. ">" can make a big difference, but >= is
easily definable from <= : a >= b is b <= a, a > b is not(a <= b),
a < b is not(b <= a).

> And it degenerates on already-sorted data because you swap the pivot
> with the right-most element. As a consequence, one of the subarrays
> has its two highest elements in first and last positions, which makes
> the median-of-three degenerate. I think this bug is also in
> Sedgewick's pseudo-code.

Sedgewick doesn't even give pseudo-code for the "median of three"
heuristic, but the glibc implementation avoids this swap altogether.

> Also, you should recurse on the smallest subarray first, not the
> largest.

Good point.

> I vote for Shellsort.

Well, I tried it, and it's noticeably slower than quicksort by almost
a factor of two on random input. Perhaps because polymorphic array
access is quite expensive in OCaml.

- Xavier Leroy



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:20 MET