I've learned that the best comparison-based sorting algorithms have $O(n \log n)$ average time complexity without really understanding why that's a
fundamental limit, so I looked into this and found a simple explanation. For an
array of size $n$, there are $n!$ different permutations, and every time you
perform a comparison, you reduce the number of candidate permutations that could
represent the sorted array by up to half, thus you need at least $\log_2(n!)$ comparisons. By Stirling's approximation,
$\log_2(n!) = n\log_2 n - n\log_2 e + O(\log n) = \Theta(n \log n)$, and this is what gives the $\Omega(n \log n)$ time complexity lower bound. Framing
this another way, the problem of sorting can be turned into obtaining $\log_2(n!)$ bits of information in order to determine which of
the $n!$ permutations of the array is sorted, and comparing two elements of the
array can give you at most 1 bit of information.
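This bound is easy to sanity-check numerically; here's a quick sketch (plain Python, nothing assumed beyond the standard library) comparing the exact information requirement against the Stirling-style estimate:

```python
import math

# Compare the exact information lower bound log2(n!) against the
# n*log2(n) estimate from Stirling's approximation.
for n in [10, 100, 1000, 10000]:
    exact = math.log2(math.factorial(n))  # minimum comparisons needed
    estimate = n * math.log2(n)
    print(f"n={n:>5}  log2(n!)={exact:>9.0f}  n*log2(n)={estimate:>9.0f}")
```

The two columns track each other, with the exact value trailing by roughly $n \log_2 e$ as the approximation predicts.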
So what if you want a better time complexity? Depending on the context of what
you're sorting, you can use the proof above to identify ways to do better.
For instance, if you have duplicate values in the array, there'll be multiple
permutations instead of just 1 permutation among the $n!$ total permutations
that are considered sorted. If there are $k$ possible sorted permutations, then
you're looking to obtain $\log_2(n!/k) = \log_2(n!) - \log_2(k)$ bits of information, and for
sufficiently large $k$, this could end up being $o(n \log n)$ bits of information.
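To see how much duplicates help, here's a sketch that computes the information requirement for a concrete array: with value counts $c_1, \dots, c_m$, any of $k = \prod_i c_i!$ orderings counts as sorted (equal values can be arranged among themselves arbitrarily), so the requirement drops to $\log_2(n!/k)$:

```python
import math
from collections import Counter

def sorting_information_bits(arr):
    """Bits needed to comparison-sort arr, accounting for duplicates.

    k = prod(c_i!) permutations are all considered sorted, so the
    requirement is log2(n! / k) rather than log2(n!).
    """
    n = len(arr)
    k = math.prod(math.factorial(c) for c in Counter(arr).values())
    return math.log2(math.factorial(n) // k)  # k always divides n!

distinct = list(range(1000))   # all values unique
binary = [0, 1] * 500          # heavy duplication: only two values
print(sorting_information_bits(distinct))  # roughly n*log2(n) bits
print(sorting_information_bits(binary))    # roughly n bits: far fewer
```

The two-valued array needs only about $n$ bits, which is why problems like partitioning 0s and 1s admit linear-time solutions.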
What about non-comparison-based sorting algorithms such as radix sort? You still
have $\log_2(n!)$ total bits of information to obtain assuming no duplicate
values, but you could end up doing operations that give more than 1 bit of
information. Suppose you're doing radix sort in base $b$ on integers in the
range $[0, m)$. The bucketing procedure in radix sort can obtain more than 1
bit of information per bucketing operation, since a given value can be placed in
one of $b$ different buckets in a single operation. However, this operation
still only gives a constant amount of information, at most $\log_2 b$ bits on
average, so this doesn't affect the overall time complexity.
You may be wondering then why radix sort is known to be a linear time
sorting algorithm, or what went wrong with my analysis? The key lies in the
assumption that there are no duplicate values, which prevents $n$ from scaling
beyond the number of possible values $m$; otherwise the Pigeonhole Principle guarantees there must be a
duplicate value. When $n$ has an upper bound of $m$, then $\log_2(n!)$ is bounded
by $\log_2(m!)$, so the $O(n \log n)$ time complexity for comparison-based
sorting algorithms is analogous to the $O(n \log_b m)$ time complexity for
radix sort.
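To make the bucketing operation concrete, here's a minimal LSD radix sort sketch (non-negative integers assumed; base 10 is just an illustrative default). Each pass distributes every value among $b$ buckets, extracting at most $\log_2 b$ bits per element:

```python
def radix_sort(arr, base=10):
    """Least-significant-digit radix sort for non-negative integers.

    Runs ceil(log_base(max)) passes; each pass is a stable bucketing
    step that inspects one digit of every element.
    """
    if not arr:
        return arr
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        buckets = [[] for _ in range(base)]
        for x in arr:
            buckets[(x // exp) % base].append(x)  # stable: preserves order
        arr = [x for bucket in buckets for x in bucket]
        exp *= base
    return arr

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Stability of each bucketing pass is what lets later (more significant) digits dominate earlier ones.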
A better way of formulating the amount of information in radix sort is to
reframe the sorted array as being the number of 0's, number of 1's, ..., number
of $(m-1)$'s in the array, where $m$ is the number of possible values. There are $\log_2\binom{n+m-1}{m-1}$ bits of
information in that reformulation. This perspective handles the total bits of
information with consideration for duplicates in the array, and it's easy to
check that this is within $O(n)$ bits of information after waving off the large
constant, verifying that radix sort is able to achieve linear time complexity
when scaling $n$ with a fixed $b$ and $m$.
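That multiset count is cheap to evaluate, so the claim is easy to check numerically; a sketch where $m = 256$ is just an illustrative alphabet size:

```python
import math

# The sorted output is fully determined by how many copies of each of
# the m possible values appear: a multiset counted by C(n+m-1, m-1).
def multiset_information_bits(n, m):
    return math.log2(math.comb(n + m - 1, m - 1))

for n in [100, 10_000, 1_000_000]:
    # Grows like (m-1)*log2(n), far below n bits for fixed m.
    print(n, round(multiset_information_bits(n, m=256)))
```

For fixed $m$ the requirement is only $O(m \log n)$ bits, comfortably inside the linear-time budget.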
What about approximate sorting? Could you approximately sort an array in $o(n \log n)$ time where each element is at most $c$ positions off from its correct position,
for some constant integer $c$? Counting the number of permutations of an array
satisfying this constraint directly seems difficult, but it can be upper bounded
by $(2c+1)^n$, where each element can have an offset anywhere in the range $[-c, c]$. This is a weak upper bound considering how most combinations of
offsets will place some elements outside of the array or two elements in the
same array slot, making them invalid. With this upper bound estimate, you still
need at least $\log_2(n!) - \log_2((2c+1)^n) = \log_2(n!) - n\log_2(2c+1) = \Omega(n \log n)$
bits of information, so approximately sorting an array under this constraint
won't reduce the time complexity.
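The arithmetic here can be checked numerically too; a sketch (using `math.lgamma` to evaluate $\log_2(n!)$ without computing huge factorials, with $c = 5$ as an arbitrary example):

```python
import math

# Even allowing each element to land within c slots of its true
# position, log2(n!) - n*log2(2c+1) remains Theta(n log n): the
# correction term is only linear in n.
def approx_sort_bits(n, c):
    log2_factorial = math.lgamma(n + 1) / math.log(2)  # log2(n!)
    return log2_factorial - n * math.log2(2 * c + 1)

for n in [10**4, 10**6, 10**8]:
    ratio = approx_sort_bits(n, c=5) / (n * math.log2(n))
    print(n, round(ratio, 3))  # ratio climbs toward 1 as n grows
```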
What if instead of allowing for an offset error of a constant $c$, you allow an
offset error of $\epsilon n$ for some constant $\epsilon \in (0, 1)$? Similar to the analysis
above, a lower bound for the number of permutations of an array satisfying this
constraint can be obtained by first partitioning the sorted array into roughly $1/\epsilon$ groups of $\epsilon n$ elements and taking arbitrary permutations within
each group. It's clear that this scheme generates approximately sorted arrays
satisfying the constraint, and the number of these permutations is roughly $((\epsilon n)!)^{1/\epsilon}$. This lower bound estimate means you need at most $\log_2(n!) - \frac{1}{\epsilon}\log_2((\epsilon n)!)$ bits of information, and
using Stirling's approximation this is roughly
$n\log_2 n - \frac{1}{\epsilon} \cdot \epsilon n\log_2(\epsilon n) = n\log_2 n - n\log_2 n - n\log_2\epsilon$.
The $n\log_2 n$ terms cancel out and the expression reduces to $n\log_2(1/\epsilon)$, so you
need at most $O(n)$ bits of information to approximately sort an array under
this constraint.
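The cancellation above is tight enough to verify directly; a sketch assuming the $\epsilon n$-offset formulation, with $n = 10^6$ and $\epsilon = 0.01$ as arbitrary example values:

```python
import math

def log2_factorial(n):
    return math.lgamma(n + 1) / math.log(2)  # log2(n!) via lgamma

n, eps = 1_000_000, 0.01
groups = round(1 / eps)

# Estimated requirement: log2(n!) - (1/eps) * log2((eps*n)!)
estimate = log2_factorial(n) - groups * log2_factorial(round(eps * n))
closed_form = n * math.log2(1 / eps)  # the n*log2(1/eps) reduction

print(estimate, closed_form)  # the two values agree closely
```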
This only checks that it's plausible to approximately sort an array in linear time
with this constraint, so it's important to try to come up with a linear time
algorithm to verify it's possible. One possible idea is a modified quicksort
algorithm, where instead of recursing down all subarrays to fully sort the whole
array, the recursion stops once a subarray has fewer than $\epsilon n$ elements. It's
easy to verify that this recursion stopping condition ensures that the final
array is approximately sorted. The max recursion depth is $f(\epsilon)$ for some
function $f$, which is a constant with respect to $n$, so the time complexity of this modified
quicksort is $O(n)$.
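A minimal sketch of this modified quicksort, assuming randomized pivot selection and a three-way partition (both implementation choices, not requirements of the idea):

```python
import random

def approx_sort(arr, eps):
    """Quicksort variant: stop recursing once a subarray has fewer than
    eps*len(arr) elements. Each stopped subarray already occupies its
    correct block of positions, so every element ends up fewer than
    eps*len(arr) slots from where it belongs."""
    n = len(arr)
    threshold = max(1, int(eps * n))

    def quicksort(lo, hi):  # operates on arr[lo:hi)
        if hi - lo < threshold:
            return  # small enough: leave unsorted but well-placed
        pivot = arr[random.randrange(lo, hi)]
        lt, i, gt = lo, lo, hi  # three-way partition around pivot
        while i < gt:
            if arr[i] < pivot:
                arr[lt], arr[i] = arr[i], arr[lt]
                lt += 1
                i += 1
            elif arr[i] > pivot:
                gt -= 1
                arr[gt], arr[i] = arr[i], arr[gt]
            else:
                i += 1
        quicksort(lo, lt)
        quicksort(gt, hi)

    quicksort(0, n)
    return arr

data = list(range(1000))
random.shuffle(data)
approx_sort(data, eps=0.05)
# Every element is within 0.05 * 1000 = 50 positions of its true slot.
assert all(abs(v - i) < 50 for i, v in enumerate(data))
```

The key invariant is that when recursion stops on `arr[lo:hi)`, that slice holds exactly the elements whose sorted positions are `lo..hi-1`, so each can be off by at most the slice length.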
The takeaway from this analysis is that if you want to sort faster than $O(n \log n)$, you may be able to take advantage of properties of your
array, such as a known distribution of its values. Most real-world sorting problems do
have properties that allow for much better performance than standard algorithms.
Though keep in mind that systems performance considerations likely matter just
as much, if not more, than the algorithm.