- Lower Bound for Comparison Sorts
- O(n) Sorts

- CLRS 3rd ed. Chapter 8 (all).
- Screencasts: 10C (also in Laulima)

We have been studying sorting algorithms in which the only operation used to gain information is
pairwise comparison between elements. So far, we have not found a comparison-based sorting algorithm
whose worst case runs faster than O(*n* lg *n*) time.

It turns out it is not possible to give a better guarantee than O(*n* lg *n*) in a
comparison-based sorting algorithm.

The proof is an example of a different level of analysis: of all *possible* algorithms of
a given type for a problem, rather than particular algorithms ... pretty powerful.

A decision tree abstracts the structure of a comparison sort. A given tree represents the comparisons made by a specific sorting algorithm on inputs of a given size. Everything else is abstracted, and we count only comparisons.

For example, here is a decision tree for insertion sort on 3 elements.

Each internal node represents a branch in the algorithm, based on the outcome of comparing two elements indexed by their original positions. For example, at the nodes labeled "2:3" we are comparing the item originally at position 2 with the item originally at position 3, although they may now be in different positions.

Leaves represent permutations that result. For example, "⟨2,3,1⟩" is the permutation where the first element in the input was the largest and the third element was the second largest.

This is just an example of one tree for one sorting algorithm on 3 elements. Any given comparison
sort has one tree for each *n*. The tree models all possible execution traces for that
algorithm on that input size: a path from the root to a leaf is one computation.

We don't have to know the specific structure of the trees to do the following proof. We don't even have to specify the algorithm(s): the proof works for any algorithm that sorts by comparing pairs of keys. We don't need to know what these comparisons are. Here is why:

- The root of the tree represents the unpermuted input data.
- The leaves of the tree represent the possible permuted (sorted) results.
- The branch at each internal node of the tree represents the outcome of a comparison that changes the state of the computation.
- The paths from the root to the leaves represent possible courses that the computation can take: to get from the unsorted data at the root to the sorted result at a leaf, the algorithm must traverse a path from the root to the correct leaf by making a series of comparisons (and permuting the elements as needed).
- The length of this path is the runtime of the algorithm on the given data.
- Therefore, if we can derive a lower bound on the length of the longest path from the root to a leaf of
*any* such tree, we have a lower bound on the running time of *any* comparison-based sorting algorithm.

We get our result by showing that any decision tree on inputs of size *n* must have height Ω(*n* lg *n*), i.e., it will have a path from the root to some leaf of length Ω(*n* lg *n*). This is the lower bound on the
running time of *any* comparison-based sorting algorithm.

- There are at least *n*! leaves, because the algorithm must correctly sort every possible permutation of the input, and each permutation must appear as at least one leaf: *l* ≥ *n*!
- Any binary tree of height *h* has *l* ≤ 2^{h} leaves (Notes #8).
- Putting these facts together: *n*! ≤ *l* ≤ 2^{h}, or 2^{h} ≥ *n*!
- Taking logs: *h* ≥ lg(*n*!)
- Using Stirling's approximation (formula 3.17): *n*! > (*n*/*e*)^{n}
- Substituting into the inequality:

  *h* ≥ lg((*n*/*e*)^{n}) = *n* lg(*n*/*e*) = *n* lg *n* − *n* lg *e* = Ω(*n* lg *n*).

Thus, the height of a decision tree that permutes *n* elements to all possible permutations
must be Ω(*n* lg *n*).
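The inequality can be checked numerically. The sketch below (plain Python, nothing beyond the standard library) compares lg(*n*!) with the Stirling-based bound *n* lg(*n*/*e*) for a few input sizes:

```python
import math

# Compare the exact decision-tree height bound, lg(n!), with the
# Stirling-based lower bound n lg(n/e) = n lg n - n lg e.
for n in [4, 16, 64, 256, 1024]:
    exact = math.log2(math.factorial(n))   # lg(n!)
    stirling = n * math.log2(n / math.e)   # n lg(n/e)
    print(f"n={n:5d}  lg(n!)={exact:10.1f}  n lg(n/e)={stirling:10.1f}")
    assert exact >= stirling               # since n! > (n/e)^n
```

The gap between the two columns stays small relative to *n* lg *n*, which is why the approximation is good enough for the asymptotic bound.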

A path from the root to a leaf in the decision tree corresponds to a sequence of comparisons,
so there will always be some input that requires Ω(*n* lg *n*) comparisons in
*any* comparison-based sort.

There may be some specific paths from the root to a leaf that are shorter. For example, when
insertion sort is given sorted data it follows an O(*n*) path. But to give an o(*n* lg
*n*) guarantee (i.e., strictly better than Θ(*n* lg *n*)), one would have to show that
*all* paths are that short, i.e., that the tree height is o(*n* lg *n*), and we have
just shown that this is impossible since the height is Ω(*n* lg *n*).

Sometimes, if we know something about the structure of the data, it is possible to sort it without comparing elements to each other. Since no comparisons are made, we can sort the data faster than the Ω(*n* lg *n*) lower bound presented above. *How is this possible?*

Typically these algorithms work by using information about the keys themselves to put them
"in their place" without comparisons. Here we describe several algorithms that are able to sort data in O(*n*) time.

Assumes (requires) that the keys to be sorted are integers in {0, 1, ..., *k*}.

For each element of the input, count how many elements are less than it.

Then we can place that element directly into its output position: if *c* elements are smaller, it belongs after them, leaving room for them before it.

An example ...
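A minimal Python sketch of the idea (the array names B and C follow CLRS, but the code itself is mine, not the text's):

```python
def counting_sort(A, k):
    """Stable sort of A, whose keys are integers in {0, 1, ..., k}."""
    C = [0] * (k + 1)
    for key in A:              # C[i] = number of elements equal to i
        C[key] += 1
    for i in range(1, k + 1):  # C[i] = number of elements <= i
        C[i] += C[i - 1]
    B = [None] * len(A)
    for key in reversed(A):    # scan right-to-left to keep the sort stable
        C[key] -= 1
        B[C[key]] = key        # place key just before the elements <= it
    return B

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
# → [0, 0, 2, 2, 3, 3, 3, 5]
```

The right-to-left placement loop is what makes the sort stable: equal keys are filled in from the back of their slot range, preserving input order.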

Counting sort is a **stable sort**: two elements with equal keys appear in the output in
the same relative order as in the original sequence. This is a useful property ...
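Stability can be seen with any stable sort. Here Python's built-in `sorted` (which is guaranteed stable) orders records by key alone; the letter labels are made up purely to mark original positions:

```python
# Records: (key, label). Labels "a".."d" only mark original positions.
records = [(1, "a"), (0, "b"), (1, "c"), (0, "d")]

# Sort by key only; a stable sort must keep "b" before "d" and "a" before "c".
stable = sorted(records, key=lambda r: r[0])
print(stable)  # → [(0, 'b'), (0, 'd'), (1, 'a'), (1, 'c')]
```

This property is exactly what radix sort (below) depends on.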

Counting sort requires Θ(*n* + *k*) time. If *k* = O(*n*), counting sort runs in Θ(*n*) time.

Using a stable sort like counting sort, we can sort from the least to the most significant digit:

This is how punched card sorters used to work.

The code is trivial, but requires a stable sort and only works on *n* *d*-digit numbers
in which each digit can take up to *k* possible values:
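A sketch in Python for base-10 digits (the function name and the per-digit bucketing pass are my own; any stable sort per digit would do, e.g. counting sort):

```python
def radix_sort(A, d):
    """Sort non-negative integers of at most d decimal digits."""
    for i in range(d):                     # least significant digit first
        digit = lambda x: (x // 10 ** i) % 10
        buckets = [[] for _ in range(10)]  # one bucket per digit value
        for x in A:                        # appending preserves input order,
            buckets[digit(x)].append(x)    # so each pass is stable
        A = [x for b in buckets for x in b]
    return A

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# → [329, 355, 436, 457, 657, 720, 839]
```

Each pass is stable, so ties on the current digit are broken by the previous (lower-order) passes, which is why sorting least significant digit first works.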

If the stable sort used is Θ(*n* + *k*) time (like counting sort) then
RADIX-SORT is Θ(*d*(*n* + *k*)) time.

Bucket Sort maps the keys to the interval [0, 1), placing
each of the *n* input elements into one of *n* buckets. If there are collisions,
chaining (linked lists) is used.

Next, the algorithm sorts each chain using any known sorting algorithm, e.g. insertion sort.

Finally, the algorithm outputs the contents of each bucket in order of the buckets, by concatenating the chains.

The algorithm assumes that the input is from a uniform distribution, i.e., each key is equally likely to fall into any bucket. Then each chain is expected to contain a constant number of items and, therefore, the sorting of each chain is expected to take O(1) time.
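A minimal Python sketch under the same assumption of keys uniformly distributed in [0, 1) (Python lists stand in for the chains, and each chain's sort is delegated to the built-in `sort`; the names are illustrative, not CLRS's):

```python
def bucket_sort(A):
    """Sort keys assumed uniformly distributed in [0, 1)."""
    n = len(A)
    B = [[] for _ in range(n)]     # one bucket (chain) per input element
    for x in A:
        B[int(n * x)].append(x)    # e.g. with n = 10, key 0.78 -> bucket 7
    for chain in B:
        chain.sort()               # each chain is expected O(1) long
    return [x for chain in B for x in chain]

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
# → [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```

The expected O(*n*) total time depends on the uniformity assumption; with badly skewed input, one chain can receive most of the keys and the per-chain sort dominates.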

The numbers in the input array A are thrown into the buckets in B according to their
magnitude. For example, 0.78 is put into bucket 7, which is for keys 0.7 ≤ *k* <
0.8. Later on, 0.72 maps to the same bucket: like chaining in hash tables, we "push" it onto the
beginning of the linked list.

At the end, we sort the lists (B shows the lists after they are sorted; otherwise we would have 0.23, 0.21, 0.26) and then copy the values from the lists back into an array.

You can also compare some of the sorting algorithms with these animations (set to 50 elements): http://www.sorting-algorithms.com/. Do the algorithms make more sense now?

Nodari Sitchinava (based on material by Dan Suthers) Images are from the instructor's material for Cormen et al. Introduction to Algorithms, Third Edition, and from Wikipedia commons.