(Latest Revision: 09/24/2003)
Insertion Sort Versus Radix Sort
Discussion of the relative efficiencies of insertion sort and radix sort
Insertion Sort
- N-1 passes sort N items.
- The sorting is based on comparisons. The sort makes a
comparison each time it checks to determine if one of the list key
values is greater than another list key value.
- In the jth pass, item j+1 is inserted into a sorted sub-list of j
elements.
- The new item has to be inserted into the correct location. For our
sublist of j elements there are j+1 possible locations where our
item j+1 may belong.
- Suppose we use a sentinel search to find the position for item j+1.
Depending on which position within the list our item j+1 assumes,
the number of comparisons required to find the position may be 1 (if
the new item belongs last in the new list), 2, 3, ..., j, or j+1 (if
the new item belongs first in the new list).
- We assume that each position is equally likely to be the right
one for item j+1, so the average number of comparisons
required to insert item j+1 is
(1+2+3+ ... +j+(j+1))/(j+1) = (j+1)(j+2)/(2*(j+1)) = (j+2)/2.
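The (j+2)/2 average can be checked empirically. The sketch below (names are our own) inserts a new key into each of the j+1 possible positions of a sorted sublist of j elements and counts the comparisons a right-to-left sentinel search would make:

```python
# Empirical check of the (j+2)/2 average: a sentinel search scanning from
# the right compares the new key with each larger element, plus one final
# comparison that stops the scan (against a smaller element, or against
# the sentinel when the key belongs first).
def comparisons_to_insert(position, j):
    greater = j - position          # elements larger than the new key
    return greater + 1              # 1 if it belongs last, j+1 if first

def average_comparisons(j):
    total = sum(comparisons_to_insert(p, j) for p in range(j + 1))
    return total / (j + 1)

for j in range(1, 20):
    assert average_comparisons(j) == (j + 2) / 2
```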
- The total average number of comparisons for performing the insertion
sort is the sum of the average number of comparisons done in each
pass. It is the sum of terms of the form (j+2)/2, for values of j
from 1 to N-1 (i.e. passes 1 through N-1, in which elements 2
through N are inserted).
- Therefore the total average number of comparisons for the whole
sorting process is
3/2 + 4/2 + ... + (N+1)/2 =
(1/2)*(1+2+3+...+(N+1)) - (1/2)*(1+2) =
[(N+1)(N+2)/4] - [3/2] =
N^2/4 + 3N/4 - 1
("N^2" means "N to the second power")
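A quick check, using exact rational arithmetic, that summing the per-pass averages (j+2)/2 for j = 1 to N-1 really does reproduce the closed form N^2/4 + 3N/4 - 1:

```python
# Verify the closed form for the total average comparison count.
from fractions import Fraction

def total_average_comparisons(n):
    # Sum of (j+2)/2 over passes j = 1 .. n-1.
    return sum(Fraction(j + 2, 2) for j in range(1, n))

for n in range(2, 50):
    assert total_average_comparisons(n) == (
        Fraction(n * n, 4) + Fraction(3 * n, 4) - 1
    )
```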
- By far the most significant thing about the work function above is
that the dominant term is a constant times N^2, a term that is
"quadratic" in N. This tells us that the number of comparisons
required to perform an insertion sort of N elements is roughly
proportionate to the square of N. The problem with this is that if
we want to sort a list of moderately large size N, the square of N
may be extremely large, and so the work required by insertion sort
may be prohibitive.
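The discussion above can be made concrete with a small sketch of insertion sort that counts key comparisons (the function and variable names here are our own, for illustration only):

```python
# Insertion sort with a comparison counter. For a list of N items the
# count falls between N-1 (already sorted) and N(N+1)/2 - 1 (at most
# j+1 comparisons in pass j).
def insertion_sort(items):
    a = list(items)
    comparisons = 0
    for j in range(1, len(a)):      # pass j inserts element a[j]
        key = a[j]
        i = j - 1
        while i >= 0:
            comparisons += 1        # one key comparison per check
            if a[i] <= key:
                break
            a[i + 1] = a[i]         # shift the larger element right
            i -= 1
        a[i + 1] = key
    return a, comparisons

data = [5, 2, 9, 1, 7, 3]
result, count = insertion_sort(data)
assert result == [1, 2, 3, 5, 7, 9]
assert 5 <= count <= 20             # N = 6: between N-1 and N(N+1)/2 - 1
```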
Radix Sort
- Radix sort requires K passes to sort N K-digit items. For example, a
radix sort would sort 750 3-digit numbers in 3 passes -- one pass
for each digit.
- Radix sort is not based on comparisons between items. Instead,
radix sort performs digit selection.
- Each pass of the sort requires N digit selections (one digit
selection made out of each element of the list). Therefore the
entire sort requires K*N digit selections.
- The term K*N appears to be "a constant" times N.
- Actually there is a "hidden" relationship between N and K. One can
only represent so many distinct items with K digits.
For example, there are only 10 distinct 1-digit numbers (including
0). You can't make eleven different codes for eleven different
items just with the ten 1-digit numbers available.
Similarly there are only 100 2-digit numbers available. In general,
the number of distinct K-digit numbers is the Kth power of 10. Thus
we see that if we are sorting N distinct K-digit items (no
duplicates) then
N <= 10^K.
(Here "^" denotes exponentiation.) This relation can be re-written
as
log(N) <= K,
where "log" means the common logarithm (log to the base 10).
- On the other hand, if we assume that our coding scheme uses the
smallest possible number of digits to represent our N items, we can
conclude that K-1 digits are not enough to represent N items.
(If K-1 digits gave us enough codes, why would we use K-digit codes?)
As an example, one does not need 4-digit codes to represent 250 items;
one can do it with 3-digit codes. If we are going to do a radix
sort, it is to our advantage to use 3-digit codes because the sort
will only have to perform three passes instead of four. Therefore,
assuming that the code uses digits "efficiently":
10^(K-1) < N
is true, as well as
N <= 10^K
Taking logs of those relations gives us:
log(N) <= K < log(N) + 1
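A numeric check of these bounds for the 250-item example above (using the standard-library log10):

```python
# With N = 250 distinct items, K = 3 digits is both sufficient
# (N <= 10^K) and necessary (10^(K-1) < N), so log(N) falls
# between K-1 and K.
from math import log10

N, K = 250, 3
assert 10 ** (K - 1) < N <= 10 ** K
assert log10(N) <= K < log10(N) + 1
```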
- Therefore the number of digit selections required to sort N items
with radix sort is roughly proportionate to N*K, which is no less
than N*log(N) (but less than N*(log(N)+1) if the coding of the N
items is as compact as possible).
- The function log(N) increases very "slowly" as N increases.
Therefore sorting a large list of items with radix sort may be
feasible in situations where the use of insertion sort is not
feasible.
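A sketch of a least-significant-digit radix sort on K-digit decimal keys, counting digit selections, illustrates the K*N total (the names are our own):

```python
# LSD radix sort with a digit-selection counter. Each of the K passes
# selects one digit from each of the N items, so the total is exactly K*N.
def radix_sort(items, k):
    a = list(items)
    selections = 0
    for d in range(k):                       # one pass per digit, least significant first
        buckets = [[] for _ in range(10)]
        for x in a:
            digit = (x // 10 ** d) % 10      # digit selection
            selections += 1
            buckets[digit].append(x)
        a = [x for b in buckets for x in b]  # stable concatenation
    return a, selections

data = [329, 457, 657, 839, 436, 720, 355]
result, count = radix_sort(data, 3)
assert result == sorted(data)
assert count == 3 * len(data)                # K*N digit selections
```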
Comparison of the Asymptotic Efficiency of Insertion Sort and Radix Sort
- We can get one measure of the relative efficiency of radix sort and
insertion sort by looking at the ratio of the dominant terms of the
"work" functions we computed earlier. The ratio of the average
number of comparisons done by insertion sort to the maximum number
of digit selections done by radix sort is:
{N^2/4 + 3N/4 - 1}/{N*(log(N)+1)}
- Using a little calculus (l'Hopital's rule) we find that the limit
as N goes to infinity of the ratio is the same as the limit as N
goes to infinity of this ratio:
((N/2) + 3/4) / (log(N) + 1 + log(e))
(Recall that the derivative of log(N) is log(e)/N.) By applying
l'Hopital's rule once more, we find the limit is the same as the
limit of:
(1/2) / (log(e)/N) = N/(2*log(e))
- N/(2*log(e)) goes to infinity as N goes to infinity.
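One can also watch the ratio of the two work functions grow numerically. The sketch below evaluates the ratio from the comparison above for increasing N:

```python
# Average insertion-sort comparisons divided by the radix-sort bound
# N*(log(N)+1), evaluated for N = 10, 100, ..., 10^6.
from math import log10

def ratio(n):
    return (n * n / 4 + 3 * n / 4 - 1) / (n * (log10(n) + 1))

values = [ratio(10 ** e) for e in range(1, 7)]
# The ratio keeps increasing: it grows without bound as N grows.
assert all(a < b for a, b in zip(values, values[1:]))
```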
- If we did a similar analysis of all the operations performed by
radix sort and insertion sort, we would be able to show that the
ratio of the 'overall' work functions of the two sorts is very
similar to what we saw above for the ratio of comparisons to digit
selections. The ratio of the overall work of insertion sort to the
overall work of radix sort also goes to infinity as the list
size N goes to infinity.
- What are the implications of the findings above?
- As the size, N, of the list increases the average number of
comparisons required by insertion sort (the number of
comparisons required to sort the 'average list') grows at a
much greater rate than the number of digit selections required
by radix sort (assuming the encoding of the N items is always
efficient so that radix sort does not have to perform an
excessive number of passes).
- The ratio of the two quantities increases as N increases and
there is no bound to how large that ratio grows.
- No matter how much time the different individual units of work
take (comparisons or digit selections), there must be some
value of N so large that the actual average time taken by
comparisons in the insertion sort algorithm is more than the
time taken by digit selections in the radix sort algorithm.
Even if a program can do a comparison much faster than a digit
selection -- no matter how much faster -- when N is
sufficiently large, an insertion sort of an average N-item list
will require so very many comparisons, and a radix sort
relatively so many fewer digit selections, that the total
average time to sort N items with insertion sort will exceed
the total time required to sort N items with radix sort.
- Whenever the list size is larger than the "magic" value of N
referred to above, the average time for insertion sort will be
more than the time for radix sort, and the difference in those
times will grow larger and larger without bound as the size of
the list increases further beyond the "magic" value.
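The "magic" value of N can be located numerically once we assume a relative cost for the two kinds of operations. The weight of 5 below is purely an illustrative assumption; real crossover points depend on the details listed later in these notes:

```python
# Find the smallest N at which the average insertion-sort comparison
# count outweighs the radix-sort digit selections, assuming (purely for
# illustration) that one digit selection costs as much as 5 comparisons.
from math import ceil, log10

def insertion_comparisons(n):
    return n * n / 4 + 3 * n / 4 - 1       # average-case count from the analysis

def radix_selections(n):
    k = max(1, ceil(log10(n)))              # fewest digits for n distinct keys
    return k * n

weight = 5                                  # assumed cost ratio (hypothetical)
crossover = next(n for n in range(2, 10 ** 6)
                 if insertion_comparisons(n) > weight * radix_selections(n))
assert insertion_comparisons(crossover) > weight * radix_selections(crossover)
assert insertion_comparisons(crossover - 1) <= weight * radix_selections(crossover - 1)
```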
- We must be careful to point out that this analysis does
not indicate that an insertion sort of a list
is always less efficient on average than a radix sort of a list of
the same size. (That's not true! For smaller lists insertion
sort could be faster on average.) Nor does it indicate
that for sufficiently large N all lists of size N are
sorted faster by radix sort than by insertion sort. (That's not
true either. For one thing, insertion sort does O(N) work on a
list that is already in sorted order.)
The analysis here merely demonstrates that radix sort is
asymptotically more efficient than the average
insertion sort: if the list is large enough then radix sort
will be faster than insertion sort on the 'average list'.
- Finally, this analysis does not tell us how large N has to be
for radix sort to become faster than the average insertion
sort. That value of N will be different in different
situations. It will depend on many details: how the algorithm
is coded, how the data is represented, how fast the computer
can perform basic operations such as comparisons, moves, and
digit selections, and so forth.
Summary
- In the examples above we compute work functions for insertion
sort and radix sort. We then compare the work functions in order to get
information about the relative efficiency of the two algorithms.
- The example is important because it illustrates the kind of analysis that
computer scientists frequently carry out when evaluating algorithms.
- The calculation shows that as the size of the sorting problem is "scaled
up" radix sort will eventually out-perform the average insertion sort by
an ever-increasing performance ratio.
- The kind of information we can derive from such an analysis is useful but
we may want to know more than it tells us. To predict the exact
performance characteristics of a specific program running on a particular
computer under specific load conditions is a very difficult problem!