(Latest Revision: 09/24/2003)
Insertion Sort Versus Radix Sort
Discussion of the relative efficiencies of insertion sort and radix sort
Insertion Sort
- N-1 passes sort N items.
- The sorting is based on comparisons. The sort makes a
comparison each time it checks to determine if one of the list key
values is greater than another list key value.
- In the jth pass, item j+1 is inserted into a sorted sub-list of j
elements.
- The new item has to be inserted into the correct location. For our
sublist of j elements there are j+1 possible locations where our
item j+1 may belong.
- Suppose we use a sentinel search to find the position for item j+1.
Depending on which position within the list our item j+1 assumes,
the number of comparisons required to find the position may be 1 (if
the new item belongs last in the new list), 2, 3, ..., j, or j+1 (if
the new item belongs first in the new list).
- We assume that each position is equally likely to be the right
one for item j+1, so the average number of comparisons
required to insert item j+1 is
(1+2+3+ ... +j+(j+1))/(j+1) = (j+1)(j+2)/(2*(j+1)) = (j+2)/2.
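The (j+2)/2 average can be checked empirically. The sketch below (names are our own) inserts a new key into each of the j+1 possible positions of a sorted sublist of j elements and counts the comparisons a right-to-left sentinel search would make:

```python
# Empirical check of the (j+2)/2 average: a sentinel search scanning from
# the right compares the new key with each larger element, plus one final
# comparison that stops the scan (against a smaller element, or against
# the sentinel when the key belongs first).
def comparisons_to_insert(position, j):
    greater = j - position          # elements larger than the new key
    return greater + 1              # 1 if it belongs last, j+1 if first

def average_comparisons(j):
    total = sum(comparisons_to_insert(p, j) for p in range(j + 1))
    return total / (j + 1)

for j in range(1, 20):
    assert average_comparisons(j) == (j + 2) / 2
```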
- The total average number of comparisons for performing the insertion
sort is the sum of the average number of comparisons done in each
pass. It is the sum of terms of the form (j+2)/2, for values of j
from 1 to N-1 (i.e. passes 1 through N-1, in which elements 2
through N are inserted).
- Therefore the total average number of comparisons for the whole
sorting process is
3/2 + 4/2 + ... + (N+1)/2 =
(1/2)*(1+2+3+...+(N+1)) - (1/2)*(1+2) =
[(N+1)(N+2)/4] - [3/2] =
N^2/4 + 3N/4 - 1
("N^2" means "N to the second power")
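A quick check, using exact rational arithmetic, that summing the per-pass averages (j+2)/2 for j = 1 to N-1 really does reproduce the closed form N^2/4 + 3N/4 - 1:

```python
# Verify the closed form for the total average comparison count.
from fractions import Fraction

def total_average_comparisons(n):
    # Sum of (j+2)/2 over passes j = 1 .. n-1.
    return sum(Fraction(j + 2, 2) for j in range(1, n))

for n in range(2, 50):
    assert total_average_comparisons(n) == (
        Fraction(n * n, 4) + Fraction(3 * n, 4) - 1
    )
```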
- By far the most significant thing about the work function above is
that the dominant term is a constant times N^2, a term that is
"quadratic" in N. This tells us that the number of comparisons
required to perform an insertion sort of N elements is roughly
proportionate to the square of N. The problem with this is that if
we want to sort a list of moderately large size N, the square of N
may be extremely large, and so the work required by insertion sort
may be prohibitive.
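The discussion above can be made concrete with a small sketch of insertion sort that counts key comparisons (the function and variable names here are our own, for illustration only):

```python
# Insertion sort with a comparison counter. For a list of N items the
# count falls between N-1 (already sorted) and N(N+1)/2 - 1 (at most
# j+1 comparisons in pass j).
def insertion_sort(items):
    a = list(items)
    comparisons = 0
    for j in range(1, len(a)):      # pass j inserts element a[j]
        key = a[j]
        i = j - 1
        while i >= 0:
            comparisons += 1        # one key comparison per check
            if a[i] <= key:
                break
            a[i + 1] = a[i]         # shift the larger element right
            i -= 1
        a[i + 1] = key
    return a, comparisons

data = [5, 2, 9, 1, 7, 3]
result, count = insertion_sort(data)
assert result == [1, 2, 3, 5, 7, 9]
assert 5 <= count <= 20             # N = 6: between N-1 and N(N+1)/2 - 1
```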
Radix Sort
- Radix sort requires K passes to sort N K-digit items. For example, a
radix sort would sort 750 3-digit numbers in 3 passes -- one pass
for each digit.
- Radix sort is not based on comparisons between items. Instead,
radix sort performs digit selection.
- Each pass of the sort requires N digit selections (one digit
selection made out of each element of the list). Therefore the
entire sort requires K*N digit selections.
- The term K*N appears to be "a constant" times N.
- Actually there is a "hidden" relationship between N and K. One can
only represent so many distinct items with K digits.
For example, there are only 10 distinct 1-digit numbers (including
0). You can't make eleven different codes for eleven different
items just with the ten 1-digit numbers available.
Similarly there are only 100 2-digit numbers available. In general,
the number of distinct K-digit numbers is the Kth power of 10. Thus
we see that if we are sorting N distinct K-digit items (no
duplicates) then
N <= 10^K.
(Here "^" denotes exponentiation.) This relation can be re-written
as
log(N) <= K,
where "log" means the common logarithm (log to the base 10).
- On the other hand, if we assume that our coding scheme uses the
smallest possible number of digits to represent our N items, we can
conclude that K-1 digits are not enough to represent N items.
(If K-1 digits gave us enough codes, why would we use K-digit codes?)
As an example, one does not need 4-digit codes to represent 250 items;
one can do it with 3-digit codes. If we are going to do a radix
sort, it is to our advantage to use 3-digit codes because the sort
will only have to perform three passes instead of four. Therefore,
assuming that the code uses digits "efficiently":
10^(K-1) < N
is true, as well as
N <= 10^K
Taking logs of those relations gives us:
log(N) <= K < log(N) + 1
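A numeric check of these bounds for the 250-item example above (using the standard-library log10):

```python
# With N = 250 distinct items, K = 3 digits is both sufficient
# (N <= 10^K) and necessary (10^(K-1) < N), so log(N) falls
# between K-1 and K.
from math import log10

N, K = 250, 3
assert 10 ** (K - 1) < N <= 10 ** K
assert log10(N) <= K < log10(N) + 1
```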
- Therefore the number of digit selections required to sort N items
with radix sort is roughly proportionate to N*K, which is no less
than N*log(N) (but less than N*(log(N)+1) if the coding of the N
items is as compact as possible).
- The function log(N) increases very "slowly" as N increases.
Therefore sorting a large list of items with radix sort may be
feasible in situations where the use of insertion sort is not
feasible.
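A sketch of a least-significant-digit radix sort on K-digit decimal keys, counting digit selections, illustrates the K*N total (the names are our own):

```python
# LSD radix sort with a digit-selection counter. Each of the K passes
# selects one digit from each of the N items, so the total is exactly K*N.
def radix_sort(items, k):
    a = list(items)
    selections = 0
    for d in range(k):                       # one pass per digit, least significant first
        buckets = [[] for _ in range(10)]
        for x in a:
            digit = (x // 10 ** d) % 10      # digit selection
            selections += 1
            buckets[digit].append(x)
        a = [x for b in buckets for x in b]  # stable concatenation
    return a, selections

data = [329, 457, 657, 839, 436, 720, 355]
result, count = radix_sort(data, 3)
assert result == sorted(data)
assert count == 3 * len(data)                # K*N digit selections
```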
Comparison of the Asymptotic Efficiency of Insertion Sort and Radix Sort
- We can get one measure of the relative efficiency of radix sort and
insertion sort by looking at the ratio of the dominant terms of the
"work" functions we computed earlier. The ratio of the average
number of comparisons done by insertion sort to the maximum number
of digit selections done by radix sort is:
{N^2/4 + 3N/4 - 1}/{N*(log(N)+1)}
- Using a little calculus (l'Hopital's rule) we find that the limit
as N goes to infinity of the ratio is the same as the limit as N
goes to infinity of this ratio:
((N/2) + 3/4) / (log(N) + 1 + log(e))
(Recall that the derivative of log(N) is log(e)/N.) By applying
l'Hopital's rule once more, we find the limit is the same as the
limit of:
(1/2) / (log(e)/N) = N/(2*log(e))
- N/(2*log(e)) goes to infinity as N goes to infinity.
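One can also watch the ratio of the two work functions grow numerically. The sketch below evaluates the ratio from the comparison above for increasing N:

```python
# Average insertion-sort comparisons divided by the radix-sort bound
# N*(log(N)+1), evaluated for N = 10, 100, ..., 10^6.
from math import log10

def ratio(n):
    return (n * n / 4 + 3 * n / 4 - 1) / (n * (log10(n) + 1))

values = [ratio(10 ** e) for e in range(1, 7)]
# The ratio keeps increasing: it grows without bound as N grows.
assert all(a < b for a, b in zip(values, values[1:]))
```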
- If we did a similar analysis of all the operations performed by
radix sort and insertion sort, we would be able to show that the
ratio of the 'overall' work functions of the two sorts is very
similar to what we saw above for the ratio of comparisons to digit
selections. The ratio of the overall work of insertion sort to the
overall work of radix sort also goes to infinity as the list
size N goes to infinity.
- What are the implications of the findings above?
- As the size, N, of the list increases the average number of
comparisons required by insertion sort (the number of
comparisons required to sort the 'average list') grows at a
much greater rate than the number of digit selections required
by radix sort (assuming the encoding of the N items is always
efficient so that radix sort does not have to perform an
excessive number of passes).
- The ratio of the two quantities increases as N increases and
there is no bound to how large that ratio grows.
- No matter how much time the different individual units of work
take (comparisons or digit selections), there must be some
value of N so large that the actual average time taken by
comparisons in the insertion sort algorithm is more than the
time taken by digit selections in the radix sort algorithm.
Even if a program can do a comparison much faster than a digit
selection -- no matter how much faster -- when N is
sufficiently large, an insertion sort of an average N-item list
will require so very many comparisons, and a radix sort
relatively so many fewer digit selections, that the total
average time to sort N items with insertion sort will exceed
the total time required to sort N items with radix sort.
- Whenever the list size is larger than the "magic" value of N
referred to above, the average time for insertion sort will be
more than the time for radix sort, and the difference in those
times will grow larger and larger without bound as the size of
the list increases further beyond the "magic" value.
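The "magic" value of N can be located numerically once we assume a relative cost for the two kinds of operations. The weight of 5 below is purely an illustrative assumption; real crossover points depend on the details listed later in these notes:

```python
# Find the smallest N at which the average insertion-sort comparison
# count outweighs the radix-sort digit selections, assuming (purely for
# illustration) that one digit selection costs as much as 5 comparisons.
from math import ceil, log10

def insertion_comparisons(n):
    return n * n / 4 + 3 * n / 4 - 1       # average-case count from the analysis

def radix_selections(n):
    k = max(1, ceil(log10(n)))              # fewest digits for n distinct keys
    return k * n

weight = 5                                  # assumed cost ratio (hypothetical)
crossover = next(n for n in range(2, 10 ** 6)
                 if insertion_comparisons(n) > weight * radix_selections(n))
assert insertion_comparisons(crossover) > weight * radix_selections(crossover)
assert insertion_comparisons(crossover - 1) <= weight * radix_selections(crossover - 1)
```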
- We must be careful to point out that this analysis does
not indicate that an insertion sort of a list
is always less efficient on average than a radix sort of a list of
the same size. (That's not true! For smaller lists insertion
sort could be faster on average.) Nor does it indicate
that for sufficiently large N all lists of size N are
sorted faster by radix sort than by insertion sort. (That's not
true either. For one thing, insertion sort does O(N) work on a
list that is already in sorted order.)
The analysis here merely demonstrates that radix sort is
asymptotically more efficient than the average
insertion sort: if the list is large enough then radix sort
will be faster than insertion sort on the 'average list'.
- Finally, this analysis does not tell us how large N has to be
for radix sort to become faster than the average insertion
sort. That value of N will be different in different
situations. It will depend on many details: how the algorithm
is coded, how the data is represented, how fast the computer
can perform basic operations such as comparisons, moves, and
digit selections, and so forth.
Summary
- In the examples above we compute work functions for insertion
sort and radix sort. We then compare the work functions in order to get
information about the relative efficiency of the two algorithms.
- The example is important because it illustrates the kind of analysis that
computer scientists frequently carry out when evaluating algorithms.
- The calculation shows that as the size of the sorting problem is "scaled
up" radix sort will eventually out-perform the average insertion sort by
an ever-increasing performance ratio.
- The kind of information we can derive from such an analysis is useful but
we may want to know more than it tells us. To predict the exact
performance characteristics of a specific program running on a particular
computer under specific load conditions is a very difficult problem!