(Latest Revision: 02/24/2008)
Comparing Sorting Algorithms
Write a program that REPEATEDLY:
- prompts the user to enter a filename,
- opens the file,
- reads a comment from the first line,
- reads an integer N from the second line (e.g. N = 1000),
- reads N integers from the rest of the file, putting them
into consecutive locations in an array,
- writes the name of the input file,
- writes the comment from the input file,
- sorts an identical copy of the array using each of the following methods:
  - selection sort
  - insertion sort
  - quick sort
  - merge sort
- writes a report of the number of compares and moves done in each sort,
  and
- asks the user if s/he wishes to continue
UNTIL the user wishes NOT to continue.
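The input format described in the list above (a comment line, then N, then
N integers) might be read along the following lines. This is only a sketch,
not the required design: the names SortInput and readSortInput are made up
for illustration, and a std::vector stands in for the array.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the contents of one input file.
struct SortInput {
    std::string comment;     // text of the first line
    std::vector<int> data;   // the N integers that follow the count
};

// Reads a comment line, then N from the second line, then N integers.
// Returns false if the stream does not match that layout.
bool readSortInput(std::istream& in, SortInput& out) {
    if (!std::getline(in, out.comment)) return false;  // line 1: comment

    std::string countLine;
    if (!std::getline(in, countLine)) return false;    // line 2: N
    std::istringstream countStream(countLine);
    long n = 0;
    if (!(countStream >> n) || n < 0) return false;

    out.data.clear();
    for (long i = 0; i < n; ++i) {                      // rest: N integers
        int value = 0;
        if (!(in >> value)) return false;
        out.data.push_back(value);
    }
    assert(out.data.size() == static_cast<std::size_t>(n));
    return true;
}
```

A caller could open the file with std::ifstream and pass the stream to
readSortInput. A false result would mean the file did not have the expected
layout.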
MOTIVATION:
Working on this program and actually experiencing the differences in the
amount of work and the amount of runtime is a nice way to get familiar with
analysis of algorithms. This is an important part of the "science" in
computer science.
DETAILS:
Your program must make a little announcement (examples to follow) just before
starting and just after completing each run of one of the four sorting
functions. This will allow you to get a sense of how long each individual
algorithm takes to do its work.
Your program must pass in a new copy of the original array
each time it calls a sorting function. We don't want the first function
call to be the only one that really sorts the original array!
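One way to meet both requirements above is to route every sort through a
small wrapper, sketched here. The names Counts, SortFunction and runOneSort
are assumptions of this sketch, as is the idea that all four sorts share
one signature. Your own design may differ.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical record of the work done by one sort.
struct Counts {
    long compares = 0;
    long moves = 0;
};

// Hypothetical signature shared by all four sorting functions.
using SortFunction = void (*)(std::vector<int>& data, Counts& counts);

// Runs one sort on a private copy of the original data.
// 'original' is a const reference, so it can never be modified here.
Counts runOneSort(const std::string& name, SortFunction sortFn,
                  const std::vector<int>& original, std::ostream& out) {
    std::vector<int> copy = original;  // fresh copy for this sort only
    Counts counts;
    // Flush so the announcement is visible before the sort begins.
    out << name << " starting ... " << std::flush;
    sortFn(copy, counts);
    out << "done." << std::endl;
    assert(copy.size() == original.size());
    return counts;
}
```

Because each call makes its own copy, the second, third and fourth sorts
receive the same unsorted input that the first one did.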
When you count "moves", count the number of times that an element of the
list is copied from one location to another (with an assignment statement).
(The book's C++ swap function performs 3 moves.)
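As an illustration of the rule above, a swap that reports its own moves
might look like the following sketch. The function name countedSwap and the
extra "moves" parameter are assumptions made for this example. They are not
taken from the book's code.

```cpp
#include <cassert>

// Exchanges x and y with three assignments and adds 3 to 'moves'.
void countedSwap(int& x, int& y, long& moves) {
    assert(moves >= 0);  // a running total should never be negative
    int temp = x;        // move 1
    x = y;               // move 2
    y = temp;            // move 3
    moves += 3;
}
```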
When counting compares, count only comparisons of list elements (the
things that are being sorted), not the array indices. For example,
if the array you are using is named "data", the line of code:
if (data[j] < data[j - 1])
contains a comparison that you would have to count, because it is a
comparison of two elements in the list. On the other hand, the line of
code:
while (i < j)
does not compare elements of the list. It is a comparison of array indices
only. Therefore you must not count that comparison.
The line of code
for (; (loc > 0) && (data[loc-1] > nextItem); --loc)
contains one comparison that you should not count and one comparison
that you should count.
You should not count "loc > 0" because it compares indices.
You should count "data[loc-1] > nextItem" because it
compares a list element with (a copy of) another list element.
Keep in mind that it may be a little "tricky" to insert the code that
counts the comparisons executed in if-clauses and loop conditions. Give the
problem due thought and consideration.
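One possible way to handle a loop condition is sketched below. It moves the
element comparison into a small helper that counts and then compares. The
helper name greaterCounted, the counter parameters, and the use of
std::vector are all assumptions of this sketch. The code you download may
be organized differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts one element comparison, then performs it.
bool greaterCounted(int a, int b, long& compares) {
    ++compares;
    return a > b;
}

// Insertion sort that tallies element compares and element moves.
void insertionSort(std::vector<int>& data, long& compares, long& moves) {
    for (std::size_t unsorted = 1; unsorted < data.size(); ++unsorted) {
        int nextItem = data[unsorted];          // move: copy element out
        ++moves;
        std::size_t loc = unsorted;
        // "loc > 0" compares indices, so it is not counted. Because &&
        // short-circuits, greaterCounted runs (and counts) only when
        // loc > 0 is true, i.e. only when elements are really compared.
        for (; (loc > 0) &&
               greaterCounted(data[loc - 1], nextItem, compares);
             --loc) {
            data[loc] = data[loc - 1];          // move: shift right
            ++moves;
        }
        data[loc] = nextItem;                   // move: copy element in
        ++moves;
    }
    assert(std::is_sorted(data.begin(), data.end()));
}
```

Counting inside the helper avoids the common mistake of incrementing the
counter in the loop body, which would miss the final comparison that fails
and ends the loop.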
There is C++ code for performing sorts here:
ftp://www.cs.csustan.edu/pub/john/2008Src/DAPSWM_C++_ed5/ByName/c09
and Java code here:
ftp://www.cs.csustan.edu/pub/john/2008Src/DAPSWM_Java_ed2/Chapter10/SortsClass.java
You will need to modify this code so that it takes care of the counting of the
compares and the moves. That will mean adding parameters to functions
(methods) and adding pieces of code that do counting within functions. We can
discuss more details of this in class.
In the assignment directory, I also included a little prototype of a driver
program called driver.cpp. You can get ideas from driver.cpp
for formulating the program you do for this assignment.
SAMPLE INPUT AND OUTPUT:
Let's look at two sample input files and the corresponding output in order to
better understand what the format of the files and outputs has to be.
The sample input files are ord50 and ran1000.
The corresponding sample program output: sample.out
Note: When you see a message such as "selection sort starting ... done"
understand that the program first writes "selection sort starting ... "
then performs the selection sort, and then writes "done."
TESTING:
Lists can be small, medium, and large in size. They can also be random,
ordered, or reverse-ordered. You need to test all the possible
combinations (there are 9).
When you test your program on large lists it helps you appreciate
the efficiency advantages of the advanced sorts. Computers are getting
faster and faster! I felt I had to choose lists of size 30,000 for your
large sets. A few years ago lists of size 5,000 probably would have been
quite large enough.
It is very important that you test all the data sets I have indicated. It
will determine what you learn from this assignment (and will also determine
a large fraction of your grade). I put a package containing all the inputs
you need here (but don't click until you are ready to download):
ftp://www.cs.csustan.edu/pub/john/dataFiles.tar.gz
(Some browsers may not access the link above properly. If one browser has a
problem, try another. Clicking on the link should download and unpack the
files on most systems. You may have to do an extra step to unpack the file.)
In the header of your program, include a paragraph or two describing what you
learned (about the performance of the different algorithms) from writing and
testing this program. Do your very best to make this well-written. This will
be another aspect of the assignment that will count heavily.
While making the test script for this program do not cat the
input files onto the screen! This would be the normal procedure if the input
files were of manageable size, but most of the input files we are using in
this assignment are way too long! If you write the program according
to the specifications I gave you, then the information printed by the program
will tell me what I need to know.
WHAT TO TURN IN:
Since this assignment does not require you to do a large amount of program
design work, I am not requiring you to turn in a preliminary version.
You will turn in two printer outputs (hardcopies) and you will send me one
e-mail message. Please follow these rules:
- Always send me e-mail as plain text in the main message body. Never
send me attachments.
- Always use the exact subject line I specify for each message.
(I often get hundreds of e-mail messages in a week. The subject line
allows me to find, filter and sort messages.) You will lose a
significant number of points on the assignment if you use the wrong
subject line.
- Be very careful when typing the command to send e-mail. You may use the
instructions in your Hello World! lab exercise for guidance. Of course,
you will need to make the obvious changes to those directions -- you have
to use the correct subject line and filename.
- Always send yourself a copy of each e-mail message you send to me, and
check immediately to see if you receive the message intact.
You are responsible for sending e-mail correctly.
Here is the list of things you have to turn in:
- At the start of class on the due date, place the
following items on the "counter" in front of me:
  - a hardcopy of your final version of the program, and
  - a hardcopy of your test script showing adequate testing of
    your program. (Remember: You are not supposed to let any of the data
    files appear on the test script).
Make sure that all of the code and script content shows on the paper.
(Don't use tabs, and make all lines no more than 75 characters long.)
Make sure all content is plainly readable and properly formatted.
- Before midnight on the due date, send me by e-mail a shell archive file
  containing this collection of files:
  - A copy of your program source code (which includes your writings
    about what you learned),
  - Your script showing all the test runs on the input files I gave you,
  - A file named 'README' containing the compilation command one should
    use to compile your program, and
  - A copy of your 'makefile' - if you used one.
Understand that I'm asking you to send me all the files I need to compile
the program (plus some other things).
Make sure the script you send is cleaned up using the sort of "col -b"
filtering trick illustrated here.
Make sure all your source files are clean too. Sometimes files transferred
from a PC to a Sun Ultra need to be cleaned up.
Don't send me any compiled code!
E-mail me the shell archive file with the subject line:
CS3100Prog2
Note that there are no spaces in the subject line given. It is important
that you do not insert any spaces. My e-mail address is:
john@ishi.csustan.edu.
DUE DATES:
For the due dates, see the class schedule.
QUESTIONS:
Please bring up in class any questions you still have about this assignment.