(Latest Revision: 03/10/2003)
(02/27/2003 -- revised description of how to count "swaps")
(03/10/2003 -- decided not to count "swaps" but to count "moves")
Comparing Sorting Algorithms
Write a program that REPEATEDLY:
- prompts the user to enter a filename,
- opens the file,
- reads a comment from the first line,
- reads an integer N from the second line (e.g. N = 1000),
- reads N integers from the rest of the file, putting them
into consecutive locations in an array,
- writes the name of the input file,
- writes the comment from the input file,
- sorts an identical copy of the array using each of the following methods:
- selection sort
- insertion sort
- quick sort
- merge sort
- writes a report of the number of compares and moves done in each sort,
and
- asks the user if s/he wishes to continue
UNTIL the user wishes NOT to continue.
MOTIVATION:
Working on this program and actually experiencing the differences in the
amount of work and the amount of runtime is a nice way to get familiar with
analysis of algorithms. This is an important part of the "science" in
computer science.
DETAILS:
Your program must make a little announcement (examples to follow) just
before starting and just after completing each run of one of the four
sorting functions. This will allow you to get a sense of how long each
individual algorithm takes to do its work.
Your program must pass in a new copy of the original array
each time it calls a sorting function. We don't want the first function
call to be the only one that really sorts the original array!
When you count "moves" count the number of times that an element of the
list is copied from one location to another (with an assignment statement).
Each call to the function swap counts as three moves.
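As an illustration of the three-moves rule, here is a minimal sketch of a swap that reports its own moves. The name countedSwap and the extra `moves` parameter are assumptions; the swap in Src/swap.cpp may look different.

```cpp
// Exchanges x and y and adds the three assignments to `moves`.
// The `moves` parameter is a hypothetical addition for counting.
void countedSwap(int& x, int& y, long& moves) {
    int temp = x;  // move 1
    x = y;         // move 2
    y = temp;      // move 3
    moves += 3;
}
```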
When counting compares, count only comparisons of list elements (the
things that are being sorted), not the array indices. For example,
if the array you are using is named "data" the line of code:
if (data[j] < data[j - 1])
contains a comparison that you would have to count, because it is a
comparison of two elements in the list. On the other hand, the line of
code:
while (i < j)
does not compare elements of the list. It is a comparison of array indices
only. Therefore you must not count that comparison.
The line of code
for (; (loc > 0) && (data[loc-1] > nextItem); --loc)
contains one comparison that you should not count and one comparison
that you should count.
You should not count "loc > 0" because it compares indices.
You should count "data[loc-1] > nextItem" because it
compares a list element with (a copy of) another list element.
Keep in mind that it may be a little "tricky" to insert the code that
counts the comparisons executed in if-clauses and loop conditions. Give the
problem due thought and consideration.
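To show one possible approach to the tricky part, here is a sketch of an insertion sort that counts inside the loop condition. It is not the code from Src: the Counts record and the extra parameter are assumptions, and counting the copy into nextItem as a move is one interpretation of the rule on moves.

```cpp
// Hypothetical counter record passed by reference to a sort.
struct Counts {
    long compares;
    long moves;
};

void insertionSort(int data[], int n, Counts& c) {
    for (int unsorted = 1; unsorted < n; ++unsorted) {
        int nextItem = data[unsorted];  // element copied out: 1 move
        ++c.moves;

        int loc = unsorted;
        // The comma operator bumps the counter just before the element
        // comparison is evaluated. When (loc > 0) is false, && short-
        // circuits, so no element comparison happens and none is counted.
        for (; (loc > 0) && (++c.compares, data[loc - 1] > nextItem); --loc) {
            data[loc] = data[loc - 1];  // shift right: 1 move
            ++c.moves;
        }

        data[loc] = nextItem;           // element copied back: 1 move
        ++c.moves;
    }
}
```

The point of the comma expression is that the counter goes up exactly when the element comparison is executed, including the final comparison that fails and ends the loop.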
The code you need to perform the sorting is contained in a set of files in
the sub-directory called Src.
- swap.cpp
- swap.h
- insertionSort.cpp
- insertionSort.h
- selectionSort.cpp
- selectionSort.h
- quicksort.cpp
- quicksort.h
- mergesort.cpp
- mergesort.h
You will need to modify this code so that it takes care of the counting of
the compares and the moves. That will mean adding parameters to
functions and adding pieces of code that do counting within functions. We
can discuss more details of this in class.
I also included a little prototype of a driver program called
driver.cpp. You can get ideas from driver.cpp for
formulating the program you do for this assignment.
To do some preliminary testing, you can compile the whole "package" of
driver plus sorting code on the SUN Ultras with this command:
ng++ *.cpp
This assumes that the only source files in your current working directory
are the ones that are part of the "package." Of course, to complete the
assignment you will need to write another driver that is fully functional.
I tested the combination of driver and sorting code on a SUN Ultra and
everything seems to work fine.
In your driver you must incorporate the four sorts I have given you in the
manner I described. If you want your driver to call other sorts in addition
to the four I assigned above, go ahead. You will have to furnish the code
for the additional sorting routines.
SAMPLE INPUT AND OUTPUT FILES:
Let's look at two sample input files and the corresponding output files in
order to better understand what the format of the files has to be.
The sample input files are
ord50
and
ran1000
The corresponding sample output files are:
ord50.out
and
ran1000.out
Note: When you see a message such as "selection sort starting ... done"
understand that the program first writes "selection sort starting ... "
then performs the selection sort, and then writes
"done."
Also allow me to tell you now in case you ask: I made up the values for
compares and moves in the sample output files. They are not necessarily
the actual numbers you are supposed to get if you use that file as an input
to your program. (Nevertheless you are obliged to write a program that
gets correct numbers!)
TESTING:
Lists can be small, medium, and large in size. They can also be random,
ordered, or reverse-ordered. You need to test all the possible
combinations (there are 9).
When you test your program on large lists it helps you appreciate
the efficiency advantages of the advanced sorts. Computers are getting
faster and faster! I felt I had to choose lists of size 30,000 for your
large sets. A few years ago lists of size 5,000 probably would have been
quite large enough.
It is very important that you test all the data sets I have indicated. It
will determine what you learn from this assignment (and will also determine
a large fraction of your grade). I put a package containing all the inputs
you need here (but don't click until you are ready to download):
ftp://www.cs.csustan.edu/pub/john/dataFiles.tar.gz
In the header of your program, include a paragraph or two describing what
you learned from writing and testing this program. Do your very best to
make this well-written. This will be another aspect of the assignment that
will count heavily.
While making the script for this program do not cat the input files
onto the screen! This would be the normal procedure if the input files
were of manageable size, but most of the input files we are using in this
assignment are way too long! If you write the program according
to the specifications I gave you, then the information printed by the
program will tell me what I need to know.
WHAT TO TURN IN:
You will turn in your assignment by e-mail.
With regard to the e-mail please follow these rules:
- Always send me e-mail as plain text in the main message body. Never
send me attachments. If you want to send e-mail from home just log in
remotely to one of the CS suns and do the kind of e-mail command I
showed you in the hello-world exercise.
- I will tell you what subject line to use with each message, and I need
you to use exactly the subject lines I give you. (I get
hundreds of e-mail messages at a time and your subject line allows me
to sort messages.) I will take points off if the subject line is
wrong.
Turn in one shell archive file (only one) containing all three items
listed below.
- Your final versions of these files that I gave you:
- selectionSort.cpp
- selectionSort.h
- insertionSort.cpp
- insertionSort.h
- quicksort.cpp
- quicksort.h
- mergesort.cpp
- mergesort.h
- swap.cpp
- swap.h
- Your fully functional driver program, including your writings about
what you learned,
- Your script showing all the test runs on the input files I gave you.
Notice that I'm asking you to send me all the files I need to compile the
program, whether you changed them or not. If I have to add or take away
files or fix the format to get the program to compile or work properly then
I will take off points.
Make sure the script you send is cleaned up using the "col -b" filtering
trick. Make sure all your source files are clean too. Don't send me any
compiled code.
E-mail me the archive file with the subject line:
CS3100,prog2
Note that there are no spaces in the subject line given. It is
important that you do not insert any spaces. My e-mail address
is:
john@ishi.csustan.edu.
DUE DATES:
For the due dates, see
the class schedule.
QUESTIONS:
Please bring up in class any questions you still have about this assignment.