(Latest Revision: 09/17/2015)
Comparing Sorting Algorithms
Write a program that REPEATEDLY:
- prompts the user to enter a filename,
- opens the file for reading,
- reads a comment from the first line of the file,
- reads an integer N from the second line (e.g. N = 1000),
- reads N integers from the rest of the file, putting them
into consecutive locations in an array,
- writes the name of the input file,
- writes the comment from the input file,
- sorts an identical copy of the array using each of the following methods:
- selection sort
- insertion sort
- quick sort
- merge sort
- writes a report of the number of compares and moves of list elements
done in each sort, and
- asks the user if s/he wishes to continue
UNTIL the user wishes NOT to continue.
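As a rough illustration of the file format described above (comment line, then N, then N integers), here is a minimal sketch of a reading function. The name readList and the use of std::vector in place of a plain array are assumptions made for this example; they are not taken from the provided source code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the provided source code).
// Reads one input file: a comment line, a count N, then N integers.
// Returns false if the file cannot be opened or is shorter than promised.
bool readList(const std::string& filename, std::string& comment,
              std::vector<int>& data)
{
    std::ifstream in(filename);
    if (!in) return false;                          // could not open file

    data.clear();
    if (!std::getline(in, comment)) return false;   // line 1: the comment

    int n = 0;
    if (!(in >> n) || n < 0) return false;          // line 2: the count N

    data.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        int value = 0;
        if (!(in >> value)) return false;           // fewer than N values
        data.push_back(value);                      // consecutive locations
    }
    assert(data.size() == static_cast<std::size_t>(n));
    return true;
}
```

A driver could call this once per pass of the outer loop, right after prompting for the filename, and write the filename and comment before starting the sorts.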
MOTIVATION:
Working on this program and actually experiencing the differences in the
amount of work and the amount of runtime is a nice way to get familiar with
analysis of algorithms. This is an important part of the "science" in
computer science.
DETAILS:
Your program must make a little announcement (examples to follow) just before
starting and just after completing each run of one of the four sorting
functions. This will allow you to get a sense of how long each individual
algorithm takes to do its work.
Your program must pass in a new copy of the original array
each time it calls a sorting function. We don't want the first function
call to be the only one that really sorts the original array!
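One way to picture the "new copy each time" requirement is sketched below. The SortFunction signature, the sortCopy helper, and the placeholderSort stand-in are all assumptions for illustration; the sort functions in the provided source code may take different parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical signature; the real sort functions may differ.
using SortFunction = void (*)(int theArray[], int n);

// Stand-in sort, used only so this sketch compiles and runs.
void placeholderSort(int theArray[], int n)
{
    std::sort(theArray, theArray + n);
}

// Sorts a fresh copy and leaves `original` untouched, so the next
// sort function starts from exactly the same unsorted data.
std::vector<int> sortCopy(const std::vector<int>& original,
                          SortFunction sortFn)
{
    std::vector<int> work(original);     // new copy for this call only
    sortFn(work.data(), static_cast<int>(work.size()));
    assert(std::is_sorted(work.begin(), work.end()));
    return work;
}
```

Calling sortCopy four times with the same original gives each of the four sorts identical input.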
When you count "moves" count the number of times that an element of the list
(or a copy of an element of the list) is copied from one memory location to
another memory location (with an assignment statement). The
C++ swap function included with the assignment source code
performs 3 moves.
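To see where the 3 moves come from, consider this sketch of a swap with a counter. The name countedSwap and the long counter parameter are assumptions for this example; the swap function in the assignment source code may look different.

```cpp
#include <cassert>

// Hypothetical counted swap: three assignments, so three moves.
void countedSwap(int& x, int& y, long& moves)
{
    assert(moves >= 0);   // sanity check on the running total
    int temp = x;         // move 1: element copied into temp
    x = y;                // move 2: element copied into x
    y = temp;             // move 3: copy of element copied into y
    moves += 3;
}
```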
When counting "compares", count only comparisons of list elements or copies of list elements. Do not count comparisons of array indices. For example,
if the array you are using is named "data" the line of code:
if (data[j] < data[j - 1])
contains a comparison that you would have to count, because it is a
comparison of two elements in the list. On the other hand, the line of
code:
while (i < j)
does not compare elements of the list. It is a comparison of array indices
only. Therefore you must not count that comparison.
The line of code
for (; (loc > 0) && (data[loc-1] > nextItem); --loc)
contains one comparison that you should not count and one comparison
that you should count.
You should not count "loc > 0" because it compares indices.
You should count "data[loc-1] > nextItem" because it
compares a list element with (a copy of) another list element.
Keep in mind that it may be a little "tricky" to insert the code that
counts the comparisons executed in if-clauses and loop conditions. That's
part of the challenge of this programming assignment. You need to think
about how to use C++ features to solve the problem. Give it due thought
and consideration. We can discuss hints in class.
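One possible approach, offered only as a sketch and not as the required solution, is to route each element comparison through a small function that bumps the counter and returns the result. The names greaterThan and findInsertLocation are hypothetical. Because && short-circuits, the counted comparison runs only when the index test loc > 0 has already passed.

```cpp
#include <cassert>

// Hypothetical helper: counts one comparison of list elements and
// returns its result, so it can sit inside an if or a loop condition.
bool greaterThan(int a, int b, long& compares)
{
    ++compares;
    return a > b;
}

// Walks loc leftward while the element before it is larger than
// nextItem. Only the element comparisons are counted; (loc > 0)
// compares indices and is not counted.
int findInsertLocation(const int data[], int loc, int nextItem,
                       long& compares)
{
    assert(loc >= 0);
    for (; (loc > 0) && greaterThan(data[loc - 1], nextItem, compares);
         --loc) {
        // a real insertion sort would shift data[loc - 1] right here
    }
    return loc;
}
```

With data holding 1 3 5 7, starting at loc = 4 with nextItem = 4, the loop makes three counted comparisons (7, 5, and 3 against 4) and stops at location 2.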
When you make your program, you may use this source code. If you put all that code in a directory, cd to that directory, and type the command
g++ *.cpp
everything will be compiled into one executable program named a.out. Rewrite the sample driver.cpp file found with the source code. By modifying the driver.cpp file you can do most of the work required to complete the assignment. The same compile command will work as you develop and test your program.
We can discuss some of the technical problems in class. As part of that discussion we
can look at the driver.cpp code together to get an understanding of the C++ features
it illustrates.
For this assignment, I much prefer, for various reasons, that you write the program in C++. If you don't want to do that, please speak to me about it immediately.
You will need to modify the code in the sorting modules so that
it takes care of the counting of the compares and the moves.
I made parameters for the sort functions that you can use for this
purpose. I'll go over some of the details of the requirements with you
in class. Basically, to implement the counting, you will need to add
some variables to your version of driver.cpp, do some appropriate initializations
and re-initialization of those variables, and include their values in certain
outputs.
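The initialization, re-initialization, and output steps just described might be sketched as follows. Everything named here is an assumption: the CountingSort signature, the standInSort function (a small bubble sort, deliberately not one of the four required sorts), and the wording of the report line. Follow the sample output for the real format.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical signature; the counter parameters in the provided
// sort functions may be named or typed differently.
using CountingSort = void (*)(int theArray[], int n,
                              long& compares, long& moves);

// Stand-in bubble sort that counts its own compares and moves.
void standInSort(int a[], int n, long& compares, long& moves)
{
    for (int pass = 0; pass < n - 1; ++pass) {
        for (int i = 0; i + 1 < n - pass; ++i) {
            ++compares;                      // a[i] > a[i + 1]
            if (a[i] > a[i + 1]) {
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
                moves += 3;                  // one swap is 3 moves
            }
        }
    }
}

// Runs one sort on a copy with freshly zeroed counters and returns
// a one-line report (the report wording is made up for this sketch).
std::string reportOneSort(const std::string& name, CountingSort sortFn,
                          const std::vector<int>& original)
{
    long compares = 0;                       // re-initialized per sort
    long moves = 0;
    std::vector<int> work(original);
    sortFn(work.data(), static_cast<int>(work.size()), compares, moves);
    assert(std::is_sorted(work.begin(), work.end()));

    std::ostringstream line;
    line << name << ": " << compares << " compares, "
         << moves << " moves";
    return line.str();
}
```

Because the counters are local to reportOneSort, counts from one sort cannot leak into the report for the next one.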
SAMPLE INPUT AND OUTPUT:
Let's look at two sample input files and the corresponding output in order to
better understand what the format of the files and outputs has to be.
The sample input files are ord50 and ran1000.
The corresponding sample program output is in sample.txt.
Note: It is very important that you understand that when you see a message
such as
selection sort starting ... done
what happened was this: first the program wrote "selection sort starting ... "
then the program began performing the selection sort, and
then (after finishing the selection sort) the program wrote
"done".
So, if you pay attention to the elapsed time between writing "selection sort
starting ..." and writing "done", you will get an idea of how much time it
took to perform that selection sort.
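The ordering of those two writes can be sketched like this. The function name announceAndRun is hypothetical, and the std::chrono timing is an optional extra that the assignment does not ask for. The std::flush matters: without it the "starting ..." text may sit in an output buffer and not appear until after the sort has finished.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <ostream>
#include <sstream>   // std::ostringstream, handy for checking the output
#include <string>

// Hypothetical helper: announces, runs `work`, then writes "done".
// Returns the elapsed time in seconds (optional extra).
double announceAndRun(const std::string& name, std::ostream& out,
                      const std::function<void()>& work)
{
    assert(static_cast<bool>(work));         // a callable is required

    // Written and flushed BEFORE the sort begins.
    out << name << " starting ... " << std::flush;

    const auto start = std::chrono::steady_clock::now();
    work();                                  // the sort itself
    const auto stop = std::chrono::steady_clock::now();

    // Written only AFTER the sort has finished.
    out << "done" << std::endl;
    return std::chrono::duration<double>(stop - start).count();
}
```

In a driver, `work` could be a lambda that calls one of the four sort functions on its own copy of the array, with std::cout passed as `out`.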
TESTING:
Lists can be small, medium, or large in size. They can also be random,
ordered in the sort order, or ordered in the reverse of the sort order.
I am requiring that you test all nine combinations of those characteristics
(three sizes times three orderings).
When you test your program on large lists, it helps you appreciate
the efficiency advantages of the advanced sorts. Computers are getting
faster and faster! I felt I had to choose lists of size 50,000 for your
large sets. Several years ago lists of size 5,000 probably would have been
quite large enough.
It is very important that you test all the data sets I have indicated. It
will determine what you learn from this assignment (and will also determine
a large fraction of your grade). I put a set of data files you can use
in a subdirectory of the assignment directory.
In the header comment of your program, include a paragraph or two describing what you
learned from writing and testing this program. In other words,
how did the speed and the values of the counts compare among
the different algorithms? Do your best to write this well.
Be articulate. This will be another aspect of the assignment
that will count significantly.
While making the test script for this program do not cat the
input files onto the screen! This would be the normal procedure if the input
files were of manageable size, but most of the input files we are using in
this assignment are way too long! If you write the program according
to the specifications I gave you, then the information printed by the program
will tell me what I need to know.
WHAT TO TURN IN:
Since this assignment does not require you to do a large amount of program
design work, I will not require you to turn in a preliminary version.
You will turn in one set of printed output (hardcopy), and you will send me one
e-mail message that contains a so-called shell archive. Please follow these rules:
- Always send me e-mail as plain text in the main message body.
Never send me attachments.
- Always use the exact subject line I specify for each
message. (I often get hundreds of e-mail messages in a week. The
subject line allows me to find and sort messages.) You will lose a
significant number of points on the assignment if you use the wrong
subject line.
- Be very careful when you send the e-mail. You may use the
e-mail instructions in the Hello World! lab exercise for guidance.
Of course, you will need to make the obvious changes to
those directions -- you have to use the correct subject line and
filename.
- Always send yourself a copy of each e-mail message you send to me,
check immediately to see if you receive the message intact, and
check within a few minutes to see if you have received e-mail notifying
you about an undeliverable message. You are
responsible for sending e-mail correctly.
Here is the list of things you have to turn in:
- At the start of class on the due date
place hardcopies of your final versions of the following files
on the "counter" in front of me.
- driver.cpp,
- selectionSort.cpp,
- insertionSort.cpp,
- quicksort.cpp, and
- mergesort.cpp
All the hardcopy material above should be attached or stapled together.
- Before midnight on the due date, send me the following by e-mail:
A shell archive file containing this collection of files:
- A complete set of all your program source code files
(which includes, in your driver.cpp file, your writings
about what you learned),
- Your script showing all the test runs on the input files I gave you,
- A file named 'README' containing the compilation command
one should use to compile your program, and
- A copy of your 'makefile' - if you used one.
Remember:
- Within your shar file, you must send all the files needed to
compile the program. That includes all the *.cpp and *.h files.
- Make sure your script is cleaned up using the sort of "col -b" filtering trick
illustrated here.
- No listings of any of your data files are to be included in your test script.
- Lines in all source code and script files are to be no more than 78 characters long.
Tab characters are not allowed. All content must be plainly readable
and properly formatted.
- Your source files must be clean too. Sometimes files transferred from a PC to a Unix
(or Linux) machine need to be cleaned up.
- Don't send me any compiled code! No a.outs!!!
E-mail me the shell archive file with the subject line:
CS3100Prog2
Note that there are no spaces in the subject line given. It is important
that you do not insert any spaces. My e-mail address is:
john@ishi.csustan.edu.
WHEN IS THIS ASSIGNMENT DUE?
Look for the due date in
the class schedule.
(It's at the top level of the class directory.)
QUESTIONS:
Please bring up in class any questions you still have about this assignment.