(Last Revision: 11/03/2005)
Family Tree Program
Binary Trees, Multi-linking, Command Processing, I/O
THE ASSIGNMENT:
Let's pretend that the residents of a small town decide that they want to have
a database that keeps track of information about their family trees. They
plan to collect daily requests for reports and changes. Requests are to be
processed in batches at the end of each day.
The database must be implemented as a binary search tree in which the ordering
goes by last name first and first name second. To simplify matters, we will
work under the assumption that two different persons never have both the same
first name and the same last name. Family relationships must be represented
by using (extra) pointers from node to node.
You must use an "array-based representation" of the binary search tree. The
basic idea of such a representation is described in your textbook starting on
page 516. The array representation uses integer fields as pointers. This
implementation will facilitate the moving of copies of the database to and
from secondary storage.
You must write a program that implements the database. Commands will come
from standard input to do such things as load the database from a file, make
changes, generate reports, and store the current database to a file.
INPUT:
Input Format:
The program will have to be able to read a file containing a version of the
database and make a copy of that information in primary memory. We will need
a name for such files. Let's call them treefiles.
The COMMANDS from standard input and the information in treefiles must be in a
specific format. I will reject your program if it is not capable of using
input or treefiles that I create for it in advance, or if it stores a treefile
in one run but is not able to load that treefile properly on a subsequent run.
I will provide you with some sample input for your program. You must test
your program on the samples. You must also test your program fully on your
own inputs. You must create good representative sets of average, unusual, and
boundary cases of input.
Legal input from standard input consists of a series of COMMANDS. Each command
is separated from the next by one or more white space characters (blanks,
tabs, and/or end-of-line characters).
A command is a series of one or more ELEMENTS. Each element is also separated
from the next by one or more white space characters. The first element in a
command is a KEYWORD. The rest of the elements, if there are any, are
PARAMETERS.
An element is an uninterrupted sequence of one or more CHARACTERS (a string).
Upper-case letters, lower-case letters, and digits are the only types of
characters that occur in elements.
There may or may not be white space before the beginning of the first command
and/or after the end of the last command.
Here is an (extreme) example consisting of two separate commands:
Insrt
kAne John DELET
smith MaRy
The two commands are Insrt Kane John and Delet Smith Mary.
It is NOT a good idea to create input that has such irregular spacing as the
example. I insist that you write the program to accept this 'free' format
because people who create input shouldn't be obliged to follow a format that
is too rigid. That causes errors and lost time. Users have the discipline to
produce neat input, but they aren't machines, and so they should be given some
freedom in how they use white space and case. You may have noticed that the
command interpreters that you work with as you type commands on Unix, Windows,
or Macintosh computers give you quite a bit of freedom to format as you wish.
With our formatting rules, a command does not have to begin or end at any
particular place on a line, and wherever white space occurs in the file, there
is no upper limit to the amount of it there may be.
A treefile must be formatted similarly. The difference is that a treefile is
a series of TREE NODE RECORDS, not a series of commands. Each tree node
record consists of a series of 10 FIELDS:
- node number,
- last name,
- first name,
- left pointer,
- right pointer,
- mother pointer,
- father pointer,
- maternal sibling pointer,
- paternal sibling pointer, and
- child pointer.
The fields of every tree node record must appear in exactly the order listed
above.
Parsing Input:
Your program is required to read all input as CHARACTER DATA, whether the
input is read from standard input or from a treefile. This means the program
has to read all input into variables of type char, array of char, or string.
The program must not read any input into int, float or other numeric
variables. (Of course, after the input is read, the program may convert
character data into numeric data and use numeric data internally.)
Your program must be case insensitive. In other words it should behave as if
there is no difference in meaning between the upper-case and lower-case
versions of any letter. For example, a command element or field of a tree
node record written in upper-case letters will be understood just the same as
if written in lower-case letters, or if written in some combination of
upper-case and lower-case.
Usually the easiest way to make case insensitivity work in a program is to
convert all lower-case letters to upper-case (or all upper to lower) soon
after they are input.
Sample Commands:
Here is a series of sample commands, each preceded by a description of what
the command does. Assume the commands are executed in the order given. The
descriptions between /* */ markers are not part of the commands.
/* Searches for a node with name John Kane in the data base.
If it is not there, inserts a node with that name. */
INSRT KANE JOHN
/* Inserts Rebecca Smith if she is not there.
(Note the mixed cases and extra blanks.) */
inSrt Smith rebeCca
/* Puts this information in database:
Rebecca Smith is the mother of John Kane. */
Mothr Smith Rebecca kane john
/* Inserts Ruby Powell in the database, if not there. */
Insrt Powell ruby
/* Puts this information in the database:
Ruby Powell and Rebecca Smith are maternal siblings.
(This means that they have the same mother. If they have the same
father too, that fact can be recorded with the PSIBL command.) */
MSIBL powell ruby smith rebecca
Sources of Input:
Your program will read a series of commands from standard input, and may also
do some reading from a file named: treeFile.old.
The file [ treeFile.old ] can only be read as part of the execution of the
LOADT command (see below). Make sure that the program is set up to read from
a file with EXACTLY that name: [ treeFile.old ]. The spelling must be the
same. The use of upper and lower case must be the same. (The program has to
tell the operating system the name of the file to open. The operating system
may be case-sensitive with respect to filenames. Therefore it is very
important that the filename be reproduced EXACTLY: [ treeFile.old ] ).
Since you will usually want to carefully construct a series of commands to
input to the program before you execute the program, you should normally run
the program with the standard input redirected to an input file containing
your commands. When you do this, there is no possibility for you to interact
with the program via the keyboard. That is an acceptable and even desirable
way for this particular program to work.
COMMAND LIST:
A complete list of the commands appears below, but first we
need to define some notation:
<lastname> means a last name is typed here (30 letters or less).
<firstname> means a first name is typed here (30 letters or less)
<levels> means an integer from 1-6 is typed here (Remember the
program has to input all integers as character data.
Convert to integer data internally.)
Now here below is the list of commands. The comments are NOT
part of the commands.
/* Load the database from file treeFile.old */
LOADT
/* insert this person in the database if not there. */
INSRT <lastname> <firstname>
/* The first person is the mother of the second person. */
MOTHR <lastname> <firstname> <lastname> <firstname>
/* The first person is the father of the second. */
FATHR <lastname> <firstname> <lastname> <firstname>
/* The first and second persons have the same mother. */
MSIBL <lastname> <firstname> <lastname> <firstname>
/* The first and second persons have the same father. */
PSIBL <lastname> <firstname> <lastname> <firstname>
/* The first and second persons have both the same mother and
the same father. */
BSIBL <lastname> <firstname> <lastname> <firstname>
/* If in the database, delete this person. */
DELET <lastname> <firstname>
/* Print a list of the known siblings of this person. */
SIBLS <lastname> <firstname>
/* Print a list of the known children of this person. */
CHLDN <lastname> <firstname>
/* Print the search tree rooted at the node corresponding to
this person. */
STREE <lastname> <firstname> <levels>
/* Print the known family tree of this person. */
FTREE <lastname> <firstname> <levels>
/* Store the database in file treeFile.new */
STORE
/* Stop the program. */
FINIS
PROGRAM ACTIONS & OUTPUT:
The program does whatever is necessary to initialize. Then it reads from the
standard input and executes the commands it finds there, one after another,
until it reads a FINIS command, which causes the program to halt. You can
redirect standard input to a file from the Unix command line. Example: If the
executable version of your program is named famTree.out, then the
command
famTree.out < inFile3 > outFile7
causes the program to read its standard input from the file named
inFile3, and causes it to write its standard output to the file named
outFile7.
The program does all writing to standard output or to a treefile. The
treefile it uses for output must be named
treeFile.new
Each time the program reads a command from standard input, it must echo the
exact text of the command to standard output. However, you don't have to echo
the white space exactly. You can just separate each string with a single
blank. For example,
FTREE     BroWN   BILly
may be echoed as:
FTREE BroWN BILly
It will be very useful to have such echoing done while you are testing and
debugging. Naturally I will also find it useful during my testing.
If the command in question results in any changes whatever being made to the
database, then a message explaining what those changes were must be printed
next. The use of such messages is another very useful testing and debugging
aid. Messages must include changes made to pointers, as well as changes like
additions or deletions of nodes. If the command results in a large number of
changes, the message must summarize what has been done without explicitly
listing each change. Be sure to design the code that writes the messages at
the same time you are designing the code that carries out the actions. That
will help you to coordinate both activities correctly.
After the outputs described above have been done, any output that has been
explicitly requested by the command will be written. In the case of the
STORE command, the explicitly requested output goes into a treefile.
For all other commands, it goes to standard output.
The output that results from consecutive command lines from standard input
must be clearly separated on the standard output, perhaps by dotted lines or
series of asterisks. (NOT just by blank lines.)
You must clearly label all the major parts of the output on standard output,
and generally organize the display to be highly readable and attractive. You
are free to decide upon the details of how the standard output file is
displayed, as long as you do it well.
The output from the STORE command that goes into treeFile.new must be written
in the format of a treefile. (See the section above entitled INPUT.) It must
also be written in a manner that makes it highly readable.
COMPLETE COMMAND DESCRIPTIONS:
Here are descriptions of how the program obeys each command:
LOADT
This command will appear, if at all, only as the FIRST COMMAND in the input
file. The effect of the command will be that the data in treeFile.old will be
used to make a copy of the database in RAM, complete with the structure
embodied in all its pointer fields. This will be accomplished by copying the
data in treeFile.old into an array of structures. (You may implement the
structures as instances of a struct or class).
The node number field in each respective tree node record contained in
treeFile.old will be used as an index into the array, and the information in
the other fields of the node record will be copied into the array element
determined by the index.
The first node record in treeFile.old will be assumed to be that of the tree's
root, so that the external root pointer for the tree must be set equal to the
first field in treeFile.old.
The <last name> and <first name> fields of the tree node records
will be strings of 30 letters or less. The other fields will all be integers
in the range [0 .. arrayMax-1]. You may assume that my test files will not
require your arrayMax value to be higher than 50.
INSRT <lastname> <firstname>
This command initiates a search through the tree for a node containing the
name given. The algorithm used must be the standard search algorithm for
binary search trees. If no such node is found, a new node is created for the
named person, and it is inserted into the search tree in the proper location
-- again, using the standard method for inserting a node into a binary search
tree.
Since nodes have to be created somehow, you will have to devise a scheme for
allocating unused array elements for new nodes. A desirable way of doing this
is to maintain a linked list of unused array elements. Let's say you call it
AVAIL. When a new node is needed, it is taken from AVAIL, and when a node is
deleted from the tree, it is returned to AVAIL.
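Here is one way INSRT and the AVAIL list might fit together. Treat it as a sketch under assumptions: the names Tree, allocateNode and releaseNode are hypothetical, NIL = 0 is my own choice of nil value (slot 0 is never handed out), and the free slots are chained through the rightR field. Names are assumed to be already folded to one case.

```cpp
#include <string>

const int ARRAY_MAX = 50;  // assumed capacity
const int NIL = 0;         // assumed nil value; slot 0 is never allocated

struct Node {
    std::string lastName, firstName;
    int leftL, rightR;     // search tree links
};

struct Tree {
    Node nodes[ARRAY_MAX];
    int root;
    int avail;             // head of the list of unused slots

    Tree() : root(NIL), avail(NIL) {
        for (int i = ARRAY_MAX - 1; i >= 1; --i) releaseNode(i);
    }
    // Plays the role of delete: pushes slot i onto AVAIL.
    void releaseNode(int i) { nodes[i].rightR = avail; avail = i; }
    // Plays the role of new: pops a slot from AVAIL, or returns NIL.
    int allocateNode() {
        int i = avail;
        if (i != NIL) avail = nodes[i].rightR;
        return i;
    }
    // Orders by last name first and first name second.
    static int compare(const Node& n, const std::string& last,
                       const std::string& first) {
        if (last != n.lastName) return last < n.lastName ? -1 : 1;
        if (first != n.firstName) return first < n.firstName ? -1 : 1;
        return 0;
    }
    // Standard search; if the name is absent, a new leaf is attached
    // where the search fell off the tree. Returns the node's index,
    // or NIL if the array is full.
    int insert(const std::string& last, const std::string& first) {
        int parent = NIL, cur = root, c = 0;
        while (cur != NIL) {
            c = compare(nodes[cur], last, first);
            if (c == 0) return cur;
            parent = cur;
            cur = (c < 0) ? nodes[cur].leftL : nodes[cur].rightR;
        }
        int fresh = allocateNode();
        if (fresh == NIL) return NIL;
        nodes[fresh].lastName = last;
        nodes[fresh].firstName = first;
        nodes[fresh].leftL = nodes[fresh].rightR = NIL;
        if (parent == NIL) root = fresh;
        else if (c < 0) nodes[parent].leftL = fresh;
        else nodes[parent].rightR = fresh;
        return fresh;
    }
};
```

A real node would also carry the family pointers. A new node's sibling pointers would start out pointing at the node itself, since a ring with no known siblings contains just that one node.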
MOTHR <lastname> <firstname> <lastname> <firstname>
This command will make the mother pointer in the second person's node point to
the first person's node, indicating that the first person is the mother of the
second person.
The command will also check the MATERNAL SIBLING LIST of the second person's
node (the child node) and make the mother pointers of all the maternal
siblings point to the mother node. Besides that, if the LIST OF CHILDREN of
the mother node is not empty, the command will merge that list with the
aforementioned maternal sibling list. (You are not required to keep sibling
lists or lists of children in any particular order.)
If the children pointer of the mother node is not yet pointing to any node,
then the command will point it to the child node.
(Here is a definition of a MATERNAL SIBLING: Person X is a maternal
sibling of person Y if and only if X and Y have the same mother, and X
and Y are not the same person.
The maternal sibling list attached to a person's node is a circular list
of search tree nodes, linking together all known maternal siblings of the
person. If there are no known maternal siblings, the list just contains
the one person's node.
The list of children of a mother node is also a circular linked list of
search tree nodes. This list could be empty. If not empty, its elements
are linked into a ring by use of their maternal sibling pointers and the
child pointer of the mother node points to one of the members of this
ring.
A list of children of a (female) node is a maternal sibling list.
However a maternal sibling list can be "motherless". This depends on how
much information has been fed into the database about the familial
relationships among the various nodes.
For example, it is possible to know that two people have the same mother,
but at the same time to NOT know the identity of the mother. In this
case, a list of maternal siblings is not connected to a mother node.)
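The MOTHR steps described above might be coded roughly as below. The function names onRing, mergeRings and setMother are hypothetical, and NIL = 0 is an assumed nil value. The sketch relies on the fact that two separate circular lists can be merged into one by exchanging the successor pointers of one node from each list.

```cpp
const int NIL = 0;  // assumed nil value; slot 0 is assumed never to hold a person

struct Node {
    int mother;
    int msibling;   // next node on the circular maternal sibling ring
    int child;
};

// True if target lies on the maternal sibling ring that passes through start.
bool onRing(const Node nodes[], int start, int target) {
    int cur = start;
    do {
        if (cur == target) return true;
        cur = nodes[cur].msibling;
    } while (cur != start);
    return false;
}

// Merges two DISTINCT rings into one by exchanging the successors of a and b.
void mergeRings(Node nodes[], int a, int b) {
    int tmp = nodes[a].msibling;
    nodes[a].msibling = nodes[b].msibling;
    nodes[b].msibling = tmp;
}

// MOTHR: records that node m is the mother of node c.
void setMother(Node nodes[], int m, int c) {
    int cur = c;
    do {                                  // c and all maternal siblings of c
        nodes[cur].mother = m;
        cur = nodes[cur].msibling;
    } while (cur != c);
    if (nodes[m].child == NIL)
        nodes[m].child = c;               // first known child of m
    else if (!onRing(nodes, c, nodes[m].child))
        mergeRings(nodes, c, nodes[m].child);
}
```

The onRing test is a precaution: exchanging successors of two nodes that are already on the same ring would split the ring in two.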
FATHR <lastname> <firstname> <lastname> <firstname>
This command is just like the MOTHR command described above, except it works
with father pointers, rather than mother pointers, and the list of children
emanating from the father node is continued via paternal sibling pointers,
rather than maternal sibling pointers. Paternal siblings, of course, are
people who have the same father.
MSIBL <lastname> <firstname> <lastname> <firstname>
This command records the fact that the first and second persons have the same
mother. It will merge the maternal sibling lists of the two people. If one
of the nodes, call it X, has a mother pointer that points to a node M, and the
other node, call it Y, has a mother pointer that does not point to a node,
then the command will make the mother pointers of all the nodes in the
maternal sibling list of Y (including Y) point to M.
PSIBL <lastname> <firstname> <lastname> <firstname>
This command is just like the MSIBL command above, except it records that the
first and second persons have the same father. This command changes paternal
sibling pointers instead of maternal, and father pointers instead of mother
pointers.
BSIBL <lastname> <firstname> <lastname> <firstname>
This command has the effect of executing the two commands:
MSIBL <lastname> <firstname> <lastname> <firstname>
PSIBL <lastname> <firstname> <lastname> <firstname>
DELET <lastname> <firstname>
If the indicated person is present in the database, this command deletes him
or her. Deletion has to be done in such a way that ALL pointers to the
indicated node are appropriately redirected, or nulled. The root pointer,
left pointers, right pointers, mother pointers, father pointers, maternal
sibling pointers, paternal sibling pointers, and child pointers all have to be
considered, and fixed if any of these kinds of pointers point to the node in
question.
To take care of root, left, and right pointers, one of the standard algorithms
for deleting a node from a binary tree must be used. CAREFUL: I'll discuss in
class the nature of the algorithm you need to use. If the deletion from the
binary search tree is done correctly, the rest of the work will be relatively
simple. We will discuss this in class. (You will also find some discussion of
this below.)
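As an illustration only, here is a sketch of a deletion that changes nothing but root, left, and right pointers, so that every node keeps its array slot. The name disconnect is hypothetical, the single key string stands in for the (last name, first name) pair, NIL = 0 is an assumed nil value, and node x is assumed to be in the tree.

```cpp
#include <string>

const int NIL = 0;  // assumed nil value; slot 0 is assumed never to hold a person

struct Node {
    std::string key;   // stands in for the (last name, first name) pair
    int leftL, rightR;
};

// Unlinks node x from the search tree. Family pointers are not touched.
void disconnect(Node nodes[], int& root, int x) {
    // Find the link (root, or some leftL or rightR field) that points to x.
    int* link = &root;
    while (*link != x)
        link = (nodes[x].key < nodes[*link].key) ? &nodes[*link].leftL
                                                 : &nodes[*link].rightR;
    if (nodes[x].leftL == NIL) {
        *link = nodes[x].rightR;          // no children, or a right child only
    } else if (nodes[x].rightR == NIL) {
        *link = nodes[x].leftL;           // a left child only
    } else {
        // Two children: find Y, the leftmost node of the right subtree of x.
        int* yLink = &nodes[x].rightR;
        while (nodes[*yLink].leftL != NIL) yLink = &nodes[*yLink].leftL;
        int y = *yLink;
        *yLink = nodes[y].rightR;         // disconnect Y
        nodes[y].leftL = nodes[x].leftL;  // reconnect Y in the position of x
        nodes[y].rightR = nodes[x].rightR;
        *link = y;
    }
    nodes[x].leftL = nodes[x].rightR = NIL;
}
```

After the family pointers that refer to x have been repaired, slot x can go back on the list of unused array elements.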
SIBLS <lastname> <firstname>
This command will cause a list of the known siblings of the indicated person
to be written to the output file. All maternal siblings will be listed first,
and then all paternal siblings. Each list must be appropriately labelled on
the output. Note that in order to make this command work properly, the list
needs to be linked in a manner that makes it possible to start anywhere in the
list, and follow pointers to arrive at any other element of the list. Of
course the ring structure specified above suits this purpose.
CHLDN <lastname> <firstname>
This command will cause a list of the known children of the indicated person
to be written to the output file.
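Both SIBLS and CHLDN come down to walking a ring until the walk returns to its starting point. The sketch below is one possibility. The names ringFrom, siblingsOf and childrenOf are hypothetical, NIL = 0 is an assumed nil value, and a C++ pointer-to-member is used to choose between the maternal and paternal rings.

```cpp
#include <vector>

const int NIL = 0;  // assumed nil value; slot 0 is assumed never to hold a person

struct Node {
    int mother, father, msibling, psibling, child;
};

// Returns the nodes on a ring, beginning with start and following the
// pointer selected by next until the walk comes back around.
std::vector<int> ringFrom(const Node nodes[], int start, int Node::*next) {
    std::vector<int> ring;
    int cur = start;
    do {
        ring.push_back(cur);
        cur = nodes[cur].*next;
    } while (cur != start);
    return ring;
}

// SIBLS: the ring without the person. Call once with &Node::msibling
// and once with &Node::psibling.
std::vector<int> siblingsOf(const Node nodes[], int person, int Node::*next) {
    std::vector<int> ring = ringFrom(nodes, person, next);
    ring.erase(ring.begin());             // drop the person
    return ring;
}

// CHLDN: the ring that the parent's child pointer leads into. Which ring
// to follow depends on whether the parent is the mother or the father.
std::vector<int> childrenOf(const Node nodes[], int parent) {
    int first = nodes[parent].child;
    if (first == NIL) return std::vector<int>();
    bool isMother = (nodes[first].mother == parent);
    return ringFrom(nodes, first, isMother ? &Node::msibling : &Node::psibling);
}
```

The command procedures would then print the names stored at the returned indices under suitable labels.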
STREE <lastname> <firstname> <levels>
This will cause the program to print a picture of the sub-tree of the SEARCH
tree rooted at the indicated person's node, down to the number of levels
indicated. A reverse in-order traversal can be used as the basis of the
algorithm. We can discuss the implementation of this command further in
class.
The information printed for each node will be the first ten letters of the
first and last names. (First name first, on a line by itself. Then last name
underneath, on a line by itself.)
FTREE <lastname> <firstname> <levels>
This will cause the program to print a picture of the sub-tree of the FAMILY
tree rooted at the indicated node, down to the number of levels indicated.
You will use the same algorithm to do this as is used by STREE. You must use
ONE PROCEDURE to print both kinds of tree.
STORE
This command will cause the program to write its current version of the
database to the file treeFile.new.
The file treeFile.new must be written in the correct format. See the
subsection titled Input Format above, in the main section titled INPUT. The file
must also be laid out to maximize its readability to humans. The first node
record written into treeFile.new must be that of the root. Naturally, the
treeFile.new written by STORE must be readable by LOADT, provided of course
that we first change its name to treeFile.old.
A good way to implement STORE would be to do a pre-order traversal of the
tree, and to write the information corresponding to each node as the node is
visited.
FINIS
This command halts the program. A parting message is printed to the standard
output by this command.
DETAILS AND ASSUMPTIONS
Your program is not required to do any error detection or correction. It may
assume that there are no mistakes of any kind in the input file and no
mistakes of any kind in treeFile.old. In particular, no errors in the form or
order of the commands or fields, no attempts to record contradictory familial
relationships, no attempts to record relationships that are already recorded
in the database, and no attempts to access information that does not exist in
the database.
DATA STRUCTURES
You are required to use a binary search tree as the main data structure. You
must also use linking with pointers to keep track of the relationships, both
familial and alphabetical, among the elements of the database.
The tree must be implemented as an array of records. Each record will
represent one node of the tree. A record must contain these 7 pointers: left,
right, mother, father, maternal sibling, paternal sibling, and child. The
pointers must
be implemented as integers, and the address a pointer points to will simply be
the array element whose index is the value of the pointer. A value not in the
array's index range can be used to denote a nil pointer.
I strongly recommend that you organize the 7 pointers as an array, indexed by
an enumerated type, with elements such as (leftL, rightR, mother, father,
msibling, psibling, children). This will allow you to, in effect, pass the
names of the pointers as parameters to procedures, while still retaining a set
of mnemonic names for the pointers. This will make it easy to, for one
example, write a single procedure which can be used to print either family
trees or search trees. You will also find it easy to create a single
procedure to implement both the MOTHR and FATHR commands, and a single
procedure to implement both the MSIBL and PSIBL commands.
[IMPORTANT NOTE: the names "left" and "right" are used in one of the C++
libraries, so DO NOT name anything in your program "left" or "right", else you
may not be able to compile the program.]
You must design the other data structures and the structure of your program.
There are several choices to be made. These choices will determine how
efficiently your program will run, and how simple or complicated the code will
be. If you try, you will find many ways to save time by avoiding duplicating
code, and by using the right tool (data structure) for the right job.
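As an example of avoiding duplicated code, the enumerated-type indexing recommended above lets one procedure do the work of both MSIBL and PSIBL. The sketch below is only a suggestion: the name joinSiblings is hypothetical, NIL = 0 is an assumed nil value, and the two persons are assumed not to be on the same ring already (the input never repeats a recorded relationship).

```cpp
const int NIL = 0;  // assumed nil value; slot 0 is assumed never to hold a person
enum Link { leftL, rightR, mother, father,
            msibling, psibling, children, LINK_COUNT };

struct Node {
    int link[LINK_COUNT];   // the 7 pointers, indexed by name
};

// Records that x and y share a parent. Pass (msibling, mother) for MSIBL
// or (psibling, father) for PSIBL.
void joinSiblings(Node nodes[], int x, int y, Link sib, Link parent) {
    // Arrange for x to be the one whose parent is known, if either is.
    if (nodes[x].link[parent] == NIL) { int t = x; x = y; y = t; }
    int known = nodes[x].link[parent];
    if (known != NIL && nodes[y].link[parent] == NIL) {
        int cur = y;
        do {                              // y and every node on y's ring
            nodes[cur].link[parent] = known;
            cur = nodes[cur].link[sib];
        } while (cur != y);
    }
    // Merge the two rings by exchanging the successors of x and y.
    int t = nodes[x].link[sib];
    nodes[x].link[sib] = nodes[y].link[sib];
    nodes[y].link[sib] = t;
}
```

BSIBL would then amount to two calls: joinSiblings(nodes, a, b, msibling, mother) followed by joinSiblings(nodes, a, b, psibling, father).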
It is important that you design your program well. You have been taught some
clever and elegant ways of doing some of the things your program needs to do.
Now you must think of some equally clever and elegant ways to do the rest. If
you take this advice, it will make your job much easier.
Like good literature, a program should have power, and economy of expression.
It should do a lot with very few lines of code, and go like the dickens. (pun
intended.)
You are answerable for the wisdom and efficacy of your choice of design. Use
C++ classes to implement data structures. Keep clear boundaries between
different data types.
HELPFUL HINTS ON THE FAMILY TREE PROBLEM
- When implementing DELET, you must check for every possible kind of
pointer that might be pointing to the node to be deleted.
Let's say that the node to be deleted is called X. The list of pointers
that might point to X consists of left pointers, right pointers, mother or
father pointers, msibling pointers, psibling pointers, child pointers,
root pointer, and any additional pointers that you might decide to use.
Any pointer pointing to X must be redirected somehow by DELET.
Left, right, and root are handled by a standard deletion algorithm for a
binary search tree.
To fix mother or father pointers which may be pointing to X, follow the
child pointer of X. If it points to some node Y, find out whether X is
the mother of Y or the father of Y (it has to be one or the other). Then,
based on this information, traverse the appropriate sibling ring of Y, and
null out the appropriate parent pointer in each node visited.
To fix child, msibling, and psibling pointers we can begin by following
the mother pointer of X and observing if there is a mother node whose
child pointer points back to X. If so, we either change the child pointer
of the mother node so that it points instead to a maternal sibling of X,
or we nullify it if X has no known maternal siblings. We then delete X
from its maternal sibling ring. Next do these same tasks on the paternal
side.
- When deleting a node X from the binary search tree, you have to be
careful if X has two children. The standard algorithms are of two
types:
One type copies the info in Y, the leftmost node of the right subtree of
X, into the info area of X, and deletes Y from the tree.
The other type actually disconnects X from the tree, disconnects Y from
the tree, and then reconnects Y in the former position of X. Here
DISCONNECT and RECONNECT refer only to the changes made to root, left, and
right pointers -- not mother, father, and so forth.
It is probably best to use the disconnect-reconnect method, because that
way the mother, father, and other pointers that point to Y do not have to
be redirected.
- The AVAIL list is the list of nodes not in use. You must have procedures
to simulate the new (memory allocation) and delete
(deallocation) procedures by deleting and inserting from and into the
AVAIL list.
- You have to be careful to take into consideration how the actions of the
LOADT command may affect the AVAIL list. If a LOADT command is executed
after the AVAIL list is initialized then, unless treeFile.old is empty,
the LOADT command will copy tree node data into array slots that are on
the AVAIL list. There are two basic ways to handle this situation:
- The program checks to see if there is a LOADT command in the input
and if so, executes the LOADT command before it initializes
the AVAIL list.
- The program initializes the AVAIL list before executing any of the
commands. If there is a LOADT command, the program
re-initializes the AVAIL list after copying all the tree node
records into the array.
After a LOADT command is executed the AVAIL list must contain all the
array slots that are not occupied by the tree nodes that the LOADT
command placed into the array. The LOADT command could do the following:
- First mark all the array slots "unused".
- Next copy each tree node into its designated location in the array,
taking care to mark each such location "used",
- Finally, starting with an empty AVAIL list, traverse the array and
insert each unused array slot into the AVAIL list.
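The three LOADT steps just listed might be sketched as below. The names markAllUnused and buildAvail are hypothetical, as are the used flag, the capacity ARRAY_MAX, and the choice of NIL = 0 with slot 0 never handed out. The copying step in the middle is left to the code that reads treeFile.old.

```cpp
const int ARRAY_MAX = 50;  // assumed capacity
const int NIL = 0;         // assumed nil value; slot 0 is never handed out

struct Node {
    bool used;
    int rightR;            // doubles as the AVAIL link while the slot is free
    // ... name and other pointer fields omitted from this sketch
};

// First step: mark every array slot "unused".
void markAllUnused(Node nodes[]) {
    for (int i = 0; i < ARRAY_MAX; ++i) nodes[i].used = false;
}

// Final step: starting with an empty AVAIL list, insert each slot that is
// still marked unused. Returns the head of the list. Going from the high
// end down leaves the lowest free slot at the head.
int buildAvail(Node nodes[]) {
    int avail = NIL;
    for (int i = ARRAY_MAX - 1; i >= 1; --i) {
        if (!nodes[i].used) {
            nodes[i].rightR = avail;
            avail = i;
        }
    }
    return avail;
}
```

If there is no LOADT command, calling markAllUnused and then buildAvail gives the ordinary initial AVAIL list, so the same two procedures cover both cases.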
Note: The Kennedy family tree is good data for testing.
There is a nice Kennedy Family Tree at this PBS site.
WHAT TO TURN IN:
You will turn in three "phases" of this assignment:
- a level 3 version,
- a level 4 version, and
- a final version.
For each phase of the assignment, you will turn in a printer output (hardcopy)
and you will send me an e-mail message. Please follow these rules:
- Always send me e-mail as plain text in the main message body.
Never send me attachments.
- Always use the exact subject line I specify for each
message. (I often get hundreds of e-mail messages in a week. The
subject line allows me to find, filter and sort messages.) You will lose
a significant number of points on the assignment if you use the wrong
subject line.
- Be very careful when you send the e-mail. You may use the instructions in
your Hello World! lab exercise for guidance. Of course, you will need to make
the obvious changes to those directions -- you have to use the correct subject
line and filename.
- Always send yourself a copy of each e-mail message you send to me,
and check immediately to see if you receive the message intact.
You are responsible for sending e-mail correctly.
Here is the list of things you have to turn in:
- At the start of class on the first due date, place the following item in
front of me:
- a hardcopy of your level 3 (or greater) program (all the source code, i.e.
all the *.h and *.cpp files). Make sure all the code is properly formatted
and that it all shows on the paper.
- Using the subject line: CS3100,prog4.3 send the following item to me by
e-mail before midnight on the first due date: one shell archive file (only
one) containing items 1-4.
1. All source files for your level 3 program (everything I will need to
compile and run it: all *.cpp files and *.h files)
2. Your test script showing adequate testing of your level 3 program.
3. A file named 'README' containing the compilation command one should use to
compile your program.
4. A copy of your 'makefile' if you used one.
- At the start of class on the second due date, place the following item in
front of me:
- a hardcopy of your level 4 (or greater) program (all the source code, i.e.
all the *.h and *.cpp files). Make sure all the code is properly formatted
and that it all shows on the paper.
- Using the subject line: CS3100,prog4.4 send the following item to me by
e-mail before midnight on the second due date: one shell archive file (only
one) containing items 1-4.
1. All source files for your level 4 program (everything I will need to
compile and run it: all *.cpp files and *.h files)
2. Your test script showing adequate testing of your level 4 program.
3. A file named 'README' containing the compilation command one should use to
compile your program.
4. A copy of your 'makefile' if you used one.
- At the start of class on the third due date, place the following item in
front of me:
- a hardcopy of your final level program (all the source code). Make sure all
the code is properly formatted and that it all shows on the paper.
- Using the subject line: CS3100,prog4.f send the following item to me by
e-mail before midnight on the third due date: one shell archive file (only
one) containing items 1-4.
1. All source files for your final level program (everything I will need to
compile and run it: all *.cpp files and *.h files)
2. Your test script showing adequate testing of your final level program.
3. A file named 'README' containing the compilation command one should use to
compile your program.
4. A copy of your 'makefile' if you used one.
Note that there are no spaces in the subject lines given. It is important
that you do not insert any spaces. My e-mail address is: john@ishi.csustan.edu.
DUE DATES:
For due dates, see the class schedule.