family tree assignment

(Last Revision: 11/03/2005)

Family Tree Program
Binary Trees, Multi-linking, Command Processing, I/O

THE ASSIGNMENT:

Let's pretend that the residents of a small town decide that they want to have a database that keeps track of information about their family trees. They plan to collect daily requests for reports and changes. Requests are to be processed in batches at the end of each day.

The database must be implemented as a binary search tree in which the ordering goes by last name first and first name second. To simplify matters, we will work under the assumption that two different persons never have both the same first name and the same last name. Family relationships must be represented by using (extra) pointers from node to node.
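
As an illustration of the ordering rule above, here is a minimal sketch of a comparison routine. The name compareNames is hypothetical, and the sketch assumes that both names have already been converted to a single case.

```cpp
#include <string>

// Hypothetical helper (not part of the assignment text).
// Returns a negative value if person A sorts before person B,
// zero if they are the same person, and a positive value otherwise.
// Assumes all names were already converted to one case.
int compareNames(const std::string& lastA, const std::string& firstA,
                 const std::string& lastB, const std::string& firstB) {
    int byLast = lastA.compare(lastB);
    if (byLast != 0) {
        return byLast;              // last names decide when they differ
    }
    return firstA.compare(firstB);  // otherwise first names break the tie
}
```

With this rule, KANE JOHN sorts before SMITH ADAM, and SMITH MARY sorts before SMITH REBECCA.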

You must use an "array-based representation" of the binary search tree. The basic idea of such a representation is described in your textbook starting on page 516. The array representation uses integer fields as pointers. This implementation will facilitate the moving of copies of the database to and from secondary storage.

You must write a program that implements the database. Commands will come from standard input to do such things as load the database from a file, make changes, generate reports, and store the current database to a file.

INPUT:

Input Format:

The program will have to be able to read a file containing a version of the database and make a copy of that information in primary memory. We will need a name for such files. Let's call them treefiles.

The COMMANDS from standard input and the information in treefiles must be in a specific format. I will reject your program if it is not capable of using input or treefiles that I create for it in advance, or if it stores a treefile in one run but is not able to load that treefile properly on a subsequent run.

I will provide you with some sample input for your program. You must test your program on the samples. You must also test your program fully on your own inputs. You must create good representative sets of average, unusual, and boundary cases of input.

Legal input from standard input consists of a series of COMMANDS. Each command is separated from the next by one or more white space characters (blanks, tabs, and/or end-of-line characters).

A command is a series of one or more ELEMENTS. Each element is also separated from the next by one or more white space characters. The first element in a command is a KEYWORD. The rest of the elements, if there are any, are PARAMETERS.

An element is an uninterrupted sequence of one or more CHARACTERS (a string). Upper-case letters, lower-case letters, and digits are the only types of characters that occur in elements.

There may or may not be white space before the beginning of the first command and/or after the end of the last command.

Here is an (extreme) example consisting of two separate commands:


          Insrt
   kAne              John  DELET
      smith MaRy

The two commands are Insrt Kane John and Delet Smith Mary.

It is NOT a good idea to create input that has such irregular spacing as the example. I insist that you write the program to accept this 'free' format because people who create input shouldn't be obliged to follow a format that is too rigid. That causes errors and lost time. Users have the discipline to produce neat input, but they aren't machines, and so they should be given some freedom in how they use white space and case. You may have noticed that the command interpreters that you work with as you type commands on Unix, Windows, or Macintosh computers give you quite a bit of freedom to format as you wish.

With our formatting rules, a command does not have to begin or end at any particular place on a line, and wherever white space occurs in the file, there is no upper limit to the amount of it there may be.

A treefile must be formatted similarly. The difference is that a treefile is a series of TREE NODE RECORDS, not a series of commands. Each tree node record consists of a series of 10 FIELDS:

node number,
last name,
first name,
left pointer,
right pointer,
mother pointer,
father pointer,
maternal sibling pointer,
paternal sibling pointer, and
child pointer.

The fields of every tree node record must appear in exactly the order listed above.

Parsing Input:

Your program is required to read all input as CHARACTER DATA, whether the input is read from standard input or from a treefile. This means the program has to read all input into variables of type char, array of char, or string. The program must not read any input into int, float or other numeric variables. (Of course, after the input is read, the program may convert character data into numeric data and use numeric data internally.)

Your program must be case insensitive. In other words, it should behave as if there is no difference in meaning between the upper-case and lower-case versions of any letter. For example, a command element or field of a tree node record written in upper-case letters will be understood just the same as if written in lower-case letters, or in some combination of upper-case and lower-case.

Usually the easiest way to make case insensitivity work in a program is to convert all lower-case letters to upper-case (or all upper to lower) soon after they are input.

Sample Commands:

Here is a series of a few sample commands, each preceded by a description of what the command does (Assume the commands are executed in the order given. The descriptions between /* */ markers are not part of the commands.):

    /* Searches for a node with name John Kane in the data base.  
      If it is not there, inserts a node with that name. */

INSRT KANE JOHN

   /* Inserts Rebecca Smith if she is not there.
      (Note the mixed cases and extra blanks.) */

   inSrt Smith    rebeCca

   /* Puts this information in database: 
      Rebecca Smith is the mother of John Kane. */

Mothr Smith Rebecca   kane john

   /* Inserts Ruby Powell in the database, if not there.  */

Insrt Powell ruby

   /* Puts this information in the database:
      Ruby Powell and Rebecca Smith are maternal siblings.  
      (This means that they have the same mother.  If they have the same
      father too, that fact can be recorded with the PSIBL command.) */

MSIBL powell ruby smith rebecca

Sources of Input:

Your program will read a series of commands from standard input, and may also do some reading from a file named:

 treeFile.old.

The file [ treeFile.old ] can only be read as part of the execution of the LOADT command (see below). Make sure that the program is set up to read from a file with EXACTLY that name: [ treeFile.old ]. The spelling must be the same. The use of upper and lower case must be the same. (The program has to tell the operating system the name of the file to open. The operating system may be case-sensitive with respect to filenames. Therefore it is very important that the filename be reproduced EXACTLY: [treeFile.old] ).

Since you will usually want to carefully construct a series of commands to input to the program before you execute the program, you should normally run the program with the standard input redirected to an input file containing your commands. When you do this, there is no possibility for you to interact with the program via the keyboard. That is an acceptable and even desirable way for this particular program to work.

COMMAND LIST:

A complete list of the commands appears below, but first we need to define some notation:


<lastname>    means a last name is typed here (30 letters or less).

<firstname>   means  a first name is typed here (30 letters or less)

<levels>      means an integer from 1-6 is typed here (Remember the 
              program has to input all integers as character data.
              Convert to integer data internally.)

Here is the list of commands. The comments are NOT part of the commands.

/* Load the database from file treeFile.old */

LOADT

/* Insert this person in the database if not already there. */

INSRT <lastname> <firstname>

/* The first person is the mother of the second person. */

MOTHR <lastname> <firstname> <lastname> <firstname>

/* The first person is the father of the second. */

FATHR <lastname> <firstname> <lastname> <firstname>

/* The first and second persons have the same mother. */

MSIBL <lastname> <firstname> <lastname> <firstname>

/* The first and second persons have the same father. */

PSIBL <lastname> <firstname> <lastname> <firstname>

/* The first and second persons have both the same mother and the same father. */

BSIBL <lastname> <firstname> <lastname> <firstname>

/* If in the database, delete this person. */

DELET <lastname> <firstname>

/* Print a list of the known siblings of this person. */

SIBLS <lastname> <firstname>

/* Print a list of the known children of this person. */

CHLDN <lastname> <firstname>

/* Print the search tree rooted at the node corresponding to this person. */

STREE  <lastname> <firstname> <levels>

/* Print the known family tree of this person. */

FTREE <lastname> <firstname> <levels>

/* Store the database in file treeFile.new */

STORE

/* Stop the program. */

FINIS
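
Since commands are not tied to line boundaries, the program can only tell where a command ends by knowing how many parameters follow its keyword. A minimal sketch of that lookup, taken directly from the list above, might look like this. The function name parameterCount is hypothetical, and the sketch assumes the keyword has already been converted to upper case.

```cpp
#include <string>

// Hypothetical helper: number of parameters that follow each keyword,
// according to the command list. Returns -1 for an unknown keyword.
// Assumes the keyword was already converted to upper case.
int parameterCount(const std::string& keyword) {
    if (keyword == "LOADT" || keyword == "STORE" || keyword == "FINIS") {
        return 0;   // no parameters
    }
    if (keyword == "INSRT" || keyword == "DELET" ||
        keyword == "SIBLS" || keyword == "CHLDN") {
        return 2;   // <lastname> <firstname>
    }
    if (keyword == "STREE" || keyword == "FTREE") {
        return 3;   // <lastname> <firstname> <levels>
    }
    if (keyword == "MOTHR" || keyword == "FATHR" || keyword == "MSIBL" ||
        keyword == "PSIBL" || keyword == "BSIBL") {
        return 4;   // two <lastname> <firstname> pairs
    }
    return -1;
}
```

A command loop could read one keyword, look up its count, and then read exactly that many further elements before executing the command.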

PROGRAM ACTIONS & OUTPUT:

The program does whatever is necessary to initialize. Then it reads from the standard input and executes the commands it finds there, one after another, until it reads a FINIS command, which causes the program to halt. You can redirect standard input to a file from the Unix command line. Example: If the executable version of your program is named famTree.out, then the command

   famTree.out < inFile3 > outFile7

causes the program to read its standard input from the file named inFile3, and causes it to write its standard output to the file named outFile7.

The program does all writing to standard output or to a treefile. The treefile it uses for output must be named

treeFile.new

Each time the program reads a command from standard input, it must echo the exact text of the command to standard output. However, you don't have to echo the white space exactly. You can just separate each string with a single blank. For example,

FTREE    BroWN            BILly     3

may be echoed as:

FTREE BroWN BILly 3

It will be very useful to have such echoing done while you are testing and debugging. Naturally I will also find it useful during my testing.

If the command in question results in any changes whatever being made to the database, then a message explaining what those changes were must be printed next. The use of such messages is another very useful testing and debugging aid. Messages must include changes made to pointers, as well as changes like additions or deletions of nodes. If the command results in a large number of changes, the message must summarize what has been done without explicitly listing each change. Be sure to design the code that writes the messages at the same time you are designing the code that carries out the actions. That will help you to coordinate both activities correctly.

After the outputs described above have been done, any output that has been explicitly requested by the command will be written. In the case of the STORE command, the explicitly requested output goes into a treefile. For all other commands, it goes to standard output.

The output that results from consecutive command lines from standard input must be clearly separated on the standard output, perhaps by dotted lines or series of asterisks. (NOT just by blank lines.)

You must clearly label all the major parts of the output on standard output, and generally organize the display to be highly readable and attractive. You are free to decide upon the details of how the standard output file is displayed, as long as you do it well.

The output from the STORE command that goes into treeFile.new must be written in the format of a treefile. (See the section above entitled INPUT.) It must also be written in a manner that makes it highly readable.

COMPLETE COMMAND DESCRIPTIONS:

Here are descriptions of how the program obeys each command:

LOADT

INSRT <lastname> <firstname>

MOTHR <lastname> <firstname> <lastname> <firstname>

FATHR <lastname> <firstname> <lastname> <firstname>

This command is just like the MOTHR command described above, except that it works with father pointers rather than mother pointers, and the list of children emanating from the father node is continued via paternal sibling pointers rather than maternal sibling pointers. Paternal siblings, of course, are people who have the same father.

MSIBL <lastname> <firstname> <lastname> <firstname>

This command records the fact that the first and second persons have the same mother. It will merge the maternal sibling lists of the two people. If one of the nodes, call it X, has a mother pointer that points to a node M, and the other node, call it Y, has a mother pointer that does not point to a node, then the command will make the mother pointers of all the nodes in the maternal sibling list of Y (including Y) point to M.
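
Here is a minimal sketch of how such a merge might be done. It rests on assumptions that are not stated in this section: each sibling list is a ring (circular list), a person with no recorded siblings is a ring of one whose sibling link points back to itself, and the two people are in different rings before the command runs. The names mergeSiblingRings, next, and parent are hypothetical, and plain vectors stand in for the fields of the node array.

```cpp
#include <utility>
#include <vector>

const int NIL = -1;   // assumed nil value, outside the index range

// Hypothetical sketch. next[i] is the sibling link of node i and
// parent[i] is its mother link (or NIL). x and y must lie in two
// different rings when this is called.
void mergeSiblingRings(std::vector<int>& next, std::vector<int>& parent,
                       int x, int y) {
    // Exchanging the successors of two nodes that lie in different
    // rings splices the two rings into one.
    std::swap(next[x], next[y]);

    // If either person had a known parent, record it for every member
    // of the merged ring.
    int known = (parent[x] != NIL) ? parent[x] : parent[y];
    if (known == NIL) {
        return;
    }
    int cur = x;
    do {
        parent[cur] = known;
        cur = next[cur];
    } while (cur != x);
}
```

The sketch writes the known parent into the whole merged ring rather than only into Y's former ring. Under the assumption that the input contains no contradictions, the result is the same.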

PSIBL <lastname> <firstname> <lastname> <firstname>

This command is just like the MSIBL command above, except it records that the first and second persons have the same father. This command changes paternal sibling pointers instead of maternal, and father pointers instead of mother pointers.

BSIBL <lastname> <firstname> <lastname> <firstname>

DELET <lastname> <firstname>

SIBLS <lastname> <firstname>

This command will cause a list of the known siblings of the indicated person to be written to the output file. All maternal siblings will be listed first, and then all paternal siblings. Each list must be appropriately labelled on the output. Note that in order to make this command work properly, the list needs to be linked in a manner that makes it possible to start anywhere in the list, and follow pointers to arrive at any other element of the list. Of course the ring structure specified above suits this purpose.

CHLDN <lastname> <firstname>

This command will cause a list of the known children of the indicated person to be written to the output file.

STREE <lastname> <firstname> <levels>

FTREE <lastname> <firstname> <levels>

This will cause the program to print a picture of the sub-tree of the FAMILY tree rooted at the indicated node, down to the number of levels indicated. You will use the same algorithm to do this as is used by STREE. You must use ONE PROCEDURE to print both kinds of tree.

STORE

FINIS

This command halts the program. A parting message is printed to the standard output by this command.

DETAILS AND ASSUMPTIONS

Your program is not required to do any error detection or correction. It may assume that there are no mistakes of any kind in the input file and no mistakes of any kind in treeFile.old. In particular, there will be no errors in the form or order of the commands or fields, no attempts to record contradictory familial relationships, no attempts to record relationships that are already recorded in the database, and no attempts to access information that does not exist in the database.

DATA STRUCTURES

You are required to use a binary search tree as the main data structure. You must also use linking with pointers to keep track of the relationships, both familial and alphabetical, among the elements of the database.

The tree must be implemented as an array of records. Each record will represent one node of the tree. A record must contain these 7 pointers: left pointing, right pointing, mother pointing, father pointing, maternal sibling pointing, paternal sibling pointing, and children pointing. The pointers must be implemented as integers, and the address a pointer points to will simply be the array element whose index is the value of the pointer. A value not in the array's index range can be used to denote a nil pointer.

I strongly recommend that you organize the 7 pointers as an array, indexed by an enumerated type, with elements such as (leftL, rightR, mother, father, msibling, psibling, children). This will allow you to, in effect, pass the names of the pointers as parameters to procedures, while still retaining a set of mnemonic names for the pointers. This will make it easy to, for one example, write a single procedure which can be used to print either family trees or search trees. You will also find it easy to create a single procedure to implement both the MOTHR and FATHR commands, and a single procedure to implement both the MSIBL and PSIBL commands. [IMPORTANT NOTE: the names "left" and "right" are used in one of the C++ libraries, so DO NOT name anything in your program "left" or "right", else you may not be able to compile the program.]
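
A minimal sketch of the recommended layout follows. The enumerator names come from the paragraph above. Everything else (NUM_LINKS, NIL, MAX_NODES, TreeNode, Database, follow, setLink) is a hypothetical name chosen for illustration, and the capacity of 100 is an arbitrary assumption.

```cpp
#include <string>

// Link kinds, named as suggested above. NUM_LINKS is an added
// (hypothetical) enumerator that counts the kinds.
enum LinkKind { leftL, rightR, mother, father, msibling, psibling,
                children, NUM_LINKS };

const int NIL = -1;          // assumed nil value, outside the index range
const int MAX_NODES = 100;   // hypothetical capacity

struct TreeNode {
    std::string lastName;
    std::string firstName;
    int link[NUM_LINKS];     // integer "pointers" into the node array
};

struct Database {
    TreeNode node[MAX_NODES];
    int root;
};

// The link kind is an ordinary parameter, so one routine can follow
// any of the seven pointers.
int follow(const Database& db, int from, LinkKind kind) {
    return (from == NIL) ? NIL : db.node[from].link[kind];
}

// Likewise, one routine can set a mother link or a father link.
void setLink(Database& db, int from, LinkKind kind, int to) {
    db.node[from].link[kind] = to;
}
```

For example, a routine that walks a list of children could take msibling or psibling as a parameter, which is what lets MOTHR and FATHR share code.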

You must design the other data structures and the structure of your program. There are several choices to be made. These choices will determine how efficiently your program will run, and how simple or complicated the code will be. If you try, you will find many ways to save time by avoiding duplicating code, and by using the right tool (data structure) for the right job.

It is important that you design your program well. You have been taught some clever and elegant ways of doing some of the things your program needs to do. Now you must think of some equally clever and elegant ways to do the rest. If you take this advice, it will make your job much easier.

Like good literature, a program should have power, and economy of expression. It should do a lot with very few lines of code, and go like the dickens. (pun intended.)

You are answerable for the wisdom and efficacy of your choice of design. Use C++ classes to implement data structures. Keep clear boundaries between different data types.

HELPFUL HINTS ON THE FAMILY TREE PROBLEM

When implementing DELET, you must check for every possible kind of pointer that might be pointing to the node to be deleted.

Let's say that the node to be deleted is called X. The list of pointers that might point to X consists of left pointers, right pointers, mother or father pointers, msibling pointers, psibling pointers, child pointers, root pointer, and any additional pointers that you might decide to use.

Any pointer pointing to X must be redirected somehow by DELET.

Left, right, and root are handled by a standard deletion algorithm for a binary search tree.

To fix mother or father pointers which may be pointing to X, follow the child pointer of X. If it points to some node Y, find out whether X is the mother of Y or the father of Y (it has to be one or the other). Then, based on this information, traverse the appropriate sibling ring of Y, and null out the appropriate parent pointer in each node visited.

To fix child, msibling, and psibling pointers we can begin by following the mother pointer of X and observing if there is a mother node whose child pointer points back to X. If so, we either change the child pointer of the mother node so that it points instead to a maternal sibling of X, or we nullify it if X has no known maternal siblings. We then delete X from its maternal sibling ring. Next do these same tasks on the paternal side.
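
The maternal-side steps just described might be sketched as follows. The sketch assumes that a person with no known siblings is a ring of one whose sibling link points to itself. The names removeFromRing and detachFromParent are hypothetical, and plain vectors stand in for the fields of the node array.

```cpp
#include <vector>

const int NIL = -1;   // assumed nil value, outside the index range

// Hypothetical sketch: unlinks x from its sibling ring. Returns another
// member of the ring, or NIL if x had no known siblings.
int removeFromRing(std::vector<int>& next, int x) {
    int successor = next[x];
    if (successor == x) {
        return NIL;               // x was alone in its ring
    }
    int pred = x;
    while (next[pred] != x) {     // walk around to the node before x
        pred = next[pred];
    }
    next[pred] = successor;       // bypass x
    next[x] = x;                  // x is now a ring of one
    return successor;
}

// Hypothetical sketch: removes x from its ring and, if the parent's
// child pointer pointed at x, redirects it to a sibling (or to NIL).
void detachFromParent(std::vector<int>& next, std::vector<int>& child,
                      int parentOfX, int x) {
    int sibling = removeFromRing(next, x);
    if (parentOfX != NIL && child[parentOfX] == x) {
        child[parentOfX] = sibling;
    }
}
```

The same two routines could be called a second time with the paternal sibling links and the father node.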

When deleting a node X from the binary search tree, you have to be careful if X has two children. The standard algorithms are of two types:

- One type copies the info in Y, the leftmost node of the right subtree of X, into the info area of X, and deletes Y from the tree.
- The other type actually disconnects X from the tree, disconnects Y from the tree, and then reconnects Y in the former position of X. Here DISCONNECT and RECONNECT refer only to the changes made to root, left, and right pointers -- not mother, father, and so forth.

It is probably best to use the disconnect-reconnect method, because that way the mother, father, and other pointers that point to Y do not have to be redirected.
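
A minimal sketch of the disconnect-reconnect method on an array-based tree follows. It is one possible arrangement, not the only one. The names Links and bstDisconnect are hypothetical, and the caller is assumed to have found both X and X's search-tree parent already.

```cpp
#include <vector>

const int NIL = -1;   // assumed nil value, outside the index range

struct Links {
    int leftL;
    int rightR;
};

// Hypothetical sketch: removes node x from the search tree by relinking
// only. parentOfX is x's search-tree parent, or NIL if x is the root.
void bstDisconnect(std::vector<Links>& t, int& root, int x, int parentOfX) {
    int replacement;
    if (t[x].leftL == NIL) {
        replacement = t[x].rightR;          // zero children or right child only
    } else if (t[x].rightR == NIL) {
        replacement = t[x].leftL;           // left child only
    } else {
        // Y is the leftmost node of the right subtree of X.
        int yParent = x;
        int y = t[x].rightR;
        while (t[y].leftL != NIL) {
            yParent = y;
            y = t[y].leftL;
        }
        if (yParent != x) {
            t[yParent].leftL = t[y].rightR; // disconnect Y from its old place
            t[y].rightR = t[x].rightR;      // Y adopts X's right subtree
        }
        t[y].leftL = t[x].leftL;            // Y adopts X's left subtree
        replacement = y;
    }

    // Reconnect the replacement in the former position of X.
    if (parentOfX == NIL) {
        root = replacement;
    } else if (t[parentOfX].leftL == x) {
        t[parentOfX].leftL = replacement;
    } else {
        t[parentOfX].rightR = replacement;
    }
    t[x].leftL = NIL;
    t[x].rightR = NIL;
}
```

Note that Y keeps its own array slot, so any mother, father, sibling, or child pointers that refer to Y remain valid.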

The AVAIL list is the list of nodes not in use. You must have procedures to simulate the new (memory allocation) and delete (deallocation) procedures by deleting and inserting from and into the AVAIL list.

You have to be careful to take into consideration how the actions of the LOADT command may affect the AVAIL list. If a LOADT command is executed after the AVAIL list is initialized then, unless treeFile.old is empty, the LOADT command will copy tree node data into array slots that are on the AVAIL list. There are two basic ways to handle this situation:
- The program checks to see if there is a LOADT command in the input and if so, executes the LOADT command before it initializes the AVAIL list.
- The program initializes the AVAIL list before executing any of the commands. If there is a LOADT command, the program re-initializes the AVAIL list after copying all the tree node records into the array.

After a LOADT command is executed the AVAIL list must contain all the array slots that are not occupied by the tree nodes that the LOADT command placed into the array. The LOADT command could do the following:
- First mark all the array slots "unused".
- Next copy each tree node into its designated location in the array, taking care to mark each such location "used",
- Finally, starting with an empty AVAIL list, traverse the array and insert each unused array slot into the AVAIL list.
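
The last of these steps, together with the simulated new and delete, might be sketched as follows. The names rebuildAvail, allocateNode, freeNode, used, and availNext are hypothetical. A separate availNext array is used here only to keep the sketch short. In a real program the AVAIL list could just as well be threaded through one of the pointer fields of the unused nodes.

```cpp
#include <vector>

const int NIL = -1;   // assumed nil value, outside the index range

// Hypothetical sketch: builds the AVAIL list from the "used" marks.
// used[i] is true for every slot that LOADT filled.
// Returns the index of the first free slot, or NIL if none is free.
int rebuildAvail(const std::vector<bool>& used, std::vector<int>& availNext) {
    int head = NIL;
    // Walking backwards makes the list hand out low-numbered slots first.
    for (int i = static_cast<int>(used.size()) - 1; i >= 0; --i) {
        if (!used[i]) {
            availNext[i] = head;
            head = i;
        }
    }
    return head;
}

// Simulated "new": removes a slot from the front of the AVAIL list.
// Returns NIL if the list is empty.
int allocateNode(std::vector<int>& availNext, int& head) {
    int slot = head;
    if (slot != NIL) {
        head = availNext[slot];
    }
    return slot;
}

// Simulated "delete": inserts a slot at the front of the AVAIL list.
void freeNode(std::vector<int>& availNext, int& head, int slot) {
    availNext[slot] = head;
    head = slot;
}
```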

Note: The Kennedy family tree is good data for testing. There is a nice Kennedy Family Tree at a PBS site.

WHAT TO TURN IN:

You will turn in three "phases" of this assignment:

a level 3 version,
a level 4 version, and
a final version.

For each phase of the assignment, you will turn in a printer output (hardcopy) and you will send me an e-mail message. Please follow these rules:

Always send me e-mail as plain text in the main message body. Never send me attachments.
Always use the exact subject line I specify for each message. (I often get hundreds of e-mail messages in a week. The subject line allows me to find, filter and sort messages.) You will lose a significant number of points on the assignment if you use the wrong subject line.
Be very careful when you send the e-mail. You may use the instructions in your Hello World! lab exercise for guidance. Of course, you will need to make the obvious changes to those directions -- you have to use the correct subject line and filename.
Always send yourself a copy of each e-mail message you send to me, and check immediately to see if you receive the message intact. You are responsible for sending e-mail correctly.

Here is the list of things you have to turn in:

At the start of class on the first due date, place the following item in front of me:
- a hardcopy of your level 3 (or greater) program. (All the source code, i.e. all the *.h and *.cpp files) Make sure all the code is properly formatted and that it all shows on the paper.
Using the subject line: CS3100,prog4.3 send the following item to me by e-mail before midnight on the first due date:

One shell archive file (only one) containing items 1-4.
1. All source files for your level 3 program (everything I will need to compile and run it: all *.cpp files and *.h files)
2. Your test script showing adequate testing of your level 3 program.
3. A file named 'README' containing the compilation command one should use to compile your program.
4. A copy of your 'makefile' if you used one.
At the start of class on the second due date, place the following item in front of me:
- a hardcopy of your level 4 (or greater) program. (All the source code, i.e. all the *.h and *.cpp files) Make sure all the code is properly formatted and that it all shows on the paper.
Using the subject line: CS3100,prog4.4 send the following item to me by e-mail before midnight on the second due date:

One shell archive file (only one) containing items 1-4.
1. All source files for your level 4 program (everything I will need to compile and run it: all *.cpp files and *.h files)
2. Your test script showing adequate testing of your level 4 program.
3. A file named 'README' containing the compilation command one should use to compile your program.
4. A copy of your 'makefile' if you used one.
At the start of class on the third due date, place the following item in front of me:
- a hardcopy of your final level program. (all the source code) Make sure all the code is properly formatted and that it all shows on the paper.
Using the subject line: CS3100,prog4.f send the following item to me by e-mail before midnight on the third due date:

One shell archive file (only one) containing items 1-4.
1. All source files for your final level program (everything I will need to compile and run it: all *.cpp files and *.h files)
2. Your test script showing adequate testing of your final level program.
3. A file named 'README' containing the compilation command one should use to compile your program.
4. A copy of your 'makefile' if you used one.

Note that there are no spaces in the subject lines given. It is important that you do not insert any spaces. My e-mail address is: john@ishi.csustan.edu.

DUE DATES:

For due dates, see the class schedule.
