(Last Revision: 11/08/98)

//////////////////////////////////////////////////
CS 3100 PROGRAMMING ASSIGNMENT #04
-- Family Tree Program
Binary Trees, Multi-linking, Command Processing, I/O
//////////////////////////////////////////////////

This assignment counts as TWO -- #4 and #5.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DUE: Second Level Program with all the input echoing working:
     Wednesday, November 18.

DUE: Fourth Level Program
     Friday, Dec 4.

DUE: Final version
     Friday, Dec 11.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

THE ASSIGNMENT:

            --THE FAMILY TREE PROGRAM--

The residents of a small city decide that they want to have a
database that keeps track of information about their family
trees.  The plan is to collect, during each business day,
requests for changes to the database and requests for reports
on the contents of the database.  The requests are to be
processed all together at the end of the day they are received.

The database must be implemented as a binary search tree, keyed
on last name first, and first name second.  (To simplify matters,
we will work under the assumption that two different persons
never have both the same first name and the same last name.)
The family tree data will be kept by setting *additional*
pointers from node to node in the binary search tree,
indicating various familial relationships.
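
The key ordering described above can be sketched as follows.
This is only an illustration, assuming that names have already
been folded to a single case.  The function name compareKeys is
hypothetical and is not part of the assignment.

```cpp
#include <string>

// Returns a negative, zero, or positive value as person A's key is
// less than, equal to, or greater than person B's key.
// Assumption: all four names are already in one case.
int compareKeys(const std::string& lastA, const std::string& firstA,
                const std::string& lastB, const std::string& firstB) {
    int c = lastA.compare(lastB);        // last name decides first
    if (c != 0) return c;
    return firstA.compare(firstB);       // first name breaks ties
}
```

Because no two persons share both names, a result of zero means
the two keys belong to the same person.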

You must implement the tree as an array of record-nodes, using
integer fields as pointers ("simulated pointers").  This will
facilitate the moving of copies of the database to and from
secondary storage.  What is especially important is that this
will make it possible for the program to quickly load a copy of
the database from a file as the program is starting up. 

You have been assigned the task of creating the program that
implements the database.  Your program must have the ability to
load a copy of the database from the old treefile, make the
requested changes coming from the standard input, generate
requested reports by writing them to standard output, and store
the updated database by making a new treefile.

INPUT:

Format: 

The COMMANDS from standard input, and the information in the
treefiles will each be typed in a specific free format.  Your
program must be carefully written so that the way it reads is
in exact accordance with all the specifications given in this
document.  Your program will be rejected if it fails to be
capable of using input or treefiles that I create, or if a new
treefile it creates turns out to be unusable as an old treefile
on a subsequent program run.

Sample files will be provided on-line.  You must test your
program on the sample files before turning it in.  You must
also test your program fully on your own input files -- files
that you create to test all the average, unusual, and boundary
cases of input that you can think of.

A standard input file will consist simply of a series of
commands, each command separated from the next by one or more
white space characters (blanks, tabs, and/or end-of-line
characters).

Each command will consist of a series of one or more ELEMENTS.
Each element is also separated from the next by one or more
white space characters.  The first element in a command will
always be a KEYWORD.  The rest of the elements, if there are
any, will be PARAMETERS.

Each element will be an uninterrupted sequence of one or more
characters (a string).  Elements are only allowed to contain
upper-case letters, lower-case letters, or digits -- no other
characters are legal.

There may or may not be white space before the beginning of the
first command and after the end of the last command.

Here is an (extreme) example consisting of two separate
commands:

          Insrt
   kAne              John  DELET
      smith MaRy
       

The two commands are Insrt Kane John and Delet Smith Mary.

It is NOT a good idea to create input that has such irregular
spacing as this example.  The point of insisting on this free
format is that people who create input shouldn't be obliged to
follow a rigid file format, which would cause errors and lost
time.  Users have the discipline to produce neat input, but they
aren't machines, and so they should be given some freedom in
how they use white space and case.  You may have noticed that
the command interpreters that you work with as you type
commands on Unix, Windows, or Macintosh computers give you a
similar freedom to format as you wish.

With our formatting rules, a command does not have to begin or
end at any particular place on a line, and wherever white space
occurs in the file, there is no upper limit to the amount of it
there may be.  

A treefile must be formatted similarly.  The difference is that
it will be a series of NODE RECORDS.  Each node record will
consist of a series of 10 FIELDS: node number, last name, first
name, left pointer, right pointer, mother pointer, father
pointer, maternal sibling pointer, paternal sibling pointer,
and child pointer.  The fields must appear in the treefile in
exactly the order listed above.

Parsing:

Your program is required to read all input as CHARACTER DATA,
no matter where it is being read from.  This means all input
has to be read into variables of type char, array of char, or
string.

Your program must be written to be case insensitive.  In other
words, no distinction between upper-case and lower-case letters
will be made by your program while processing any input from
any source.  For example, a command element or field of a node
record written in upper-case letters will be understood just
the same as if written in lower-case letters, or if written in
some combination of upper-case and lower-case.

Usually the easiest way to make case insensitivity work is to
convert all lower-case letters to upper-case soon after they
are read in by the program.  You can also work a similar scheme
by converting all upper-case letters to lower-case.  You will
see, however, that it will sometimes be necessary to echo
characters exactly as read before mapping them to a different
case.  See below.  

Samples: 

Here are a few sample commands, along with some description of
what the commands do (Assume that these commands are executed
in the order given):

INSRT KANE JOHN

   /* Searches for a node with name John Kane in the
       database.  If it is not in there, inserts a node with that
       name. */

   inSrt Smith    rebeCca

   /* As above, inserts Rebecca Smith if she is not there.
       (Note the mixed cases and extra blanks.) */

Mothr Smith Rebecca   kane john

   /* Records in the database the information that Rebecca
      Smith is the mother of John Kane. */

Insrt Powell ruby

   /* As above, inserts Ruby Powell in the database, if not
      there.  */

MSIBL powell ruby smith rebecca

   /* Records in the database the information that Ruby Powell
      and Rebecca Smith are maternal siblings.  This means that
      they have the same mother.  (If they have the same father
      too, that fact can be recorded with the PSIBL command.)
      */ 

Sources:

Your program will read a series of commands from standard
input, and may also do some reading from a file named
treeFile.old.  The file treeFile.old will only be read as part
of the execution of the LOADT command (see below).  You have to
make sure that the program is set up to read from a file with
that name exactly.  The spelling must be the same.  The use of
upper and lower case must be the same.  

Since you will usually want to carefully construct the series
of commands to be input to the program before the program is
run, you should normally run the program with the standard
input redirected to an input file containing your commands.
When you do this, there is no possibility for you to interact
with the program via the keyboard.  That is an acceptable and
even desirable way for this particular program to work.

COMMAND LIST: 

A complete list of the commands appears below, but first we
need to define some notation:

<lastname> means a last name is typed here (30 letters or fewer).

<firstname> means a first name is typed here (30 letters or fewer).

<levels> means  an integer from 1-6 is typed here (Remember
		that all integers have to be read in as
		character data, and converted by the program to
		the integer data type.)
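
A minimal sketch of the conversion from character data to an
integer is shown here.  The name digitsToInt is hypothetical.
The sketch assumes the string is non-empty and contains only
digits, which is consistent with the rule that the input has no
mistakes.

```cpp
#include <string>

// Converts a string of decimal digits to an int without using
// numeric stream input.
// Assumption: s is non-empty and every character is a digit.
int digitsToInt(const std::string& s) {
    int value = 0;
    for (char c : s)
        value = value * 10 + (c - '0');   // shift left one decimal place
    return value;
}
```

The same routine can serve for the integer fields of a treefile,
since those must also be read as character data.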

Now here below is the list of commands.  The comments are NOT
part of the commands.

/* Load the database from treeFile.old. */
LOADT

/* Insert this person in the database if not there. */
INSRT <lastname> <firstname>

/* The first person is the mother of the second person. */
MOTHR <lastname> <firstname> <lastname> <firstname>

/* The first person is the father of the second. */
FATHR <lastname> <firstname> <lastname> <firstname> 

/* The first and second persons have the same mother. */
MSIBL <lastname> <firstname> <lastname> <firstname> 

/* The first and second persons have the same father. */
PSIBL <lastname> <firstname> <lastname> <firstname>

/* The first and second persons have both the same mother and
the same father. */
BSIBL <lastname> <firstname> <lastname> <firstname>

/* If s/he is in the database, delete this person. */
DELET <lastname> <firstname>

/* Print a list of the known siblings of this person. */
SIBLS <lastname> <firstname>

/* Print a list of the known children of this person. */
CHLDN <lastname> <firstname>

/* Print the search tree rooted at the node corresponding to
this person. */
STREE  <lastname> <firstname> <levels> 

/* Print the known family tree of this person. */
FTREE <lastname> <firstname> <levels> 

/* Store the database in treeFile.new. */
STORE

/* Stop the program. */
FINIS
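
Since commands are not tied to lines, a reader needs to know how
many parameters follow each keyword.  The sketch below encodes
the list above as a lookup.  The name parameterCount and the -1
return value are hypothetical choices, and the keyword is
assumed to be in upper case already.

```cpp
#include <string>

// Number of parameters that follow each keyword in the command list.
// Assumption: `keyword` has already been converted to upper case.
// Returns -1 for a keyword that is not in the list.
int parameterCount(const std::string& keyword) {
    if (keyword == "LOADT" || keyword == "STORE" || keyword == "FINIS")
        return 0;
    if (keyword == "INSRT" || keyword == "DELET" ||
        keyword == "SIBLS" || keyword == "CHLDN")
        return 2;                       // <lastname> <firstname>
    if (keyword == "STREE" || keyword == "FTREE")
        return 3;                       // name plus <levels>
    if (keyword == "MOTHR" || keyword == "FATHR" || keyword == "MSIBL" ||
        keyword == "PSIBL" || keyword == "BSIBL")
        return 4;                       // two full names
    return -1;
}
```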


PROGRAM ACTIONS & OUTPUT:

The program does whatever is necessary to initialize.  Then it
reads from the standard input and executes the commands it
finds there, one after another, until the command FINIS is
encountered, which causes the program to halt.  You will use
Unix redirection to make the program get its input from
whatever file you wish to be the input file.  Example: If the
executable version of your program is named famTree.out, then
the command

		famTree.out < inFile3 > outFile7

causes the program to read its standard input from the file
named inFile3, and causes it to write its standard output to
the file named outFile7.

The program does all writing to standard output or to a
treefile, which must be named treeFile.new.  Each time a
command is read from the standard input, the exact text of the
command must be written to the output file.  However, you don't
have to echo the white space exactly.  You can just separate
each string with a single blank.  For example,

FTREE    BroWN            BILly     3

can be echoed as:

FTREE BroWN BILly 3

It will be very useful to have such echoing done while you are
testing and debugging.  Naturally I will also find it useful
during my testing.

If the command in question results in any changes whatever
being made to the database, then a message explaining what
those changes were must be printed next.  The use of such
messages is another very useful testing and debugging aid.
Messages must include changes made to pointers, as well as
changes like additions or deletions of nodes.  If the command
results in a large number of changes, the message must
summarize what has been done without explicitly listing each
change.  Be sure to design the code that writes the messages at
the same time you are designing the code that carries out the
actions.  That will help you to coordinate both activities
correctly.  

After the outputs described above have been done, any output
that has been explicitly requested by the command will be
written.  In the case of the STORE command, the output goes
into a treefile.  For all other commands, the output goes to
standard output.

The output that results from consecutive commands read from
standard input must be clearly separated on the standard output,
perhaps by dotted lines or series of asterisks.  (NOT just by
blank lines.)

You must clearly label all the major parts of the output on
standard output, and generally organize the display to be highly
readable and attractive.  You are free to decide upon the
details of how the standard output file is displayed, as long
as you do it well.

The output from the STORE command must be written in the format
of a treefile -- see the section above entitled INPUT.  It must
also be written in a manner that makes it highly readable.  

COMPLETE COMMAND DESCRIPTIONS:

Here is a description of how the program obeys each command:

LOADT  

     This command will appear, if at all, only as the FIRST
     COMMAND in the input file.  The effect of the command will
     be that the data in treeFile.old will be used to make a
     copy of the database in RAM, complete with the structure
     embodied in all its pointer fields.  This will be
     accomplished by copying the data in treeFile.old into an
     array of structures.
     
     The node number field in each respective node record
     contained in treeFile.old will be used as an index into
     the array, and the information in the other fields of the
     node record will be copied into the array element
     determined by the index.

     The first node record in treeFile.old will be assumed to
     be that of the tree's root, so the external root pointer
     for the tree must be set equal to the node number (the
     first field) of that first record.

     The <lastname> and <firstname> fields of the node
     records will be strings of 30 letters or fewer.  The other
     fields will all be integers in the range [0 .. arrayMax-1].
     You may assume that my test files will not require your
     arrayMax value to be higher than 50.

INSRT <lastname> <firstname>

     This command initiates a search through the tree for a
     node containing the name given.  The algorithm used must
     be the standard search algorithm for binary search trees.
     If no such node is found, a new node is created for the
     named person, and it is inserted into the search tree in
     the proper location -- again, using the standard method
     for inserting a node into a binary search tree.

     Since nodes have to be created somehow, you will have to
     devise a scheme for allocating unused array elements for
     new nodes.  A desirable way of doing this is to maintain a
     linked list of unused array elements.  Let's say you call
     it AVAIL.  When a new node is needed, it is taken from
     AVAIL, and when a node is deleted from the tree, it is
     returned to AVAIL.  If you like, you can adapt the
     SimSpace code in your text for this purpose.
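
     Here is a minimal sketch of the search-then-insert step
     with an AVAIL list, using simulated pointers.  It is not
     the SimSpace code from the text.  The class name
     SearchTree, its members, the choice of ARRAY_MAX as the nil
     value, and the chaining of AVAIL through the right field
     are all assumptions made for illustration.  Names are
     assumed to be folded to one case before the call.

```cpp
#include <string>

const int ARRAY_MAX = 50;
const int NIL = ARRAY_MAX;   // assumed nil: a value outside the index range

struct Node {
    std::string last, first;
    int left, right;
};

struct SearchTree {
    Node node[ARRAY_MAX];
    int root;
    int avail;               // head of the AVAIL list, chained through `right`

    SearchTree() : root(NIL), avail(0) {
        for (int i = 0; i < ARRAY_MAX; ++i) {
            node[i].left = NIL;
            node[i].right = i + 1;   // last element gets ARRAY_MAX, i.e. NIL
        }
    }

    // Takes one element from AVAIL.  Assumes AVAIL is not empty.
    int allocate() {
        int n = avail;
        avail = node[n].right;
        node[n].left = node[n].right = NIL;
        return n;
    }

    // Returns an element to AVAIL.
    void release(int n) {
        node[n].right = avail;
        avail = n;
    }

    // Standard binary search tree search; inserts only if the name is
    // absent.  Returns the index of the person's node either way.
    int insert(const std::string& last, const std::string& first) {
        int parent = NIL, cur = root;
        bool wentLeft = false;
        while (cur != NIL) {
            int c = last.compare(node[cur].last);
            if (c == 0) c = first.compare(node[cur].first);
            if (c == 0) return cur;             // already in the database
            parent = cur;
            wentLeft = (c < 0);
            cur = wentLeft ? node[cur].left : node[cur].right;
        }
        int n = allocate();
        node[n].last = last;
        node[n].first = first;
        if (parent == NIL) root = n;
        else if (wentLeft) node[parent].left = n;
        else node[parent].right = n;
        return n;
    }
};
```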

MOTHR <lastname> <firstname> <lastname> <firstname>

     This command will make a mother pointer in the second
     person's node point to the first person's node, indicating
     that the first person is the mother of the second person.

     It will also set such a pointer in every member of the
     MATERNAL SIBLING LIST attached to the second person's
     node.  Besides that, if the LIST OF CHILDREN of the first
     person is not empty, it will merge that list with the
     aforementioned maternal sibling list.  (You are not
     required to keep sibling lists or lists of children in any
     particular order.)

     If the children pointer of the first person's node is not
     pointing to any node, then it will be made to point to the
     second person's node.

     Here is a definition of a MATERNAL SIBLING:  Person X is a
     maternal sibling of person Y if and only if X and Y have
     the same mother, and X and Y are not the same person.

     The maternal sibling list attached to the second person's
     node is a circular list of search tree nodes, linking
     together all known maternal siblings of the second
     person.

     The list of children of the first person is also a circular
     linked list of search tree nodes.  The mother has a
     children pointer which points to one element of the list.
     The list elements themselves are linked into a ring with
     their maternal sibling pointers.

     The list of children and the maternal sibling list
     referred to above can be ONE AND THE SAME, or they can be
     separate.  It all depends on how much information has been
     fed into the database about the familial relationships
     among the various nodes.

     For example, it is possible to know that two people have
     the same mother, but at the same time to NOT know the
     identity of the mother.  In this case, a list of maternal
     siblings is not connected to a mother node.
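
     The pointer work described above can be sketched as
     follows.  This is one possible arrangement, not a required
     one.  It assumes that a person with no known siblings is
     kept as a ring of one (the sibling pointer points back at
     the same node), and that nil is ARRAY_MAX.  The names
     initNode, inRing, mergeRings, and recordParent are
     hypothetical.

```cpp
const int ARRAY_MAX = 50;
const int NIL = ARRAY_MAX;   // assumed nil value

enum Link { MOTHER, FATHER, MSIBLING, PSIBLING, CHILDREN, LINK_COUNT };

struct Node { int link[LINK_COUNT]; };   // names omitted for brevity

Node tree[ARRAY_MAX];

// Assumption: a node with no known siblings is a ring of one.
void initNode(int n) {
    for (int k = 0; k < LINK_COUNT; ++k) tree[n].link[k] = NIL;
    tree[n].link[MSIBLING] = n;
    tree[n].link[PSIBLING] = n;
}

// True if `target` is on the ring that contains `start`.
bool inRing(int start, int target, Link sib) {
    int cur = start;
    do {
        if (cur == target) return true;
        cur = tree[cur].link[sib];
    } while (cur != start);
    return false;
}

// Splices two separate rings into one by exchanging successors.
// Does nothing if a and b are already on the same ring.
void mergeRings(int a, int b, Link sib) {
    if (inRing(a, b, sib)) return;
    int t = tree[a].link[sib];
    tree[a].link[sib] = tree[b].link[sib];
    tree[b].link[sib] = t;
}

// MOTHR: recordParent(first, second, MOTHER, MSIBLING)
// FATHR: recordParent(first, second, FATHER, PSIBLING)
void recordParent(int parent, int child, Link par, Link sib) {
    int cur = child;
    do {                                   // child and all its ring members
        tree[cur].link[par] = parent;
        cur = tree[cur].link[sib];
    } while (cur != child);
    if (tree[parent].link[CHILDREN] == NIL)
        tree[parent].link[CHILDREN] = child;
    else
        mergeRings(tree[parent].link[CHILDREN], child, sib);
}
```

     Exchanging the successors of one node from each ring is
     enough to join two circular lists, which is why no
     particular order of the merged list can be promised.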

FATHR <lastname> <firstname> <lastname> <firstname>

     This command is just like the command above, except it
     works with father pointers, rather than mother pointers,
     and the list of children emanating from the father node is
     continued via paternal sibling pointers, rather than
     maternal sibling pointers.  Paternal siblings, of course,
     are people who have the same father.

MSIBL <lastname> <firstname> <lastname> <firstname> 

     This command records the fact that the first and second
     persons have the same mother.  It will merge the maternal
     sibling lists of the two people.  If one of the nodes,
     call it X, has a mother pointer that points to a node, and
     the other node, call it Y, has a mother pointer that does
     not point to a node, then all the nodes in the maternal
     sibling list of Y (including Y) will have their mother
     pointers pointed at the mother of X.
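
     A minimal sketch of this command follows, under the same
     kind of assumptions as before: a person with no known
     siblings is a ring of one, nil is ARRAY_MAX, and the two
     persons are on separate rings when the command arrives
     (the input never repeats a recorded relationship).  The
     names setParentOnRing and recordSiblings are hypothetical.

```cpp
const int ARRAY_MAX = 50;
const int NIL = ARRAY_MAX;   // assumed nil value

enum Link { MOTHER, FATHER, MSIBLING, PSIBLING, LINK_COUNT };

struct Node { int link[LINK_COUNT]; };

Node tree[ARRAY_MAX];

// Points the `par` link of every node on start's ring at `parent`.
void setParentOnRing(int start, int parent, Link par, Link sib) {
    int cur = start;
    do {
        tree[cur].link[par] = parent;
        cur = tree[cur].link[sib];
    } while (cur != start);
}

// MSIBL: recordSiblings(x, y, MOTHER, MSIBLING)
// PSIBL: recordSiblings(x, y, FATHER, PSIBLING)
// Assumption: x and y are on separate rings when this is called.
void recordSiblings(int x, int y, Link par, Link sib) {
    int px = tree[x].link[par];
    int py = tree[y].link[par];
    if (px != NIL && py == NIL) setParentOnRing(y, px, par, sib);
    if (py != NIL && px == NIL) setParentOnRing(x, py, par, sib);
    // Join the two rings by exchanging successors.
    int t = tree[x].link[sib];
    tree[x].link[sib] = tree[y].link[sib];
    tree[y].link[sib] = t;
}
```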

PSIBL <lastname> <firstname> <lastname> <firstname>

     This command does what the command above does, except by
     setting paternal pointers instead of maternal ones.

BSIBL <lastname> <firstname> <lastname> <firstname>

     This command has the effect of executing the two commands:

             MSIBL <lastname> <firstname> <lastname> <firstname>
             PSIBL <lastname> <firstname> <lastname> <firstname>

DELET <lastname> <firstname>

     If the indicated person is present in the database, this
     command deletes him or her.  Deletion has to be done in
     such a way that ALL pointers to the indicated node are
     appropriately redirected, or nulled.  The root pointer,
     left pointers, right pointers, mother pointers, father
     pointers, maternal sibling pointers, paternal sibling
     pointers, and child pointers all have to be considered,
     and fixed if any of these kinds of pointers point to the
     node in question.

     To take care of root, left, and right pointers, *one* of
     the standard algorithms for deleting a node from a binary
     tree must be used.  CAREFUL: I'll discuss in class the
     nature of the algorithm you need to use.  If the deletion
     from the binary search tree is done correctly, the rest of
     the work will be relatively simple.  We will discuss this in
     class.  (You will also find some discussion of this below.)

SIBLS <lastname> <firstname>

     This command will cause a list of the known siblings of the
     indicated person to be written to the output file.  All
     maternal siblings will be listed first, and then all
     paternal siblings.  Each list must be appropriately
     labelled on the output.  Note that in order to make this
     command work properly, the list needs to be linked in a
     manner that makes it possible to start anywhere in the
     list, and follow pointers to arrive at any other element
     of the list.  Of course the ring structure specified above
     is perfect for this purpose.
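
     Walking a sibling ring can be sketched as follows.  The
     sketch assumes that a person with no known siblings is a
     ring of one.  The names Node, Link, and siblingsOf are
     hypothetical.

```cpp
#include <vector>

enum Link { MSIBLING, PSIBLING, LINK_COUNT };

struct Node { int link[LINK_COUNT]; };   // other fields omitted

// Returns the indices of every other node on `person`'s ring.
// Assumption: a person with no known siblings points at itself.
std::vector<int> siblingsOf(const Node tree[], int person, Link sib) {
    std::vector<int> result;
    for (int cur = tree[person].link[sib]; cur != person;
         cur = tree[cur].link[sib])
        result.push_back(cur);
    return result;
}
```

     SIBLS would call this once with MSIBLING and once with
     PSIBLING, printing each list under its own label.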

CHLDN <lastname> <firstname>

     This command will cause a list of the known children of the
     indicated person to be written to the output file.

STREE  <lastname> <firstname> <levels> 

     This will cause the program to print a picture of the
     sub-tree of the SEARCH tree rooted at the indicated
     person's node, down to the number of levels indicated.  A
     reverse in-order traversal can be used as the basis of the
     algorithm.  We can discuss the implementation of this
     command further in class.

     The information printed for each node will be the first
     ten letters of the first and last names.  (First name
     first, on a line by itself.  Then last name underneath, on
     a line by itself.)

FTREE <lastname> <firstname> <levels> 

     This will cause the program to print a picture of the
     sub-tree of the FAMILY tree rooted at the indicated node,
     down to the number of levels indicated.  You will use the
     same algorithm to do this as is used by STREE.  You must
     use ONE PROCEDURE to print both kinds of tree.

STORE  

     This command will cause the program to write its current
     version of the database to the file treeFile.new.

     The file treeFile.new must be written in the correct
     format.  See the subsection titled Format above, in the
     main section titled INPUT.  The file must also be laid out
     to maximize its readability to humans.  The first node
     record written into treeFile.new must be that of the
     root.  Naturally, the treeFile.new written by STORE must
     be readable by LOADT, provided of course that we first
     change its name to treeFile.old.

     A good way to implement STORE would be to do a pre-order
     traversal of the tree, and to write the information
     corresponding to each node as the node is visited.

FINIS

     This command halts the program.  A parting message is
     printed to the standard output by this command.

DETAILS AND ASSUMPTIONS

Your program is not required to do any error detection or
correction. It may assume that there are no mistakes of any
kind in the input file and no mistakes of any kind in
treeFile.old.  In particular, no errors in the form or order of
the commands or fields, no attempts to record contradictory
familial relationships, no attempts to record relationships
that are already recorded in the database, and no attempts to
access information that does not exist in the database.

DATA STRUCTURES

You are required to use a binary search tree as the main data
structure.  You must also use linking with pointers to keep
track of the relationships, both familial and alphabetical,
among the elements of the database.  

The tree must be implemented as an array of records.  Each
record will represent one node of the tree.  A record will
contain exactly 7 pointers: left pointing, right pointing,
mother pointing, father pointing, maternal sibling pointing,
paternal sibling pointing, and children pointing.  The pointers
must be implemented as integers, and the address a pointer
points to will simply be the array element whose index is the
value of the pointer.  A value not in the array's index range
can be used to denote a nil pointer.

I strongly recommend that you use an array of pointers, indexed
by an enumerated type, with elements such as (left, right,
mother, father, msibling, psibling, children).  This will allow
you to, in effect, pass the names of the pointers as parameters
to procedures, while still retaining a set of mnemonic names
for the pointers.  This will make it easy to, for one example,
write a single procedure which can be used to print either 
family trees or search trees.  You will also find it easy to
create a single procedure to implement both the MOTHR and FATHR
commands, and a single procedure to implement both the MSIBL
and PSIBL commands.
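
Here is a minimal sketch of such a record, together with a small
routine that takes a pointer name as a parameter.  The names
Link, Node, and chainLength are hypothetical, and the use of
ARRAY_MAX as the nil value is an assumption.

```cpp
#include <string>

const int ARRAY_MAX = 50;
const int NIL = ARRAY_MAX;   // assumed nil: a value outside the index range

// Mnemonic names for the seven pointers; LINK_COUNT is the array size.
enum Link { LEFT, RIGHT, MOTHER, FATHER, MSIBLING, PSIBLING, CHILDREN,
            LINK_COUNT };

struct Node {
    std::string lastName;
    std::string firstName;
    int link[LINK_COUNT];    // simulated pointers, indexed by Link
};

// Counts the nodes reached from `start` by following one named link
// until NIL.  The link to follow is passed in like any other value.
int chainLength(const Node tree[], int start, Link which) {
    int count = 0;
    for (int cur = start; cur != NIL; cur = tree[cur].link[which])
        ++count;
    return count;
}
```

The same routine follows mother pointers when called with MOTHER
and left pointers when called with LEFT.  Do not call it with a
sibling link, because a circular list never reaches nil.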

You must design the other data structures and the structure of
your program.  There really are a lot of choices to be made.
These choices will determine how efficiently your program will
run, and how simple or complicated the code will be.  If you
try, you will find many ways to save time by avoiding
duplicating code, and by using the right tool (data structure)
for the right job.

It is important that you design your program well.  You have
been told some clever and elegant ways of doing some of the
things your program needs to do.  Now you must think of some
equally clever and elegant ways to do the rest.  There is no
disadvantage to you to do this!  It will make your job much
easier.

Like good literature, a program should have power, and economy
of expression.  It should do a lot with very few lines of code,
and go like the dickens.  (Pun intended.)

You are answerable for the wisdom and efficacy of your choice
of design.  You must implement each data structure in a way
consistent with the definition we have chosen for the term Data
Type -- a set of data values together with operations on those
data values.  In other words, use C++ classes to implement the
data structures.  Keep clear boundaries between different data
types.

HELPFUL HINTS ON THE FAMILY TREE PROBLEM

1.  When implementing DELET, you must check for every possible
    kind of pointer that might be pointing to the node to be
    deleted.

    Let's say that the node to be deleted is called X.  The
    list of pointers that might point to X consists of left
    pointers, right pointers, mother or father pointers,
    msibling pointers, psibling pointers, child pointers,
    root pointer, and any additional pointers that you might
    decide to use.

    Any pointer pointing to X must be redirected somehow by
    DELET.

    Left, right, and root are handled by a standard deletion
    algorithm for a binary search tree.

    To fix mother or father pointers which may be pointing to
    X, follow the child pointer of X.  If it points to some
    node Y, find out whether X is the mother of Y or the father
    of Y (it has to be one or the other).  Then, based on this
    information, traverse the appropriate sibling ring of Y,
    and null out the appropriate parent pointer in each node
    visited.

    To fix child, msibling, and psibling pointers we can begin
    by following the mother pointer of X and observing if there
    is a mother node whose child pointer points back to X.  If
    so, we either change the child pointer of the mother node
    so that it points instead to a maternal sibling of X, or we
    nullify it if X has no known maternal siblings.  We then
    delete X from its maternal sibling ring.  Next do these 
    same tasks on the paternal side.
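
    The family-pointer repairs in this hint can be sketched as
    follows.  The sketch covers only the family pointers, not
    root, left, or right.  It assumes that a person with no
    known siblings is a ring of one and that nil is ARRAY_MAX.
    The names leaveRing, detachAsChild, detachAsParent, and
    detachFamily are hypothetical.

```cpp
const int ARRAY_MAX = 50;
const int NIL = ARRAY_MAX;   // assumed nil value

enum Link { MOTHER, FATHER, MSIBLING, PSIBLING, CHILDREN, LINK_COUNT };

struct Node { int link[LINK_COUNT]; };

Node tree[ARRAY_MAX];

// Removes x from its sibling ring, leaving x as a ring of one.
void leaveRing(int x, Link sib) {
    int pred = x;
    while (tree[pred].link[sib] != x) pred = tree[pred].link[sib];
    tree[pred].link[sib] = tree[x].link[sib];
    tree[x].link[sib] = x;
}

// Fixes the parent's child pointer, then unhooks x from the ring.
void detachAsChild(int x, Link par, Link sib) {
    int parent = tree[x].link[par];
    int next = tree[x].link[sib];
    if (parent != NIL && tree[parent].link[CHILDREN] == x)
        tree[parent].link[CHILDREN] = (next != x) ? next : NIL;
    leaveRing(x, sib);
    tree[x].link[par] = NIL;
}

// Nulls the parent pointer that each child of x has pointing at x.
void detachAsParent(int x) {
    int y = tree[x].link[CHILDREN];
    if (y == NIL) return;
    Link par = (tree[y].link[MOTHER] == x) ? MOTHER : FATHER;
    Link sib = (par == MOTHER) ? MSIBLING : PSIBLING;
    int cur = y;
    do {
        tree[cur].link[par] = NIL;
        cur = tree[cur].link[sib];
    } while (cur != y);
    tree[x].link[CHILDREN] = NIL;
}

void detachFamily(int x) {
    detachAsParent(x);
    detachAsChild(x, MOTHER, MSIBLING);   // maternal side
    detachAsChild(x, FATHER, PSIBLING);   // same tasks, paternal side
}
```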

2.  When deleting a node X from the binary search tree, you
    have to be careful if X has two children.  The standard
    algorithms are of two types:

    One type copies the info in Y, the rightmost node of the
    left subtree of X, into the info area of X, and recursively
    deletes Y from the tree.

    The other type actually disconnects X from the tree,
    disconnects Y from the tree, and then reconnects Y in the
    former position of X.  Here DISCONNECT and RECONNECT refer
    only to the changes made to root, left, and right pointers
    -- not mother, father, and so forth.

    It is probably best to use the disconnect-reconnect method,
    because that way the mother, father, and other pointers
    that point to Y do not have to be redirected.
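
    A minimal sketch of the disconnect-reconnect method follows.
    It touches only root, left, and right.  The caller is
    assumed to have already found X and to pass in the pointer
    (root, or some node's left or right field) that currently
    points to X.  The name unlinkNode and the nil value are
    assumptions.

```cpp
const int ARRAY_MAX = 50;
const int NIL = ARRAY_MAX;   // assumed nil value

struct Node { int left, right; };   // keys and family pointers omitted

// `slot` is the pointer that currently points to X.  After the call,
// X is out of the search tree and `slot` points to its replacement.
void unlinkNode(Node tree[], int& slot) {
    int x = slot;
    if (tree[x].left == NIL) {
        slot = tree[x].right;               // zero children, or right only
    } else if (tree[x].right == NIL) {
        slot = tree[x].left;                // left child only
    } else {
        // Find Y, the rightmost node of X's left subtree, and the
        // pointer that points to it.
        int* ySlot = &tree[x].left;
        while (tree[*ySlot].right != NIL) ySlot = &tree[*ySlot].right;
        int y = *ySlot;
        *ySlot = tree[y].left;              // disconnect Y (no right child)
        tree[y].left = tree[x].left;        // reconnect Y where X was
        tree[y].right = tree[x].right;
        slot = y;
    }
    tree[x].left = tree[x].right = NIL;
}
```

    Because Y keeps its own array element, every mother,
    father, sibling, or child pointer that pointed to Y is still
    correct afterwards.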

3.  The AVAIL list is the list of nodes not in use.  You must
    have procedures to simulate the new (malloc) and delete
    (free) procedures by deleting and inserting from and into
    the AVAIL list.

4.  LOADT should adjust the AVAIL list.  Perhaps the easiest
    way is just to rebuild AVAIL after all of treeFile.old has
    been copied in.  Starting with a cleared and empty AVAIL,
    the program can probe each array element, inserting the
    unused ones into AVAIL.
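
    Rebuilding AVAIL by probing each element can be sketched as
    follows.  The inUse flag, the chaining of AVAIL through the
    right field, the nil value, and the name rebuildAvail are
    all assumptions made for illustration.

```cpp
const int ARRAY_MAX = 50;
const int NIL = ARRAY_MAX;   // assumed nil value

struct Node {
    bool inUse;              // hypothetical flag set while loading
    int right;               // AVAIL is chained through this field
};

// Starts with an empty AVAIL and pushes every unused element onto it.
// Probing from the top down leaves the lowest free index at the head.
// Returns the head of the rebuilt list, or NIL if the array is full.
int rebuildAvail(Node tree[]) {
    int avail = NIL;
    for (int i = ARRAY_MAX - 1; i >= 0; --i) {
        if (!tree[i].inUse) {
            tree[i].right = avail;
            avail = i;
        }
    }
    return avail;
}
```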