CS 4250 Project Part 5

Hardcopy/email due at midnight (email only) or 11am (paper or email) on Thursday, April 27, 2017

(Subject line of "cs4250, project part 5", exactly. Plain text, PDF, or MS Word only. (Note: .zip is not PDF, plain text, or MSWord.) )

Start this assignment with the nicely designed, normalized relations you created for Project Part 4.

This assignment will consist of implementing your schema in MySQL (or an RDBMS of your choice) and populating the database. You can utilise your MySQL account (see professor for your password) for all aspects of this assignment.

  1. Write an SQL database schema for your application, using the CREATE TABLE commands described in class. If you make any changes to the relational schema from Part 4, please tell me the rationale for your changes.

    Do not start loading these definitions into the database just yet!

  2. Load your table definitions into the database.

    Now, execute some sample INSERT, DELETE and UPDATE commands on every one of your relations. Convince yourself that things are working fine. Required: report any interesting observations that you make. (Simple observations are fine, like "I learned not to UPDATE a tuple I had not yet INSERTed.")

  3. Develop a substantial amount of data for your application and load it into your relations using the bulk loading facility provided in MySQL: you can use either the LOAD INFILE syntax or the mysqlimport program.

    We want data in the order of 40-50 tuples for each relation in your application. To create the data, either get it from some web source or information repository (making sure that it is public -- legal to use for non-profit purposes -- first, of course) or write a program in any scripting/programming language (like Perl, PHP or C) that can create large files of records in a format acceptable to the bulk loader. Either way, realize that you may need to transform data from one form to another to be acceptable for use in MySQL.

    In very rare situations, one of your relations might require only 4--5 tuples. For example, if you are modeling international airplane flights, there are only seven continents on the planet Earth, in which case it is okay to have just a few tuples in that particular relation. In such cases, where you have only a few tuples, state explicitly why you have only a few tuples. (During most semesters, no student group encounters a legitimate reason to have a small table in their database.)

    Precautions to Take if you "Cook up" your Data: In parts (1) and (2), you have have specified key(s) (and possibly foreign keys) for each relation. Make sure that while cooking up data, your fabricated data indeed confirms to such "key restrictions". If you declare that "StudentID" is the key, then do not generate data for different students having the same ID! Notice that relations will definitely share attributes (if there are no common attributes at all in your schema, then your project is in grave danger, see the instructor immediately!), so make sure that they "agree" on the common attributes. For example, if there are two relations that share an attribute called "name", then make sure that the names do indeed tally! Do not list the name "Astaire" as "ASTAIRE" in one and "astAIRE" in another and "Astaire" in yet another because then they would be three different names! The bottom line is to be consistent. When two things are meant to be the same, make sure your data reflects this fact. If you do not ensure so at this stage, you will spend more time debugging your application than otherwise! (GIGO problems.)
What to turn in: A document that details the following:
  1. A list of your relation schemas reproduced from Part 4, with an explanation of any changes you made. This list serves as a reminder for me. This is the list of relations you start work with on Part 5.
  2. A list of your SQL CREATE TABLE commands (with PRIMARY KEY, NOT NULL, DEFAULT, FOREIGN KEY, and CHECK constraints). This is the list of create table statements your group creates as part of your work on Part 5; it is not the same as the list required by #1.
  3. A list of the number of tuples for each relation in your application. (MySQL's "explain" command will list the number of tuples in any of your tables, if you lose track. Or use the query "select count(*) from mytablename;")
  4. Samples of INSERT, DELETE, and UPDATE commands that you tried out on your relations and that worked, plus your observations.
  5. The source of your data. Examples of possible answers are "We got it from this web address (http://where.we.found.data/)", "We got this from this reference book on apartment homes", "We imported it from this document", or "We cooked it up". Whatever your answer, write a short paragraph on how you transformed the data from its native form to the form required by MySQL. If you wrote a special purpose program or script (in Perl, PHP, C or some other language), include that script. Explain how you took special care to make sure that key restrictions and common attribute constraints are not violated. Explain why some relations have only a few tuples (if applicable).
  6. Sample of tuples for each relation (about 6--10 each), just to give me an idea of how they look. (Screenshots, or copy and paste of text output from command line MySQL interface, strongly preferred.)
  7. Do not forget to include all project member names (and email addresses, and member contributions) with what you turn in.
What NOT to turn in:
  1. Pages and pages of raw data or printouts of all the data in your database. There is no way I can read all that, line by line!
Precautions and Recommendations:
  1. This might seem obvious but it is surprising the number of people who miss it. Please SAVE a copy of your SQL statements (and data) on some disk or flash drive or somewhere.

    (Every good database person knows the value of backups!)

Useful links:

  1. Getting Started with MySQL
  2. MySQL's auto_increment -- for setting up a table so it will create unique numerical ID values automatically
  3. test_student.sql -- sample, plain text script file that creates a table and loads data from a plain text file into the new table
  4. USA Gov't data
  5. - Data in comma-separated-values (CSV) format is ready to load into a database. One row per tuple, each field separated from neighbors with commas. If you want to strip out a field, you can write a simple program to count commas and delete the unwanted field. (Or, if the data file is small enough, load the data directly into MS Excel, delete the unwanted column, and export back into a new CSV file.)
  6. Web site that provides random data generation services