CS 4250: Database Management Systems

Fall 2021 - Homework 1

Due 20 September 2021

This is an individual assignment. All work must be your own. You should not look at any other student's work (in whole or in part, on paper or on screen), nor allow anyone else to look at yours, during the course of this assignment.

Upload to the CS Homework system on or before midnight on 20 September 2021. Typed, wherever possible. Or photographs of hand-drawn answers (dark writing, white paper, even lighting). Save in a widely used file format, like JPG, PNG, etc.

When drawing ER diagrams, be obvious. Put a gap between double-lines. Draw subset symbols nice and large.

  1. Consider a database containing information about supercomputers. Create entities and relationships as appropriate for each (independent) situation described below.

    Information that is true for all of the following ER diagrams: Each supercomputer has a unique identifying number, a name, a number of cores, and a peak TFlop/second rate. (TFlop = teraflop. 1,000,000,000,000 floating point operations per second. Yes, that's 1012. A "core" is a CPU... sort of. Close enough for this homework.)

    For each of the following situations, draw an ER diagram that describes it. Consider each situation independently. (Multiple situations means multiple, separate ER diagrams in a correct homework solution.)

    Information in parentheses is simply information. Information in parentheses does not affect the ER diagrams, but may help you understand the real world context of the question.

    1. Each supercomputer runs on one operating system. An operating system is required by each supercomputer, and the database stores the name of the operating system and version number (together, those can identify the operating system). The database also stores the name of the organization that maintains the operating system and the URL of the operating system documentation.
    2. The database stores information about supercomputers. (See above for more details.) Each supercomputer is located at one physical site. A supercomputer site can be identified by its address (street, city, state, and country). Each supercomputer site also has a name, and a funding agency. (For example, the Sierra supercomputer is at Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA. LLNL is funded by the DOE, Department of Energy.) Note that physical sites can have more than one supercomputer. (LLNL also hosts the Lassen supercomputer.)
    3. The database stores information about performance test results for each supercomputer. A supercomputer must have test results to be in this database. (The performance tests are how peak supercomputer speed is officially measured.) Test results for a particular computer can be identified using the name of the test software and the date the test was run. However, there is no identifier that can be used universally to identify a test result overall, since many supercomputers run the standard tests and might run them on the same day.

      For each performance test, the database stores the name of the test software, the date of the test, the peak speed achieved, and the power consumed (kilowatts) by the supercomputer during the test run.

      (Note: I may have "fudged" the truth a bit, on this question, to make a good homework question. Draw the ER diagram for the situation described.)

    4. The database stores information about supercomputer rankings on the Top500 List. Each Top500 list can be uniquely identified using the month and year it was released, and for each list we also store a description of the highlights. For each pair of supercomputer and Top500 list, the database stores that computer's ranking in that list. (For example, the Sierra supercomputer was #2 on the June 2019 Top500 list, #3 on the June 2020 list, and #3 on the June 2021 list.) Each list contains dozens of supercomputers.

    5. Some of the physical locations of supercomputers are universities, some are national laboratories (research centers) and some are corporations. No location can be more than one of those types of location at once. A supercomputer site can be identified by its address (street, city, state, and country). Each supercomputer site also has a name. The database also stores university student populations (number of students) and logos, and stores national laboratory funding agency names and annual budgets, and stores corporation URLs and annual net incomes.

  2. Translate the following image into relations. Write the SQL statements to create those relations. Write English explanations of anything in the ER diagram that is not captured in your SQL. Assume all keys are data type 'integer' and all other fields are data type 'date'.

  3. (Let me know if the diagram is too small and you want a bigger version of it.)

  4. Suppose a database has the following three relations.
    Museum (mid: integer, name: string, address: string, website: string)
    Displays (museumID: integer, artID: integer, startDate: date)
    ArtItem (aid: integer, title: string, weightPnd: integer, type: string)
    ExhibitsIn (artID: integer, exhibitID: integer, insuranceCost: integer, deliveryDate: date, homeMuseum: integer)
    Exhibition (eID: integer, title: string, startdate: date, enddate: date, specialFee: integer)

    (The "bit" data type stores 0 or 1. False or true. Booleans.)
    "museumID" in Displays is a foreign key referencing "mid" in Museum.
    "exhibitID" in ExhibitsIn is a foreign key referencing "eid" in Exhibition.
    "artID" in Displays and ExhibitsIn are foreign keys referencing "aid" in ArtItem. The art item weights are measured in pounds.

    You may assume simple boolean comparisons (<, >, ≠) work on dates.

    If you are unfamiliar with art museums, they will frequently organize special exhibitions. Museums will borrow art from other museums for a few months, art that fits an exhibition theme, and charge a small extra fee for admission to the exhibition. For examples, here are some past Asian Art Museum of San Francisco exhibitions. Art items must, of course, be insured appropriately before they leave their home museum and are shipped to the exhibition host museum, in case of loss or damage in transit.

    Write the following queries in relational algebra.

    1. Find the exhibition titles and start dates for exhibitions that start after December 2021.
    2. Find the art item titles and weights for art items displayed in a museum starting before January 1980.
    3. Find the art item identifiers and titles, and exhibition identifiers. The art items are of type "painting" and must cost more than $5,000 to insure for their participation in the exhibition.
    4. Find the names and web sites of museums that display an art item titled "Rosetta Stone".
    5. Find the titles and start dates of exhibitions that exhibit an art item titled "The Great Wave off Kanagawa" or an art item titled "Mask of Tutankhamun".

    6. Find the titles and start dates of exhibitions that exhibit an art item titled "Mona Lisa" and also an art item titled "Mask of Tutankhamun".

      (Such an exhibition would never occur. The insurance costs for shipping two such valuable works of art would be astronomical.)