CS 4250 Class Project

CS 4250: Introduction to Database Management Systems

Project Overview

General Description

The goal of the class project is to implement a database system application. The project includes the following activities spread over the entire semester:

Identify an application area,
Model the data stored in the database,
Design, normalise, and perfect the relational database schema,
Write the SQL commands to create the database, find appropriate data, and populate the database, and
Finally, write the software needed to embed the database system in the application.

The end result should be a functioning application that runs on the WWW and that uses your database to allow useful functionality.

A group of two to four students (ideally, three) should do each project. You are free to choose your own project members; if you would like the instructor to assign you to a group, send an email. Each of the steps above will be a specific project assignment or two. You will get detailed instructions with each assignment. Each group should turn in a single solution to each assignment. Every member of the group will get the same grade.

N.B. There will be a peer evaluation component of project grades. For every component of the project, each group will be required to specify what each of the members of the group contributed to the group work.

Project Ideas

These ideas are just samples. You are free to propose your own ideas. Realize that the ideas below are not complete descriptions. You need to work on them more and develop your project more concretely and in more detail. Do not get intimidated by the examples that are linked from this web page -- they are meant to give you a feel for the application domain. It is up to you to narrowly define the scope of the application within the time frame of a semester-long project. Do not forget that you are supposed to have fun!

Books Database: This domain is a popular one. Just look at barnesandnoble.com or amazon.com for excellent examples. You could model entities such as books, their authors, and topics (which may form a complex hierarchy). You may also model various attributes of the authors and the institutions they belong to. You can support a service for buying and selling used books or books used in specific university courses. Your system can build a personal profile of people (and the books they like), and your database application could form the basis for a "recommender system", such as those supported by the commercial sites. The goal here is to "cluster" similar preferences together so that the system can make recommendations: "Since you liked Shakespeare's Romeo and Juliet, I recommend that you try Shakespeare's Antony and Cleopatra".
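To give a feel for the recommender idea, here is a minimal sketch using Python's built-in sqlite3 module. It does not do true clustering; it swaps in simple co-occurrence counting ("readers who liked what you liked also liked..."). The table names (book, likes), the column names, and the sample rows are all assumptions made up for illustration, not part of any assignment.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
BOOK_SCHEMA = """
CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE likes (
    reader_id INTEGER NOT NULL,
    book_id   INTEGER NOT NULL REFERENCES book(book_id),
    PRIMARY KEY (reader_id, book_id)
);
"""


def recommend_books(conn, reader_id, limit=3):
    """Suggest titles liked by readers who share a liked book with reader_id.

    Each candidate gets one vote per (shared book, neighbouring reader) pair.
    Books the reader already likes are excluded.
    """
    rows = conn.execute(
        """
        SELECT b.title, COUNT(*) AS votes
        FROM likes AS mine
        JOIN likes AS theirs
          ON theirs.book_id = mine.book_id
         AND theirs.reader_id <> mine.reader_id
        JOIN likes AS cand ON cand.reader_id = theirs.reader_id
        JOIN book AS b ON b.book_id = cand.book_id
        WHERE mine.reader_id = ?
          AND cand.book_id NOT IN
              (SELECT book_id FROM likes WHERE reader_id = ?)
        GROUP BY b.book_id, b.title
        ORDER BY votes DESC, b.title
        LIMIT ?
        """,
        (reader_id, reader_id, limit),
    ).fetchall()
    return [title for (title, _votes) in rows]


def build_book_demo():
    """Create an in-memory database with a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(BOOK_SCHEMA)
    conn.executemany(
        "INSERT INTO book VALUES (?, ?)",
        [(1, "Romeo and Juliet"), (2, "Antony and Cleopatra"),
         (3, "Hamlet"), (4, "Moby-Dick")],
    )
    conn.executemany(
        "INSERT INTO likes VALUES (?, ?)",
        [(1, 1), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (4, 4)],
    )
    return conn


if __name__ == "__main__":
    print(recommend_books(build_book_demo(), reader_id=1))
```

In this made-up data, reader 1 likes only Romeo and Juliet, so the suggestions come from the two other readers who also liked it. A real project would likely need a better similarity measure and a decision about how to treat readers with no overlapping likes.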

Movies (or Television Shows) Database: There are several excellent movie resources on the web, such as the hollywood.com movies site or the Internet Movie Database. You could model entities such as movies, their actors, directors, genres, playing times, and reviews. There are several sources on the web from which you could get data to populate such a database. You can support various queries such as finding specific playing times, or finding movies playing in Turlock directed by a given director. You can also support updates to the reviews section of the database (e.g., viewers giving their own opinions). Another functionality is to provide personal profiles of people (i.e., the movies they like) and then try to recommend movies to them based on profiles of viewers with similar tastes. You could also create a database of Oscar or Golden Globe nominations and awards and answer queries such as "Find all the sitcoms that have been nominated three years in a row".
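As a rough illustration of the "movies playing in Turlock directed by a given director" query, the sketch below uses Python's built-in sqlite3 module with a three-table schema. The tables (director, movie, showing), their columns, and the placeholder rows are assumptions invented for this example; your own design may well differ.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
MOVIE_SCHEMA = """
CREATE TABLE director (director_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movie (
    movie_id    INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    director_id INTEGER NOT NULL REFERENCES director(director_id)
);
CREATE TABLE showing (
    showing_id INTEGER PRIMARY KEY,
    movie_id   INTEGER NOT NULL REFERENCES movie(movie_id),
    city       TEXT NOT NULL,
    start_time TEXT NOT NULL
);
"""


def movies_playing(conn, city, director_name):
    """Return the sorted titles of movies showing in city by director_name."""
    rows = conn.execute(
        """
        SELECT DISTINCT m.title
        FROM movie AS m
        JOIN director AS d ON d.director_id = m.director_id
        JOIN showing  AS s ON s.movie_id = m.movie_id
        WHERE s.city = ? AND d.name = ?
        ORDER BY m.title
        """,
        (city, director_name),  # parameters are bound, never string-formatted
    ).fetchall()
    return [title for (title,) in rows]


def build_movie_demo():
    """Create an in-memory database with placeholder sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(MOVIE_SCHEMA)
    conn.executemany(
        "INSERT INTO director VALUES (?, ?)",
        [(1, "Director A"), (2, "Director B")],
    )
    conn.executemany(
        "INSERT INTO movie VALUES (?, ?, ?)",
        [(1, "Film One", 1), (2, "Film Two", 1), (3, "Film Three", 2)],
    )
    conn.executemany(
        "INSERT INTO showing VALUES (?, ?, ?, ?)",
        [(1, 1, "Turlock", "19:00"), (2, 1, "Turlock", "21:30"),
         (3, 2, "Modesto", "20:00"), (4, 3, "Turlock", "18:00")],
    )
    return conn


if __name__ == "__main__":
    print(movies_playing(build_movie_demo(), "Turlock", "Director A"))
```

DISTINCT matters here because a movie with several showings in the same city would otherwise appear once per showing.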

Research Literature: This domain involves modeling research publications. You need to identify the title of the publication, the forum it was published in, the authors, topics, keywords, and related subtopic areas. This is a big business now (under the name of digital libraries). For example, the ACM Digital Library provides a beautiful searchable index (and retrievable repository, but that is beyond our scope) of nearly all of the publications of ACM. If you use this domain, then there are a lot of available resources for you to use. The ACM Computing Classification System provides a convenient hierarchical meta-index that you can use to organize your class hierarchy, etc. If you are interested in a smaller domain, then the DBLP Bibliography Site provides a searchable facility for publications related to the database and programming communities. At the end of the day, you could identify papers written by a particular person at a particular place or ones in a narrowly defined area.

Research Literature, the Dark Side, aka Retraction Watch: This domain involves modeling retracted scientific publications -- the mistakes, misunderstandings, and occasional outright fraud that can occur in any competitive publication field. The Retraction Watch web site gathers information on cases where a scientific publication was declared "unpublished" and documents why the publishers retracted the paper. The Retraction Watch web site once received a generous grant to help them create a database of scientific paper retractions. You could try to design that database.
Or you could find another web site that gathers up socially useful information and try designing a database for their worthy cause. Possibilities include a database of adoptable stray animals, of endangered animal sightings in the wild, of homeless shelter locations, of food banks for humans, of locations of weeds in the wild that are food sources for endangered creatures (like Monarch butterflies), of museums that store old computers, ...

Municipal Data: A number of different cities and counties, like Santa Clara County, Los Angeles County, and Sonoma County, are putting municipal data on-line for anyone to access. Data on business, education, health, the environment, transportation... all the information governments use to help their areas thrive. You could model an open data web site, and include in your model the types of data available, areas of the county / city the data applies to, the people and projects that generate the data sets, existing applications available to analyze some of the data sets, and so on.

Nobel Prizes Database: The goal might be to model and populate information about the awards made in the various fields (Physics, Chemistry, Physiology or Medicine, Literature, Peace, and the Economic Sciences), the recipients, their countries, their years of birth, etc. Your system should be able to answer questions such as "When was the first time an Asian won an award for the economic sciences?" (the answer to this particular question is 1998). You could also work on variants of this idea, such as the recipients of the ACM awards. Interesting queries could then be "Name people who have won at least two different ACM awards" (the answer would include Knuth, Thompson, Ritchie, Engelbart, etc.) or "Name people who were ACM Fellows before becoming Turing Award winners", and so on.
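A query like "people who have won at least two different awards" maps naturally onto GROUP BY with HAVING COUNT(DISTINCT ...). The sketch below shows one way it might look with Python's built-in sqlite3 module. The tables (person, award_win), their columns, and the placeholder people and awards are assumptions invented for illustration; no real award data is included.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
AWARD_SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE award_win (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    award_name TEXT NOT NULL,
    year       INTEGER NOT NULL,
    PRIMARY KEY (person_id, award_name, year)
);
"""


def multi_award_winners(conn, min_awards=2):
    """Return names of people who won at least min_awards different awards."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM person AS p
        JOIN award_win AS w ON w.person_id = p.person_id
        GROUP BY p.person_id, p.name
        HAVING COUNT(DISTINCT w.award_name) >= ?
        ORDER BY p.name
        """,
        (min_awards,),
    ).fetchall()
    return [name for (name,) in rows]


def build_award_demo():
    """Create an in-memory database with placeholder sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(AWARD_SCHEMA)
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, "Person A"), (2, "Person B"), (3, "Person C")],
    )
    conn.executemany(
        "INSERT INTO award_win VALUES (?, ?, ?)",
        [(1, "Award X", 2001), (1, "Award Y", 2005),   # two different awards
         (2, "Award X", 2002), (2, "Award X", 2003),   # same award twice
         (3, "Award Y", 2004)],
    )
    return conn


if __name__ == "__main__":
    print(multi_award_winners(build_award_demo()))
```

Counting DISTINCT award names, rather than rows, keeps someone who won the same award in two different years from qualifying.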

Apartment Homes: This domain would require modeling apartments and their attributes, areas of town and their various characteristics (e.g., Modesto Area Express bus lines, crime rates, distance from various landmarks). You would provide an interface for offering apartments for rent, finding apartments based on various requirements ("pets allowed + rent less than $800 + close to campus").
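The requirement string "pets allowed + rent less than $800 + close to campus" can be read as a set of optional filters on one table. The sketch below, using Python's built-in sqlite3 module, is one possible interpretation: the apartment table, its columns, the sample addresses, and the idea of defining "close to campus" as a distance threshold in miles are all assumptions made for this example.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
APARTMENT_SCHEMA = """
CREATE TABLE apartment (
    apartment_id INTEGER PRIMARY KEY,
    address      TEXT NOT NULL,
    rent         INTEGER NOT NULL,           -- monthly rent in dollars
    pets_allowed INTEGER NOT NULL CHECK (pets_allowed IN (0, 1)),
    campus_miles REAL NOT NULL               -- distance to campus (assumed)
);
"""


def find_apartments(conn, max_rent, pets_required, max_campus_miles):
    """Return addresses matching the requirements, cheapest first."""
    query = "SELECT address FROM apartment WHERE rent < ? AND campus_miles <= ?"
    params = [max_rent, max_campus_miles]
    if pets_required:
        # Only add the pets filter when the renter actually needs it.
        query += " AND pets_allowed = 1"
    query += " ORDER BY rent"
    return [address for (address,) in conn.execute(query, params)]


def build_apartment_demo():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(APARTMENT_SCHEMA)
    conn.executemany(
        "INSERT INTO apartment VALUES (?, ?, ?, ?, ?)",
        [(1, "12 Elm St", 750, 1, 0.8),
         (2, "34 Oak Ave", 700, 0, 0.5),
         (3, "56 Pine Rd", 780, 1, 5.0),
         (4, "78 Maple Ct", 950, 1, 0.3),
         (5, "90 Birch Ln", 790, 1, 1.5)],
    )
    return conn


if __name__ == "__main__":
    conn = build_apartment_demo()
    print(find_apartments(conn, max_rent=800, pets_required=True,
                          max_campus_miles=2.0))
```

Only fixed SQL fragments are concatenated into the query text; every user-supplied value is passed as a bound parameter, which is the habit to keep once the search form is exposed on the web.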

Others: Of course, there are a whole host of other ideas such as personal photo collections, bank accounts, student records, World Cup data, election results, Senate demographics, car rentals, auto insurance, consumer products, silly statistics, "match-making services" and so on. Use your imagination.

Credit and thanks are due to Dr. Murali at Virginia Tech for inspiration and documentation.

Last modified: Aug 2021