CS 4250: Introduction to Database Management Systems
General Description
The goal of the class project is to
implement a database system application. The project includes the
following activities spread over the entire semester:
- Identify an application area,
- Model the data stored in the database,
- Design, normalise, and perfect the relational database schema,
- Write the SQL commands to create the database, find appropriate data,
and populate the database, and
- Finally, write the software needed to embed the
database system in the application.
The end result should be a functioning application that runs on
the WWW and that uses your database to provide useful functionality.
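As a rough illustration of the schema, SQL, and embedding activities listed above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (author, book), the column names, and the sample rows are hypothetical placeholders chosen for this example; they are not part of any assignment, and your group may well use a different DBMS and language.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    # Step 1: SQL commands that create the (assumed) schema.
    conn.executescript("""
        CREATE TABLE author (
            author_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE book (
            book_id   INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES author(author_id)
        );
    """)
    # Step 2: populate the tables with placeholder rows (not real data).
    conn.executemany(
        "INSERT INTO author (author_id, name) VALUES (?, ?)",
        [(1, "Author A"), (2, "Author B")],
    )
    conn.executemany(
        "INSERT INTO book (book_id, title, author_id) VALUES (?, ?, ?)",
        [(10, "Title One", 1), (11, "Title Two", 1), (12, "Title Three", 2)],
    )
    conn.commit()
    return conn


def titles_by_author(conn, author_name):
    """Step 3: application code that embeds a parameterized SQL query."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM book AS b
        JOIN author AS a ON a.author_id = b.author_id
        WHERE a.name = ?
        ORDER BY b.title
        """,
        (author_name,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(titles_by_author(conn, "Author A"))
```

The query passes the author name as a bound parameter (the `?` placeholder) rather than building the SQL string by hand, which is the usual way to avoid SQL injection once the application accepts input from the web.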
A group of two to four students (ideally, three) should do each project.
You are free to choose
your own project members; if you would like the instructor to assign you
to a group, send an email. Each of the steps above will correspond to
one or two specific project assignments. You will get detailed instructions with each
assignment. Each group should turn in a single solution to each
assignment. Every member of the group will get the same grade.
N.B. There will be a peer evaluation component of project grades.
For every component of the project, each group will be required to specify what
each of the members of the group contributed to the group work.
Project Ideas
These ideas are just samples. You are free to
propose your own ideas. Realize that the ideas below are not complete
descriptions. You need to work on them more and develop your project
more concretely and in more detail. Do not get intimidated by the
examples that are linked from this web page -- they are meant to
give you a feel for the application domain. It is up to you to narrowly
define the scope of the application within the time frame of a
semester-long project. Do not forget that you are supposed to have
- Books Database: This domain is a popular one. Just look
at barnesandnoble.com or amazon.com for excellent examples. You
could model entities such as books, their authors, topics (which may be
a complex hierarchy). You may also model various attributes of the
authors and the institutions they belong to. You can support a service
for buying and selling used books or books used in specific university
courses. Your system can build a personal profile of people (and the
books they like) and your database application could form the basis for
a "recommender system", such as those supported by the commercial sites.
The goal here is to "cluster" similar preferences together and the
system can then make recommendations: "Since you liked Shakespeare's Romeo and
Juliet, I recommend that you try Shakespeare's Antony and Cleopatra".
- Movies (or Television Shows) Database:
There are several excellent movie resources on the web, such as the hollywood.com movies site or the
Internet Movie Database. You could
model entities such as movies, their actors, directors, genres, playing
times, and reviews. There are several sources on the web from which you
could get data to populate such a database. You can support various
queries such as finding specific playing times, or finding movies playing
in Turlock directed by a given director. You can also support updates
to the reviews section of the database (e.g., viewers giving their own
opinions). Another functionality is to provide personal profiles of
people (i.e., the movies they like) and then try to recommend movies to
them based on profiles of viewers with similar tastes. You could also
create a database of Oscar or Golden Globe nominations and awards and
answer queries such as "Find all the sitcoms that have been nominated
three years in a row".
- Research Literature: This domain involves modeling research
publications. You need to identify the title of the publication, the
forum it was published in, the authors, topics, keywords and related
subtopic areas. This is a big business now (under the name of digital
libraries). For example, the ACM Digital
Library provides a beautiful searchable index (and retrievable
repository, but that is beyond our scope) of nearly all of the
publications of ACM. If you use this domain, then there are a lot of
available resources for you to use. The ACM Computing
Classification System provides a convenient hierarchical meta-index
that you can use to organize your class hierarchy etc. If you are
interested in a smaller domain, then the DBLP
Bibliography Site provides a searchable facility for publications
related to the database and programming communities. At the end of the
day, you could identify papers written by a particular person at a
particular place or ones in a narrowly defined area.
- Research Literature, the Dark Side, aka
Retraction Watch: This domain
involves modeling retracted scientific publications -- the mistakes,
misunderstandings, and occasional outright fraud that can occur in any
competitive publication field. The Retraction Watch web site gathers
information on cases where a scientific publication was declared "unpublished"
and documents why the publishers retracted the paper.
The Retraction Watch
web site once received a generous grant to help them create a database
of scientific paper retractions. You could try to design that database.
Or you could find another web site that gathers up socially useful information
and try designing a database for their worthy cause. Possibilities include
a database of adoptable stray animals, of endangered animal sightings in the
wild, of homeless shelter locations, of food banks for humans, of locations of
weeds in the wild that are food sources for endangered creatures
(like Monarch butterflies), of museums that store old computers, ...
- Municipal Data: A number of different cities and counties,
like Santa Clara County,
Los Angeles County, and
Sonoma County, are
putting municipal data on-line for anyone to access. Data on business,
education, health, the environment, transportation... all the information
governments use to help their areas thrive. You could model an
open data web site, and include in your model the types of data available,
areas of the county / city the data applies to, the people and projects
that generate the data sets, existing applications available to analyze
some of the data sets, and so on.
- Nobel Prizes Database: The goal might be to model and populate
information about the awards made in the various fields (Physics,
Chemistry, Physiology or Medicine, Literature, Peace and the Economic
Sciences), the recipients, their countries, their year of birth etc.
Your system should be able to answer questions such as "When was the
first time an Asian won an award for the economic sciences?" (the answer
to this particular question is 1998). You could
also work on variants of this idea such as the recipients of the ACM awards. Interesting queries then
could be "Name people who have won at least two different ACM awards" (the
answer would include Knuth, Thompson, Ritchie, Engelbart etc.) Or the
people "who were ACM Fellows before becoming Turing Award Winners" and
- Apartment Homes: This domain would require
modeling apartments and their attributes, areas of town and their
various characteristics (e.g., Modesto Area Express bus lines, crime rates, distance from
various landmarks). You would provide an interface for offering
apartments for rent and for finding apartments based on various requirements
("pets allowed + rent less than $800 + close to campus").
- Others: Of course, there are a whole host of other ideas such
as personal photo collections, bank accounts, student records, World Cup data,
election results, Senate demographics, car rentals, auto insurance, consumer
products, silly statistics, "match-making services"
and so on. Use your imagination.
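To give a feel for how one of the sample questions above might translate into SQL, the sketch below answers a query of the form "Name people who have won at least two different awards" with GROUP BY and HAVING, again using Python's built-in sqlite3 module. The schema (person, award, award_win) and every row in it are hypothetical placeholders, not real award data; a real project would need its own model and data sources.

```python
import sqlite3


def build_awards_db():
    """Create an in-memory database with an assumed three-table awards schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE award (
            award_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL
        );
        CREATE TABLE award_win (
            person_id INTEGER NOT NULL REFERENCES person(person_id),
            award_id  INTEGER NOT NULL REFERENCES award(award_id),
            year      INTEGER NOT NULL,
            PRIMARY KEY (person_id, award_id, year)
        );
    """)
    # Placeholder rows only; none of this is real award history.
    conn.executemany(
        "INSERT INTO person (person_id, name) VALUES (?, ?)",
        [(1, "Person A"), (2, "Person B"), (3, "Person C")],
    )
    conn.executemany(
        "INSERT INTO award (award_id, name) VALUES (?, ?)",
        [(1, "Award X"), (2, "Award Y")],
    )
    conn.executemany(
        "INSERT INTO award_win (person_id, award_id, year) VALUES (?, ?, ?)",
        [
            (1, 1, 2001), (1, 2, 2005),  # Person A: two different awards
            (2, 1, 2002), (2, 1, 2003),  # Person B: the same award twice
            (3, 2, 2004),                # Person C: a single award
        ],
    )
    conn.commit()
    return conn


def multi_award_winners(conn, min_awards=2):
    """Return names of people who won at least `min_awards` distinct awards."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM person AS p
        JOIN award_win AS w ON w.person_id = p.person_id
        GROUP BY p.person_id, p.name
        HAVING COUNT(DISTINCT w.award_id) >= ?
        ORDER BY p.name
        """,
        (min_awards,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(multi_award_winners(build_awards_db()))
```

Counting DISTINCT award ids matters here: a person who won the same award in two different years has two rows in award_win but should not count as having won two different awards.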
Credit and thanks are due to Dr. Murali at Virginia Tech for inspiration and documentation.
Last modified: Aug 2021