CS 4250: Database Management Systems

Fall 2023 - Homework 4

Concurrency Control and Recovery Homework

Due Wednesday, 13 December 2023, at midnight

Email with subject line "cs4250,hw4" (by midnight), or turn in a typed hardcopy during class time on the due date.

(Strongly prefer plain text. Will accept PDF or MSWord documents, if typed.)

Handwritten answers, on paper or as photographs, strongly discouraged.

This is an individual assignment. All work must be entirely your own. You should not look at any other student's work (in whole or in part, on paper or on screen), nor allow anyone else to look at yours, nor discuss the assignment with other students, during the course of this assignment. Nor may you submit the work of software programs as your own.

Note: T<number> identifies a transaction numbered number. R(<letter>) identifies a read operation on database object letter. W(<letter>) identifies a write operation on database object letter.

Questions

  1. Consider the following schedule:

    T3: R(C), T3: R(D), T1: R(D), T2: R(C), T1: W(D), T2: W(C), T3: R(A), T1: R(A), T2: R(B), T3: W(A), T1: W(A), T2: R(D), T2: W(D)

    Is the schedule serializable? If so, show an equivalent serial transaction order. If not, precisely describe why not.

    If relevant, fill in this table with the equivalent serial transaction order. Time proceeds from left to right, with only one action possible in each time slot.
    Schedule Time 1 Time 2 Time 3 Time 4 Time 5 Time 6 Time 7 Time 8 Time 9 Time 10 Time 11 Time 12 Time 13 Time 14 Time 15
    T1
    T2
    T3

  2. Consider the following schedule:

    T2: R(A), T3: R(B), T1: R(A), T3: W(B), T2: R(B), T1: R(C), T2: W(B), T1: W(C), T2: R(C), T3: R(D), T3: W(D), T2: R(D)

    Is the schedule serializable? If so, show an equivalent serial transaction order. If not, precisely describe why not.

    If relevant, fill in this table with the equivalent serial transaction order. Time proceeds from left to right, with only one action possible in each time slot.
    Schedule Time 1 Time 2 Time 3 Time 4 Time 5 Time 6 Time 7 Time 8 Time 9 Time 10 Time 11 Time 12 Time 13 Time 14 Time 15
    T1
    T2
    T3

  3. Consider the following schedule:

    T3: R(A), T3: W(A), T2: R(A), T1: R(D), T1: R(A), T2: R(A), T3: R(D), T2: R(D), T3: W(B), T2: W(D), T1: W(A), T3: R(C), T1: R(B), T1: W(B)

    Is the schedule serializable? If so, show an equivalent serial transaction order. If not, precisely describe why not.

    If relevant, fill in this table with the equivalent serial transaction order. Time proceeds from left to right, with only one action possible in each time slot.
    Schedule Time 1 Time 2 Time 3 Time 4 Time 5 Time 6 Time 7 Time 8 Time 9 Time 10 Time 11 Time 12 Time 13 Time 14 Time 15
    T1
    T2
    T3

  4. For questions 4-6, consider the execution of the ARIES recovery algorithm given the following log:

    LSN Log Record
    00 begin_checkpoint
    05 end_checkpoint
    10 Update: T2 writes P2
    20 Update: T1 writes P4
    30 Update: T2 writes P1
    40 T2 abort
    50 Update: T4 writes P1
    60 T4 commit
    70 Update: T5 writes P2
    80 CLR: Undo T2 LSN 30
    90 Update: T1 writes P4
    100 T4 end
    110 T5 abort
    120 T1 commit
    X - crash, restart

    For the questions below, when you are asked which log records are read, you are to supply the exact list of LSNs from log above. When data pages are asked for, you are to supply the exact list of page identifiers from the log above. And so on. Be specific and concrete in your answers, answering specifically for the provided log.

    Operations can be identified using the LSN for the log record recording that operation. (So, of course, can the log record itself.)

    Generic statements or quotations from the textbook (or slides) will earn 0 points. Example: "all the pages" == 0 points. "P1, P2 and P3" == more than 0 points, assuming those are the pages read. (Don't use English. List the log record numbers, list the page numbers, etc.)

    For questions 3-5, parts a-c are separate questions. Answer a through c separately. Label your answers so that it is clear what is your answer to '3.a' and what is your answer to '4.c', and so on.

  5. During Analysis:

  6. During Redo:

  7. During Undo:
  8. Consider these tables (from Homework 2).
    MemberOf (teamID: integer, engineerID: integer, role: string, hoursAWeek: integer, startedOn: date, stoppedOn: date)
    Team (teamID: integer, name: string, creationDate: date, projectID: integer)
    SpaceProject(projectID: integer, name: string, destination: string, launchDate: date, costEstimated: integer, description: string, sponsorOrganization: string)

    Assume the MemberOf table has 5,000 tuples in it, the Team table has 1,000 tuples in it, and the SpaceProject table has 1,000 tuples.
    Also consider this question:

    Find the engineer identifiers and roles, and team names, for engineers that are on some team, where the team works on a project sponsored by ISRO. The resulting names should be alphabetized by team names. (ISRO is the Indian Space Research Organization, sponsors of the recent Chandrayaan-3 mission to the Moon.)

    One possible solution to the question is:

    SELECT M.engineerID, M.role, T.name
    FROM MemberOf M, Team T, SpaceProject SP
    WHERE SP.sponsorOrganization = "ISRO" AND T.projectID = SP.projectID AND M.teamID = T.teamID
    ORDER BY T.name

    a) Translate this SQL query into a corresponding relational algebra expression (and write down the expression).
    b) Then draw the query tree for your relational algebra expression.
    c) Using equivalencies of relational algebra expressions, re-write your relational algebra expression into a more efficient form, if possible. Draw the new, corresponding query tree. Explain why your new expression is more efficient.