I will probably make some additions and changes to
this information in the next few days.
Keep checking the class web space to see the latest version.
(rev. Sun Dec 4 18:57 PST 2016)
Review Topics
General Guidance
- Know the steps of the major algorithms of each assigned section.
- Know the big-O of those algorithms.
- Know the definitions of the terms that are used to describe the algorithms.
- Know the main ideas used to prove important properties of the algorithms.
Chapter One
- See class web space notes on the Stable Matching Problem
- See class web space notes on the "Five Problems"
- Definitions:
- The stable matching problem
- stability
- an instability
- matching
- perfect matching
- stable matching
- valid partner
- Algorithms: the Gale-Shapley algorithm
- Big-O of the Gale-Shapley algorithm: O(n^2),
where n = the number of men = the number of women
- Some of the important ideas about the Gale-Shapley algorithm:
- It always produces a stable matching.
- It matches each man with his best valid partner.
- It matches each woman with her worst valid partner.
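The proposal/rejection loop can be sketched as follows (Python; the dict-of-preference-lists input format is my own choice, not the text's):

```python
def gale_shapley(men_prefs, women_prefs):
    """Men propose in preference order; returns a stable matching man -> woman."""
    # rank[w][m] = position of man m in woman w's preference list (lower = better)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}   # index of the next woman m will try
    engaged_to = {}                             # woman -> man
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]      # highest-ranked woman not yet tried
        next_proposal[m] += 1
        if w not in engaged_to:
            engaged_to[w] = m                   # w was free: she accepts
        elif rank[w][m] < rank[w][engaged_to[w]]:
            free_men.append(engaged_to[w])      # w prefers m; her old partner is freed
            engaged_to[w] = m
        else:
            free_men.append(m)                  # w rejects m; he stays free
    return {m: w for w, m in engaged_to.items()}
```

Each man proposes to each woman at most once, giving the O(n^2) bound noted above.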
Chapter Four: Greedy Algorithms
- See class web space notes on Greedy Algorithms
- Algorithms:
- Interval Scheduling
- based on ordering n intervals by increasing finish time
- requires sorting [ O(n log n) ] followed by O(n) additional work
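A sketch of that greedy rule (Python; representing intervals as (start, finish) pairs is my own assumption):

```python
def interval_schedule(intervals):
    """Greedy: sort by finish time, keep each interval compatible with the last chosen."""
    chosen = []
    last_finish = float('-inf')
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):  # O(n log n) sort
        if start >= last_finish:        # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish        # O(n) scan overall
    return chosen
```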
- Scheduling All Intervals
- based on ordering n intervals by increasing start time
- requires sorting [ O(n log n) ]
- We think of the problem as assigning intervals of time to classrooms.
- If we maintain classrooms in a priority queue implemented as a heap
and ordered by 'earliest last finish time' then the remaining work
of the algorithm is also [ O(n log n) ].
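One way to sketch the heap-of-classrooms idea (Python's heapq; the (start, finish) tuple format and room numbering are my own assumptions):

```python
import heapq

def schedule_all(intervals):
    """Assign every interval to a room; the number of rooms used equals the depth."""
    busy = []                     # min-heap of (finish_time, room_id) for busy rooms
    n_rooms = 0
    assignment = []
    for start, finish in sorted(intervals):       # order by increasing start time
        if busy and busy[0][0] <= start:          # room with earliest finish is free
            _, room = heapq.heappop(busy)
        else:                                     # every room is busy: open a new one
            room = n_rooms
            n_rooms += 1
        heapq.heappush(busy, (finish, room))      # O(log n) per interval
        assignment.append(((start, finish), room))
    return assignment, n_rooms
```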
- Scheduling to Minimize Lateness
- Instead of a start and finish time, each interval is given
as a duration and a deadline.
- The goal is to minimize the maximum lateness among all jobs.
- The strategy of the algorithm is earliest deadline first.
- The work of the algorithm is [ O(n log n) ].
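The earliest-deadline-first strategy can be sketched as (Python; the (duration, deadline) pair format is my own assumption):

```python
def min_max_lateness(jobs):
    """jobs: list of (duration, deadline). Schedule earliest-deadline-first,
    back to back starting at time 0; return the maximum lateness (0 if none late)."""
    t = 0
    max_late = 0
    for duration, deadline in sorted(jobs, key=lambda j: j[1]):  # EDF order
        t += duration                        # this job finishes at time t
        max_late = max(max_late, t - deadline)
    return max_late
```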
- Dijkstra's Algorithm
- We are given a directed graph G(V,E) with n nodes and m edges,
a non-negative weight on each edge, a source node s,
and a path from s to every other node.
- The goal is to compute the shortest paths from s to all the other
nodes.
- The greedy rule of thumb is to "choose the shortest new s-v path
that can be made from a path in S followed by a single edge."
- One version of the algorithm, which uses only tables as
data structures, is O(n^2).
- Another version of the algorithm keeps V-S in a priority queue that
provides O(log n) "change key" and O(log n) "extract min" operations.
- The second algorithm is [ O(m log n) ].
- If m is close in value to n^2, then the O(n^2)
algorithm would be more efficient than the O(m log n) algorithm,
for large enough values of n.
- If the graph is sparse rather than dense - in other words, if m is close
in value to n - then things would be the other way around:
the O(m log n) algorithm would be more efficient than
the O(n^2) algorithm, for large enough values of n.
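A sketch of the heap-based version (Python's heapq has no "change key" operation, so this uses the common lazy-deletion workaround instead; the adjacency-list dict format is my own assumption):

```python
import heapq

def dijkstra(graph, s):
    """graph: {node: [(neighbor, weight), ...]} with non-negative weights.
    Returns shortest-path distances from s."""
    dist = {s: 0}
    pq = [(0, s)]                 # heap of (distance, node); may hold stale entries
    done = set()
    while pq:
        d, v = heapq.heappop(pq)  # "extract min"
        if v in done:
            continue              # stale entry: v was already finalized
        done.add(v)
        for w, weight in graph.get(v, []):
            if w not in dist or d + weight < dist[w]:
                dist[w] = d + weight          # shorter s-w path via v found
                heapq.heappush(pq, (dist[w], w))
    return dist
```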
- Kruskal's Algorithm
- The problem is to find a minimal cost spanning tree in a connected,
undirected graph with n nodes and m edges.
- Kruskal's algorithm works by examining the edges in order of increasing
cost, and adding each edge to a set S if it does not create a cycle,
until S contains n-1 edges.
- Assuming that an efficient version of the Union-Find algorithm is
used to check for cycles, Kruskal's algorithm is O(m log n),
and the work is dominated by the sorting of the edges by cost.
- The part of the work after the sort can be O(mα(2m,n)),
where α(2m,n) is a function that grows so slowly that for
all intents and purposes, it can be considered to be a small
constant.
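A sketch combining the edge sort with a Union-Find cycle check (Python; this uses path compression and union by size, one common "efficient version"; the (cost, u, v) edge format over nodes 0..n-1 is my own assumption):

```python
def kruskal(n, edges):
    """edges: list of (cost, u, v) over nodes 0..n-1 of a connected graph.
    Returns (tree_edges, total_cost) of a minimum spanning tree."""
    parent = list(range(n))
    size = [1] * n

    def find(x):                            # Union-Find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, total = [], 0
    for cost, u, v in sorted(edges):        # O(m log n) sort dominates
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge joins two components: no cycle
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru                 # union by size
            size[ru] += size[rv]
            tree.append((u, v))
            total += cost
            if len(tree) == n - 1:          # spanning tree complete
                break
    return tree, total
```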
- Prim's Algorithm
- The problem is to find a minimal cost spanning tree in a connected,
undirected graph with n nodes and m edges.
- Prim's algorithm starts with an empty edge set T, and a node set S
containing an arbitrary starter node s. The algorithm repeatedly
chooses the least costly edge e connecting a node in S with a node
v not in S, adds e to T, and adds v to S.
- There are two versions of Prim's algorithm that are similar
to the two versions of Dijkstra's algorithm discussed above.
(BUT ALWAYS REMEMBER: they do NOT calculate the same thing!!)
- As is the case with Dijkstra's algorithm,
- there is a simple O(n^2) version of Prim's
algorithm that will perform
better for 'large enough' dense graphs, and
- an O(m log n) version of Prim's algorithm that uses
a heap, and that will perform
better for 'large enough' sparse graphs.
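The heap-based version of Prim's algorithm can be sketched as (Python; the adjacency-list dict with each undirected edge listed in both directions is my own assumption):

```python
import heapq

def prim(graph, s):
    """graph: {node: [(neighbor, cost), ...]}, undirected and connected.
    Returns (tree_edges, total_cost) of a minimum spanning tree."""
    in_tree = {s}                 # the node set S, seeded with a starter node
    pq = [(cost, s, v) for v, cost in graph[s]]   # heap of (cost, u in S, v)
    heapq.heapify(pq)
    tree, total = [], 0
    while pq and len(in_tree) < len(graph):
        cost, u, v = heapq.heappop(pq)            # cheapest edge leaving S
        if v in in_tree:
            continue                              # both endpoints already in S
        in_tree.add(v)
        tree.append((u, v))
        total += cost
        for w, c in graph[v]:                     # new candidate edges out of S
            if w not in in_tree:
                heapq.heappush(pq, (c, v, w))
    return tree, total
```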
- Ideas to Know:
- the concept of the depth of a set of intervals, and how that
relates to the problem of scheduling all intervals.
- The Union-Find Data Structure & How it is used with Kruskal's Algorithm
Chapter Five: Divide and Conquer
- See class web space notes on Divide and Conquer
- Algorithms:
- Merge Sort of a list of n elements - O(n log n)
- Counting Inversions in a list of n elements - O(n log n)
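Both of these follow the same divide-and-conquer pattern; a sketch of the inversion count riding on merge sort (Python):

```python
def count_inversions(a):
    """Return (sorted copy of a, number of pairs i < j with a[i] > a[j])."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])     # divide
    right, inv_right = count_inversions(a[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):        # conquer: standard merge
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            inv += len(left) - i    # every remaining left element exceeds right[j]
    merged += left[i:] + right[j:]
    return merged, inv
```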
- Finding the Closest Pair among a set of n points in the plane
- The idea of the algorithm is to recursively find the closest pair
of points in the "left half" of the set and the "right half", and
then to finish up by doing a linear scan of points near the middle
of the set.
- The algorithm is O(n log n).
- Ideas to Know:
- Approaches to Solving Recurrence Relations
Chapter Six: Dynamic Programming
- See class web space notes on Dynamic Programming
- Algorithms:
- Weighted Interval Scheduling
- We are given n intervals, each with a value.
- The problem is to choose a mutually disjoint subset with
highest possible total value.
- opt(j) is defined as the max total value achievable using
intervals 1 through j.
- the relation of problems to subproblems:
opt(j) = max { v_j + opt(p(j)), opt(j-1) },
where p(j) is the largest index i < j such that interval i
ends before interval j begins (p(j) = 0 if there is none)
- the algorithm given is based on sorting the intervals
by finish time, computing the p(j) values, and finishing up
by doing O(n) work to fill in a table.
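The iterative table-filling version can be sketched as (Python; the (start, finish, value) tuple format is my own assumption, and the p(j) values are computed here by binary search on the sorted finish times):

```python
import bisect

def weighted_interval_schedule(intervals):
    """intervals: list of (start, finish, value).
    Returns the maximum total value of a mutually disjoint subset."""
    ivs = sorted(intervals, key=lambda iv: iv[1])      # sort by finish time
    finishes = [f for _, f, _ in ivs]
    # p[j] = number of intervals that finish no later than interval j starts
    p = [bisect.bisect_right(finishes, s) for s, _, _ in ivs]
    opt = [0] * (len(ivs) + 1)                         # opt[0] = 0 base case
    for j in range(1, len(ivs) + 1):
        _, _, v = ivs[j - 1]
        opt[j] = max(v + opt[p[j - 1]], opt[j - 1])    # take interval j, or not
    return opt[-1]
```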
- Segmented Least Squares
- We are given n values of x,
x_1 < x_2 < ... < x_n,
and points in the plane:
{(x_1,y_1), (x_2,y_2), ..., (x_n,y_n)}
- The problem is to fit the points to a set of straight line
segments so as to keep the sum of a penalty function and the
total square error as low as possible.
- The penalty consists of a constant C added for each line segment
used.
- opt(j) is defined as the value of the optimum solution for the first
j points.
- the relation of problems to subproblems:
opt(j) = min_{1≤i≤j} { e_{i,j} + C + opt(i-1) }
- the algorithm given is based on calculating all the e_{i,j},
with O(n^2) work, and then filling in a table of the
n values of opt(j), by doing O(n^2) additional work.
- Subset Sum and Knapsack Problem
- We are given a knapsack with a total weight capacity W, and n objects,
each with a weight. W and the other weights are positive integers.
- In the Knapsack problem, each object has a positive value,
and different objects can have different values.
- With the Subset Sum problem, each object has the same value,
which can be taken to be 1.
- The goal is to select a subset of the objects with as high a total
value as possible, without their combined weight being allowed to
exceed W.
- For 0≤i≤n and 0≤w≤W, opt(i,w) is defined as the greatest total
value of a subset of objects 1 through i, subject to their total
weight not exceeding w.
- the general relation of problems to subproblems:
opt(i,w) = max{ opt(i-1,w), v_i + opt(i-1,w-w_i) }
(opt(0,w) = 0 is the base case, and when w < w_i object i cannot
be taken, so opt(i,w) = opt(i-1,w))
- the algorithm given is based on filling in a table with n+1 rows
and W+1 columns by performing O(nW) work.
This is NOT considered linear time, because there's
no a priori bound on W. (In fact the decision versions of both
of these problems are NP-complete.)
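The table-filling algorithm for the Knapsack problem can be sketched as (Python; the (weight, value) pair format is my own assumption):

```python
def knapsack(W, items):
    """items: list of (weight, value) with positive integer weights; capacity W.
    Fills an (n+1) x (W+1) table in O(nW) time; returns the best total value."""
    n = len(items)
    opt = [[0] * (W + 1) for _ in range(n + 1)]       # opt[0][w] = 0 base case
    for i in range(1, n + 1):
        wi, vi = items[i - 1]
        for w in range(W + 1):
            opt[i][w] = opt[i - 1][w]                 # option 1: skip object i
            if wi <= w:                               # option 2: take it, if it fits
                opt[i][w] = max(opt[i][w], vi + opt[i - 1][w - wi])
    return opt[n][W]
```

Setting every value to 1 turns this into the Subset Sum variant described above.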
- Sequence Alignment
- We are given strings
X = x_1 x_2 ... x_m and
Y = y_1 y_2 ... y_n together
with gap penalty δ and mismatch costs α_{p,q}.
- The goal is to align the two strings so as to minimize the sum of the
gap penalties and mismatch costs (total cost).
- For 1≤i≤m and 1≤j≤n, opt(i,j) is defined as the
minimum cost of an alignment between
x_1 x_2 ... x_i and
y_1 y_2 ... y_j.
- the general relation of problems to subproblems:
opt(i,j) =
min{ α_{x_i,y_j} + opt(i-1,j-1),
δ + opt(i-1,j), δ + opt(i,j-1) }
- the algorithm given is based on filling in a table with m+1 rows
and n+1 columns by performing O(mn) work.
- Sequence Alignment in Linear Space
- An algorithm based on divide and conquer is presented that
solves the sequence alignment problem (including the calculation of
the mapping from X to Y) using O(m+n) space and O(mn) time.
- Bellman-Ford Algorithm for Shortest Paths
- We are given a directed graph G(V,E) with n nodes and m edges,
a weight (possibly negative) on each edge, a source node s,
a sink node t, with no edges leading out, a path
from every node to t, and no negative cycles.
- The goal is to compute the cheapest path from s to t, and
its cost.
- For 0≤i≤n-1 and v∈V, opt(i,v) is defined as
the minimum cost of a v-t path using at most i edges.
- the general relation of problems to subproblems:
opt(i,v) =
min { opt(i-1,v), min_{w∈V} { c_{vw} + opt(i-1,w) } }
- the algorithm given is based on filling in a table with a row
for each v∈V and n columns, by performing O(mn)
work in O(n^2) space.
- The amount of space utilized can be reduced to O(n), still
allowing calculation of the cheapest path from s to t, as
well as its cost.
- In fact the algorithm also calculates the cheapest paths and
costs from every node to t.
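A sketch of the O(n)-space variant, which keeps a single cost value per node and updates it over n-1 passes (Python; the (v, w, c_vw) edge-list format is my own assumption):

```python
def bellman_ford_to_t(nodes, edges, t):
    """edges: list of (v, w, c_vw) for directed edge v -> w; weights may be
    negative but no negative cycles exist. Returns the cheapest path cost
    from every node to t. O(mn) time, O(n) space."""
    INF = float('inf')
    cost_to_t = {v: INF for v in nodes}
    cost_to_t[t] = 0                          # opt(0, t) = 0 base case
    for _ in range(len(nodes) - 1):           # a cheapest path has at most n-1 edges
        for v, w, c in edges:
            if cost_to_t[w] + c < cost_to_t[v]:
                cost_to_t[v] = cost_to_t[w] + c   # relax edge v -> w
    return cost_to_t
```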
- Ideas to Know:
- Principles of Dynamic Programming: Memoization
or Iteration over Subproblems
Chapter Seven
- See class web space notes on Network Flow
- Know all definitions & concepts associated with flow networks
- Know how to perform the Ford-Fulkerson algorithm (FFA)
- Know how to use the FFA to calculate max flows and min cuts
- Know how to construct residual graphs
- Know how to calculate capacities of cuts and the value of a network
flow based on any cut.
- Know the version of the FFA that runs in O(mC) steps, what that means,
how to explain it, and how to show it is true. Know what m and C
represent in the formula O(mC)
- Know the relationship between flows and cuts.
- Know the second version of the FFA that "chooses good augmenting paths,"
and know the bound on the number of steps the algorithm requires - the
bound that was proved in the text: O(m^2 log C).
- Know what m and C represent in the bound O(m^2 log C).
- Know how to explain the Δ parameter of the second version of the
FFA and how it is used to achieve good augmenting paths.
- Understand that the facts we proved about the FFA assume that the flow
networks under consideration have positive integer capacities on all
edges.
- Know how to solve bipartite matching problems using the FFA.
- Know that the following problems can be solved using the FFA:
Survey Design, Airline Scheduling, Image Segmentation, and Baseball
Elimination.
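The basic FFA loop (augment along some residual s-t path until none exists) can be sketched as follows (Python; finding the path by DFS and the {(u, v): capacity} input format are my own choices, and positive integer capacities are assumed, per the note above):

```python
def max_flow(capacity, s, t):
    """capacity: dict {(u, v): c} with positive integer capacities.
    Ford-Fulkerson: repeatedly push the bottleneck along an augmenting
    s-t path found in the residual graph; returns the max flow value."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)          # backward residual edges
    adj = {}
    for (u, v) in residual:
        adj.setdefault(u, []).append(v)

    def find_augmenting_path():                 # DFS in the residual graph
        stack, prev = [s], {s: None}
        while stack:
            u = stack.pop()
            if u == t:                          # reconstruct the s-t path
                path = []
                while prev[u] is not None:
                    path.append((prev[u], u))
                    u = prev[u]
                return path
            for v in adj.get(u, []):
                if v not in prev and residual[(u, v)] > 0:
                    prev[v] = u
                    stack.append(v)
        return None                             # no augmenting path: flow is max

    flow = 0
    while True:
        path = find_augmenting_path()
        if path is None:
            return flow
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck      # push flow forward
            residual[(v, u)] += bottleneck      # allow it to be undone later
        flow += bottleneck
```

Each augmentation raises the flow by at least 1, which is the source of the O(mC) bound mentioned above.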
Chapter Eight
- Know the definitions of P and NP.
- Know what a decision problem is.
- Know what an instance of a decision problem is.
- Know about measures of size of an instance of a problem.
- Know the terminology regarding decision problems.
- Know what X ≤_P Y and X ≡_P Y mean for decision
problems X and Y.
- Know about the transitivity of the ≤_P relationship - i.e.
X ≤_P Y and Y ≤_P Z imply X ≤_P Z.
- Know and understand that if X ≤_P Y and Y ∈ P,
then X ∈ P
- Know and understand that if X ≤_P Y and X ∉ P,
then Y ∉ P
- Know how the reductions we discussed were done - know the 'constructions'
- Given a description of a problem, be able to say whether it is a problem
our text showed to be in P or NP.
- Know techniques for demonstrating that a problem is in P or is in NP.
- Know facts about the question of whether P = NP. What is the
significance of the question? Do we know whether P ⊆ NP?
Do we know whether NP ⊆ P?
- Know what NP stands for (Nondeterministic Polynomial time).