I will probably make small changes to 
this information from time to time.
Keep checking the class web space to see the latest version.
 
 
(rev. May 15, 2020)
 
 
Review Topics
 General Guidance 
-  Know each of the problems discussed in your assigned reading.  Be able
     to state each problem clearly, and in sufficient detail to convey real
     understanding of the problem. 
 -  Know the steps of the major algorithms of each assigned section - well enough to 
     execute on paper a small example of the problem, in sufficient detail 
     to demonstrate your full understanding of the algorithm. 
 -  Know the worst-case big-O and big-Θ of those algorithms, 
     and of the major parts of those algorithms. (All big-O and big-Θ
     information given below is "worst case", unless explicitly stated otherwise.) 
     
 -  Know the definitions of the terms that are used 
     to describe the algorithms.
 -  Know the main ideas used to prove important properties of the algorithms.
 
 Chapter One 
 -  See class web space notes on the Stable Matching Problem
 -  See class web space notes on the "Five Problems"
 -  Definitions:
     -  the stable matching problem
     -  stability
     -  an instability
     -  matching
     -  perfect matching
     -  stable matching
     -  valid partner
 
 -  Algorithms: the Gale-Shapley algorithm (a sketch appears below)
 -  Worst case big-Θ of the Gale-Shapley algorithm: Θ(n²), 
     where n = the number of men = the number of women
 -  Some of the important ideas about the Gale-Shapley algorithm:
     -  It always produces a stable matching.
     -  It matches each man with his best valid partner.
     -  It matches each woman with her worst valid partner.
 
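A minimal Python sketch of the Gale-Shapley proposal algorithm, for reference
(the input format and names here are illustrative, not from the text):

    from collections import deque

    def gale_shapley(men_prefs, women_prefs):
        """Men propose; returns a dict mapping each woman to a man.

        men_prefs[m]   = list of women, most preferred first
        women_prefs[w] = list of men, most preferred first
        """
        # rank[w][m] = position of m in w's list (lower is better)
        rank = {w: {m: i for i, m in enumerate(prefs)}
                for w, prefs in women_prefs.items()}
        next_proposal = {m: 0 for m in men_prefs}  # index of next woman to try
        engaged_to = {}                            # woman -> man
        free_men = deque(men_prefs)
        while free_men:
            m = free_men.popleft()
            w = men_prefs[m][next_proposal[m]]     # m's best not-yet-proposed-to woman
            next_proposal[m] += 1
            if w not in engaged_to:
                engaged_to[w] = m                  # w accepts her first proposal
            elif rank[w][m] < rank[w][engaged_to[w]]:
                free_men.append(engaged_to[w])     # w trades up; her old partner is free
                engaged_to[w] = m
            else:
                free_men.append(m)                 # w rejects m
        return engaged_to

    # Tiny example; the matching produced is stable.
    men = {"a": ["x", "y"], "b": ["y", "x"]}
    women = {"x": ["b", "a"], "y": ["a", "b"]}
    print(gale_shapley(men, women))                # {'x': 'a', 'y': 'b'}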
 
 Chapter Four: Greedy Algorithms 
 -  See class web space notes on Greedy Algorithms
 -  Algorithms:
     
     -  Interval Scheduling
          
           -  based on ordering the n intervals by increasing finish time
           -  requires sorting [ Θ(n log n) ], 
                followed by Θ(n) additional work (sketch below)
          
 
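A minimal Python sketch of the greedy interval scheduling algorithm (the
earliest-finish-time rule); the input format is illustrative:

    def interval_scheduling(intervals):
        """intervals: list of (start, finish) pairs.  Returns a maximum-size
        mutually compatible subset."""
        chosen = []
        last_finish = float("-inf")
        for s, f in sorted(intervals, key=lambda iv: iv[1]):  # by finish time
            if s >= last_finish:       # compatible with everything chosen so far
                chosen.append((s, f))
                last_finish = f
        return chosen

    print(interval_scheduling([(1, 4), (3, 5), (0, 6), (4, 7), (5, 9), (8, 10)]))
    # [(1, 4), (4, 7), (8, 10)]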
      -  Scheduling All Intervals
          
           -  based on ordering the n intervals by increasing start time
           -  requires sorting [ Θ(n log n) ]
           -  We think of the problem as assigning intervals of time to classrooms.
           -  If we maintain classrooms in a priority queue implemented as a heap,
                and if the key for each classroom is the finish time of the interval
                most recently assigned to the classroom, then the remainder 
                of the work of the algorithm is also [ Θ(n log n) ] (sketch below).
          
 
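A minimal Python sketch of interval partitioning with a heap of classrooms,
keyed by the finish time of each room's most recently assigned interval
(names are illustrative):

    import heapq

    def schedule_all_intervals(intervals):
        """intervals: list of (start, finish) pairs.  Assigns every interval
        a room so overlapping intervals get different rooms, using as few
        rooms as possible (the depth of the set)."""
        free = []                       # heap of (finish time of room's last interval, room)
        num_rooms = 0
        assignment = []
        for s, f in sorted(intervals):  # process by increasing start time
            if free and free[0][0] <= s:
                _, room = heapq.heappop(free)   # reuse the room that frees earliest
            else:
                num_rooms += 1                  # every room is busy: open a new one
                room = num_rooms
            assignment.append(((s, f), room))
            heapq.heappush(free, (f, room))     # this room is now busy until time f
        return assignment, num_rooms

    sched, k = schedule_all_intervals([(0, 3), (1, 4), (3, 6), (4, 7)])
    print(k)   # 2: the depth of this set of intervals is 2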
      -  Scheduling to Minimize Lateness
          
          -  Instead of a start time and a finish time, 
               each interval is given as a duration and a deadline.
          
           -  The goal is to minimize the maximum lateness 
                among all jobs.
           -  The strategy of the algorithm is earliest deadline first (sketch below).
           -  The work of the algorithm is [ Θ(n log n) ] overall.
           -  The sorting by deadline requires Θ(n log n) work, 
                and the rest of the work is Θ(n) 
                for building the schedule.
          
 
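A minimal Python sketch of earliest-deadline-first scheduling (the input
format is illustrative):

    def minimize_lateness(jobs):
        """jobs: list of (duration, deadline) pairs.  Returns the schedule
        as (start, finish, deadline) triples and the maximum lateness."""
        t = 0
        max_late = 0
        schedule = []
        for dur, dl in sorted(jobs, key=lambda j: j[1]):  # earliest deadline first
            start, t = t, t + dur               # run the job next, with no idle time
            schedule.append((start, t, dl))
            max_late = max(max_late, t - dl)    # a job finishing early has lateness 0
        return schedule, max_late

    print(minimize_lateness([(1, 2), (2, 4), (3, 6)]))   # max lateness 0 here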
      -  Dijkstra's Algorithm
          
          -  We are given a directed graph G(V,E) with n nodes and m edges, 
               a non-negative weight on each edge, and a source node s.  G is 
               connected in the sense that if k is any node of G, there 
               is at least one path in G from s to k.
          
 -  The goal is to compute the lengths of the shortest paths 
               from s to all the other nodes, and also to determine the shortest
               paths themselves.
          
 -  The greedy rule of thumb is to "choose the shortest new s-v path 
               that can be made from a path in S followed by a single edge."
          
           -  One version of the algorithm, which uses only tables as
                data structures, is Θ(n²).
           -  Another version of the algorithm keeps V-S in a priority queue that
                provides Θ(log n) "change key" 
                and Θ(log n) "extract min" operations.
           -  The second algorithm is [ Θ(m log n) ] (sketch below).
           -  If m is close in value to n², then the Θ(n²) 
                algorithm would be more efficient than the Θ(m log n) algorithm,
                for large enough values of n.  
           -  If the graph is sparse rather than dense - in other words if m is close
                in value to n, then things would be the other way around: 
                the Θ(m log n) algorithm would be more efficient than 
                the Θ(n²) algorithm, for large enough values of n.
          
 
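A minimal Python sketch of the heap-based version of Dijkstra's algorithm.
Python's heapq has no "change key" operation, so this sketch re-inserts a
node with its improved key and skips stale entries, which achieves the same
O(m log n) bound (the graph encoding is illustrative):

    import heapq

    def dijkstra(graph, s):
        """graph: dict mapping node -> list of (neighbor, weight) pairs,
        weights non-negative.  Returns (dist, parent); parent lets you
        reconstruct the shortest paths from s."""
        dist = {s: 0}
        parent = {s: None}
        pq = [(0, s)]                  # priority queue of (distance, node)
        done = set()
        while pq:
            d, u = heapq.heappop(pq)   # "extract min"
            if u in done:
                continue               # stale entry left by a re-insert
            done.add(u)
            for v, w in graph[u]:
                if v not in dist or d + w < dist[v]:
                    dist[v] = d + w
                    parent[v] = u
                    heapq.heappush(pq, (dist[v], v))   # "change key" by re-insert
        return dist, parent

    g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
    print(dijkstra(g, "s")[0])         # {'s': 0, 'a': 1, 'b': 3}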
      -  Kruskal's Algorithm
          
          -  The problem is to find a minimal cost spanning tree in a connected,
               undirected graph with n nodes and m edges.
          
 -  Kruskal's algorithm works by examining the edges in order of increasing
               cost, and adding each edge to a set T if it does not create a cycle, 
               until T contains n-1 edges.
          
           -  Assuming that an efficient version of the Union-Find (merge-find) 
                algorithm is used to check for cycles, Kruskal's algorithm 
                is Θ(m log n), and the work is dominated by the sorting 
                of the edges by cost (sketch below).
           -  The part of the work after the sort is dominated by doing find 
                and merge operations.  This work can be done in Θ(m·α(2m,n)) time, 
                where α(2m,n) is a function that grows so slowly that for
                all intents and purposes, it can be considered to be a small
                constant.  In particular, this means the find-merge work can be done 
                in better than Θ(m log n) time.
          
 
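A minimal Python sketch of Kruskal's algorithm with a simple array-based
union-find (path compression plus union by rank); encoding the nodes as
0..n-1 is an illustrative choice:

    def kruskal(n, edges):
        """edges: list of (cost, u, v) with nodes 0..n-1.  Returns the
        edges of a minimum-cost spanning tree."""
        parent = list(range(n))
        rank = [0] * n

        def find(x):                       # follow parents up to the set's label
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path compression (halving)
                x = parent[x]
            return x

        def union(a, b):
            ra, rb = find(a), find(b)
            if ra == rb:
                return False               # a and b already connected: cycle
            if rank[ra] < rank[rb]:
                ra, rb = rb, ra
            parent[rb] = ra                # hang the shorter tree under the taller
            if rank[ra] == rank[rb]:
                rank[ra] += 1
            return True

        tree = []
        for cost, u, v in sorted(edges):   # examine edges by increasing cost
            if union(u, v):                # keep the edge iff it joins two components
                tree.append((cost, u, v))
                if len(tree) == n - 1:
                    break
        return tree

    print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
    # [(1, 0, 1), (2, 1, 2), (4, 2, 3)]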
      -  Prim's Algorithm
          
          -  The problem is to find a minimal cost spanning tree in a connected,
               undirected graph with n nodes and m edges.
          
 -  Prim's algorithm starts with an empty edge set T, and a node set S
               containing an arbitrary starter node s.  The algorithm repeatedly 
               chooses the least costly edge e connecting a node in S with a node 
               v not in S, adds e to T, and adds v to S.
          
           -  There are two versions of Prim's algorithm that are similar
                to the two versions of Dijkstra's algorithm discussed above.
                (BUT ALWAYS REMEMBER: Dijkstra's algorithm does NOT calculate 
                the same thing as Prim's algorithm!!)
           -  As is the case with Dijkstra's algorithm, 
                -  there is a simple Θ(n²) version of Prim's 
                     algorithm that will perform
                     better for 'large enough' dense graphs, and
                -  a Θ(m log n) version of Prim's algorithm that uses 
                     a heap, and that will perform 
                     better for 'large enough' sparse graphs (sketch below).
               
 
 
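A minimal Python sketch of the heap-based version of Prim's algorithm (the
undirected graph is encoded as an adjacency dict; names are illustrative):

    import heapq

    def prim(graph, s):
        """graph: dict node -> list of (neighbor, cost), with each
        undirected edge listed in both directions.  Returns MST edges."""
        in_tree = {s}
        tree = []
        pq = [(c, s, v) for v, c in graph[s]]   # candidate edges leaving the tree
        heapq.heapify(pq)
        while pq and len(in_tree) < len(graph):
            c, u, v = heapq.heappop(pq)
            if v in in_tree:
                continue                 # edge no longer leaves the tree
            in_tree.add(v)
            tree.append((c, u, v))
            for w, cw in graph[v]:
                if w not in in_tree:
                    heapq.heappush(pq, (cw, v, w))
        return tree

    g = {0: [(1, 1), (2, 3)], 1: [(0, 1), (2, 2)], 2: [(0, 3), (1, 2)]}
    print(prim(g, 0))                    # [(1, 0, 1), (2, 1, 2)]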
           
      
 -  Ideas to Know:
      
      -  the concept of the depth of a set of intervals, and how that 
           relates to the problem of scheduling all intervals.
      
 -  The Union-Find Data Structure & How it is used with Kruskal's Algorithm.
            Be prepared to show that you understand how to read the set and height arrays
            to figure out what sets are represented, and to find the label of a set.  
            Also, know how to modify those data structures to perform a merge operation.
      
 
 Chapter Five: Divide and Conquer 
 -  See class web space notes on Divide and Conquer
 -  Algorithms:
     
     -  Merge Sort of a list of n elements - Θ(n log n)
     -  Counting Inversions in a list of n elements - Θ(n log n) (sketch below)
     
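A minimal Python sketch of counting inversions by piggybacking on merge sort
(the merge step counts the inversions that cross the two halves):

    def count_inversions(a):
        """Returns (sorted copy of a, number of pairs i < j with a[i] > a[j])."""
        if len(a) <= 1:
            return list(a), 0
        mid = len(a) // 2
        left, nl = count_inversions(a[:mid])
        right, nr = count_inversions(a[mid:])
        merged, inv = [], nl + nr
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
                inv += len(left) - i   # right[j] is inverted with all of left[i:]
        merged += left[i:] + right[j:]
        return merged, inv

    print(count_inversions([2, 4, 1, 3, 5]))   # ([1, 2, 3, 4, 5], 3)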
 -  Finding the Closest Pair among a set of n points in the plane 
          (We covered this "lightly" in the spring 2020 course.)
          
          -  The idea of the algorithm is to recursively find the closest pair
               of points in the "left half" of the set and the "right half", and
               then to finish up by doing a linear scan of points near the middle 
               of the set.
          
 -  The algorithm is Θ(n log n).
          
 
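A Python sketch of closest pair.  To keep it short, this variant re-sorts
the middle strip by y at every level, giving O(n log² n); the full
Θ(n log n) algorithm instead maintains y-sorted lists through the recursion:

    import math

    def closest_pair(points):
        """points: list of (x, y) pairs.  Returns the smallest pairwise distance."""
        def dist(a, b):
            return math.hypot(a[0] - b[0], a[1] - b[1])

        def solve(p):
            n = len(p)
            if n <= 3:                 # brute force the tiny cases
                return min((dist(a, b) for i, a in enumerate(p) for b in p[i+1:]),
                           default=math.inf)
            mid = n // 2
            xm = p[mid][0]
            d = min(solve(p[:mid]), solve(p[mid:]))   # best pair within each half
            # Scan points within d of the dividing line in y order; each one
            # needs comparing against only O(1) following strip neighbors.
            strip = sorted((q for q in p if abs(q[0] - xm) < d),
                           key=lambda q: q[1])
            for i, a in enumerate(strip):
                for b in strip[i+1:i+8]:
                    if b[1] - a[1] >= d:
                        break
                    d = min(d, dist(a, b))
            return d

        return solve(sorted(points))   # sort once by x

    print(closest_pair([(0, 0), (3, 4), (1, 1), (7, 7)]))   # 1.414... from (0,0),(1,1)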
      -  Integer Multiplication (Karatsuba-Ofman)
          
          -  The idea of the algorithm is to recursively compute the product of two n-bit 
               numbers by reducing the task to the multiplication of three pairs of 
               (n/2)-bit numbers (possibly 1+(n/2) bits for some of the numbers).
          
 -  The algorithm is Θ(n^(log₂3)).  The exponent log₂3 is about 1.585.
          
 
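A minimal Python sketch of Karatsuba-Ofman multiplication (three recursive
half-size products instead of four):

    def karatsuba(x, y):
        """Multiplies non-negative integers x and y."""
        if x < 16 or y < 16:                   # small enough: multiply directly
            return x * y
        h = max(x.bit_length(), y.bit_length()) // 2
        xh, xl = x >> h, x & ((1 << h) - 1)    # split: x = xh*2^h + xl
        yh, yl = y >> h, y & ((1 << h) - 1)
        a = karatsuba(xh, yh)                  # product of the high parts
        b = karatsuba(xl, yl)                  # product of the low parts
        c = karatsuba(xh + xl, yh + yl) - a - b   # both cross terms, one product
        return (a << (2 * h)) + (c << h) + b

    print(karatsuba(12345678, 87654321) == 12345678 * 87654321)   # True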
      
 -  Ideas to Know:
      
     -  Approaches to Solving Recurrence Relations - including how to apply 
          "the work function theorem." (See information about the work function theorem
          in this list.)
     -  Know the recurrence relations for Merge Sort, Counting Inversions, Closest-Pair, 
          and Karatsuba Multiplication. (These are summarized below.)
     
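For reference, the standard recurrences (c is a constant):

    Merge Sort:            T(n) = 2T(n/2) + cn    =>  Θ(n log n)
    Counting Inversions:   T(n) = 2T(n/2) + cn    =>  Θ(n log n)
    Closest Pair:          T(n) = 2T(n/2) + cn    =>  Θ(n log n)
    Karatsuba:             T(n) = 3T(n/2) + cn    =>  Θ(n^(log₂3))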
 
 
 Chapter Six: Dynamic Programming 
 -  See class web space notes on Dynamic Programming
 -  Algorithms:
     
     -  Weighted Interval Scheduling
          
          -  We are given n intervals, each with a value.
          
 -  The problem is to choose a mutually disjoint subset with
                highest possible total value.
          
           -  opt(j) is defined as the max total value achievable using
                intervals 1 through j.
           -  the relation of problems to subproblems: 
                opt(j) = max{ v_j + opt(p(j)), opt(j-1) },
                where p(j) is the largest index i < j such that interval i
                is compatible with interval j
           -  the algorithm given is based on sorting the intervals by finish time
                (Θ(n log n) work), computing the p(j) values (Θ(n log n) work), 
                and finishing by filling in a table of opt(j) values (Θ(n) work).
                (Therefore the algorithm does Θ(n log n) work, worst case, overall;
                a sketch appears below.)
          
 
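A minimal Python sketch of weighted interval scheduling, computing the p(j)
values by binary search (the input format is illustrative):

    import bisect

    def weighted_interval_scheduling(intervals):
        """intervals: list of (start, finish, value).  Returns the maximum
        total value of a mutually disjoint subset."""
        ivs = sorted(intervals, key=lambda t: t[1])   # by finish time
        finishes = [f for _, f, _ in ivs]
        opt = [0] * (len(ivs) + 1)                    # opt[0] = 0 (no intervals)
        for j, (s, f, v) in enumerate(ivs, start=1):
            # p = number of earlier intervals finishing by time s, i.e. p(j)
            p = bisect.bisect_right(finishes, s, 0, j - 1)
            opt[j] = max(opt[j - 1], v + opt[p])      # leave out j, or take j
        return opt[-1]

    print(weighted_interval_scheduling([(0, 3, 2), (2, 5, 4), (4, 7, 4), (6, 9, 1)]))
    # 6: take (0, 3, 2) and (4, 7, 4)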
      -  Segmented Least Squares (Not covered in the spring, 2020 course)
          
           -  We are given n values of x, 
                x_1 < x_2 < ... < x_n,
                and points in the plane:
                {(x_1,y_1), (x_2,y_2), ..., (x_n,y_n)}
           -  The problem is to fit the points to a set of straight line 
               segments so as to keep the sum of a penalty function and the
               total square error as low as possible.  
          
 -  The penalty consists of a constant C added for each line segment
               used.
          
 -  opt(j) is defined as the value of the optimum solution for the first
               j points.
          
           -  the relation of problems to subproblems: 
                opt(j) = min over 1 ≤ i ≤ j of { e(i,j) + C + opt(i-1) },
                where e(i,j) is the minimum square error of one line segment
                fit to points i through j
          
           -  the algorithm given is based on calculating all the e(i,j), 
                with O(n²) work, and then filling in a table of the 
                n values of opt(j), by doing O(n²) additional work.
          
 
      -  Subset Sum and Knapsack Problem
          
          -  We are given a knapsack with a total weight capacity W, and n objects,
               each with a weight.  W and the other weights are positive integers.
          
 -  In the Knapsack problem, each object has a positive value, 
               and different objects can have different values.
          
 -  With the Subset Sum problem, each object has the same value,
               which can be taken to be 1. 
          
 -  The goal is to select a subset of the objects with as high a total 
               value as possible, without their combined weight being allowed to 
               exceed W. 
          
           -  For 1 ≤ i ≤ n and 1 ≤ w ≤ W, opt(i,w) is defined as the greatest total
                value of a subset of objects 1 through i, subject to their total 
                weight not exceeding w.
           -  the general relation of problems to subproblems: 
                opt(i,w) = max{ opt(i-1,w), v_i + opt(i-1, w-w_i) }
                (when w < w_i, object i cannot be used, so only the first option
                applies; i = 1 is the base case)
           -  the algorithm given is based on filling in a table with n+1 rows
                and W+1 columns by performing O(nW) work (sketch below).  
                This is NOT considered polynomial (much less linear) time, because 
                the input encodes W in about log W bits, so there is no a priori 
                bound on W in terms of the input size.  (In fact both of these 
                problems are NP-hard.)
          
 
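A minimal Python sketch of the knapsack table-filling algorithm (O(nW) time
and space; the input format is illustrative):

    def knapsack(weights, values, W):
        """weights, values: parallel lists of positive integers;
        W: the weight capacity.  Returns the best achievable total value."""
        n = len(weights)
        opt = [[0] * (W + 1) for _ in range(n + 1)]   # (n+1) x (W+1) table
        for i in range(1, n + 1):
            wi, vi = weights[i - 1], values[i - 1]
            for w in range(W + 1):
                opt[i][w] = opt[i - 1][w]             # leave object i out
                if wi <= w:                           # object i fits
                    opt[i][w] = max(opt[i][w], vi + opt[i - 1][w - wi])
        return opt[n][W]

    print(knapsack([2, 3, 4], [3, 4, 6], 6))   # 9: take the objects of weight 2 and 4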
      -  Sequence Alignment
          
           -  We are given strings 
                X = x_1 x_2 ... x_m and 
                Y = y_1 y_2 ... y_n, together 
                with gap penalty δ and mismatch costs α(p,q).
          
 -  The goal is to align the two strings so as to minimize the sum of the
               gap penalties and mismatch costs (total cost).
          
           -  For 1 ≤ i ≤ m and 1 ≤ j ≤ n, opt(i,j) is defined as the 
                minimum cost of an alignment between 
                x_1 x_2 ... x_i and y_1 y_2 ... y_j.
          
           -  the general relation of problems to subproblems: 
                opt(i,j) = min{ α(x_i, y_j) + opt(i-1,j-1),
                      δ + opt(i-1,j), δ + opt(i,j-1) }
           -  the algorithm given is based on filling in a table with m+1 rows
               and n+1 columns by performing O(mn) work.
          
 
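A minimal Python sketch of the sequence alignment table-filling algorithm
(δ and α are passed in; the toy costs below are illustrative):

    def alignment_cost(X, Y, delta, alpha):
        """Minimum alignment cost between strings X and Y, where delta is
        the gap penalty and alpha(p, q) is the cost of aligning p with q."""
        m, n = len(X), len(Y)
        opt = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            opt[i][0] = i * delta      # x1..xi aligned against nothing
        for j in range(1, n + 1):
            opt[0][j] = j * delta
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                opt[i][j] = min(alpha(X[i-1], Y[j-1]) + opt[i-1][j-1],
                                delta + opt[i-1][j],   # x_i unmatched (gap)
                                delta + opt[i][j-1])   # y_j unmatched (gap)
        return opt[m][n]

    # Toy costs: gaps cost 2, mismatches cost 1, matches are free.
    print(alignment_cost("name", "mean", 2, lambda p, q: 0 if p == q else 1))  # 4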
      -  Sequence Alignment in Linear Space
          
          -  An algorithm based on divide and conquer is presented that
               solves the sequence alignment problem (including the calculation of
               the mapping from X to Y) using O(m+n) space and O(mn) time.
          
 
      -  Bellman-Ford Algorithm for Shortest Paths (not covered in the spring 2020 course)
          
          -  We are given a directed graph G(V,E) with n nodes and m edges, 
               a weight (possibly negative) on each edge, a source node s, 
               a sink node t, with no edges leading out, a path 
               from every node to t, and no negative cycles.
          
 -  The goal is to compute the cheapest path from s to t, and
               its cost.
          
           -  For 1 ≤ i ≤ n-1 and v ∈ V, opt(i,v) is defined as
                the minimum cost of a v-t path using at most i edges.
          
           -  the general relation of problems to subproblems: 
                opt(i,v) = min{ opt(i-1,v), min over w ∈ V of { c(v,w) + opt(i-1,w) } }
           -  the algorithm given is based on filling in a table with a row
                for each v ∈ V and n columns, by performing O(mn) 
                work in O(n²) space.  
          
 -  The amount of space utilized can be reduced to O(n), still
               allowing calculation of the cheapest path from s to t, as
               well as its cost.
          
 -  In fact the algorithm also calculates the cheapest paths and
               costs from every node to t.
          
 
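A minimal Python sketch of Bellman-Ford in the node-to-t formulation used
above, with the O(n) space improvement of keeping only one row of the table
(the graph encoding is illustrative; assumes no negative cycles):

    import math

    def bellman_ford_to_t(nodes, edges, t):
        """edges: list of (v, w, cost) for directed edges v -> w.  Returns
        (opt, succ): cheapest cost from each node to t, and each node's
        next hop on a cheapest path."""
        opt = {v: math.inf for v in nodes}
        opt[t] = 0
        succ = {v: None for v in nodes}
        for _ in range(len(nodes) - 1):      # paths need at most n-1 edges
            for v, w, c in edges:
                if opt[w] + c < opt[v]:      # using edge v -> w improves v
                    opt[v] = opt[w] + c
                    succ[v] = w
        return opt, succ

    nodes = ["s", "a", "t"]
    edges = [("s", "a", 2), ("a", "t", -1), ("s", "t", 3)]
    print(bellman_ford_to_t(nodes, edges, "t")[0])   # {'s': 1, 'a': -1, 't': 0}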
      
 -  Ideas to Know:
      
     -  Principles of Dynamic Programming: Memoization 
          or Iteration over Subproblems
     
 
 
 Chapter Seven 
 -  See class web space notes on Network Flow
 -  Know all definitions & concepts associated with flow networks
 -  Know how to perform the Ford-Fulkerson algorithm (FFA)
 -  Know how to use the FFA to calculate max flows and min cuts
 -  Know how to construct residual graphs
 -  Know how to calculate capacities of cuts and the value of a network
     flow based on any cut.
 -  Know the relationship between flows and cuts.
 -  Know the version of the FFA that is O(mC), where m is 
     the number of edges (m ≥ n/2), and C is the sum of the capacities 
     of all the edges leaving the source node. Know that no more than 
     C iterations of the main loop of this version of FFA are required.
     Understand also that if K is the upper bound of all the capacities 
     of the edges leaving the source node, then K ≤ C < nK, and so this same
     version of the FFA is O(mnK).
 -  Know that the construction of a residual graph Gf corresponding to 
     a network flow graph G is O(m+n) work, where m is the number of edges in G and 
     n is the number of nodes.  Know that O(m+n) is the same as O(m) when (m ≥ n/2),
     which is the default assumption about network flow graphs.
 -  Know that after running the FFA on a network flow graph G, the set 
     of nodes, A*, reachable from the source in the residual graph Gf 
     is the source-side of a minimum cut in G, and an O(m+n) breadth-first 
     or depth-first search is all that is needed to determine which nodes
     are elements of A*.  Know that O(m+n) is the same as O(m) when (m ≥ n/2),
     which is the default assumption about network flow graphs. 
 -  Know the second version of the FFA that "chooses good augmenting paths" 
      by using a scaling factor Δ.
      Know this algorithm is O(m²(1 + log K)) 
      and also O(m²(1 + log C)), where m, K, and C ≥ K are as
      defined above. 
  -  Know how to explain the Δ parameter of the second version of the
      FFA and how it is used to achieve good augmenting paths.
 -  Understand that the facts we proved about the FFA assume that the flow
     networks under consideration have positive integer capacities on all
     edges.
 -  Know how to solve bipartite matching problems using the FFA.
 -  Know that the following problems can be solved using the FFA:
     Survey Design, Airline Scheduling, Image Segmentation, and Baseball 
     Elimination.
 
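A minimal Python sketch of the Ford-Fulkerson method on a residual graph.
Here augmenting paths are found by breadth-first search (the Edmonds-Karp
rule - one concrete choice, not either of the two versions named above);
the graph encoding is illustrative:

    from collections import deque

    def max_flow(capacity, s, t):
        """capacity: dict of dicts, capacity[u][v] = capacity of edge u -> v
        (positive integers).  Returns the value of a maximum s-t flow."""
        # Residual graph: forward capacities plus 0-capacity back edges.
        res = {u: dict(nbrs) for u, nbrs in capacity.items()}
        for u, nbrs in capacity.items():
            for v in nbrs:
                res.setdefault(v, {}).setdefault(u, 0)
        flow = 0
        while True:
            parent = {s: None}                # BFS for an s-t path in res
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, c in res[u].items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return flow                   # no augmenting path: flow is max
            bottleneck, v = float("inf"), t
            while parent[v] is not None:      # find the path's bottleneck
                u = parent[v]
                bottleneck = min(bottleneck, res[u][v])
                v = u
            v = t
            while parent[v] is not None:      # augment along the path
                u = parent[v]
                res[u][v] -= bottleneck       # use up forward capacity
                res[v][u] += bottleneck       # add residual back edge
                v = u
            flow += bottleneck

    cap = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}, "t": {}}
    print(max_flow(cap, "s", "t"))            # 4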
 Chapter Eight 
-  Know the definitions of P and NP.
 -  Know what a decision problem is.
 -  Know what an instance of a decision problem is.
 -  Know about measures of size of an instance of a problem.
 -  Know the terminology regarding decision problems.
 -  Know what X ≤_P Y and X ≡_P Y mean for decision
      problems X and Y.
  -  Know about the transitivity of the ≤_P relationship - i.e.
      X ≤_P Y and Y ≤_P Z imply X ≤_P Z.
  -  Know and understand that if X ≤_P Y and Y ∈ P, 
      then X ∈ P.
  -  Know and understand that if X ≤_P Y and X ∉ P, 
      then Y ∉ P.
 -  Know how the reductions we discussed were done - know the 'constructions'
 -  Given a description of a problem, be able to say whether it is a problem 
     our text showed to be in P or NP. 
 -  Know techniques for demonstrating that a problem is in P or is in NP.
 -  Know facts about the question of whether P = NP.  What is the
     significance of the question?  Do we know whether P ⊆ NP?
     Do we know whether NP ⊆ P?
 -  Know what NP stands for (nondeterministic polynomial time).
  -  Know what an NP-complete problem is.  A problem Z is NP-complete if Z is in NP
      and if every problem in NP reduces to Z.  More formally, Z is NP-complete if
      Z ∈ NP and if for every X ∈ NP, X ≤_P Z.  (Less formally, 
      think of NP-complete problems as the NP problems that are as 'hard' as possible.)
  -  Know that if someone can prove that an NP-complete problem is in P, then
      it would also prove that P = NP.  (Suppose Z is NP-complete and in P.  If 
      X ∈ NP, then X ≤_P Z.  Since Z ∈ P and X reduces to Z, that means
      X ∈ P too.  This shows that an arbitrary X ∈ NP is also in P.  We already 
      know P ⊆ NP, so this proves that P = NP.)
 -  Be able to identify some NP-complete problems, like the decision versions of 
     Independent Set, Vertex Cover, and Set Cover.  Also, Circuit Satisfiability, 3-SAT,
     and Hamiltonian Circuit.
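As one concrete reduction to review, here is a small Python sketch of
Independent Set ≤_P Vertex Cover, using the fact that S is an independent
set iff V - S is a vertex cover (the brute-force checkers exist only to
illustrate the equivalence on a tiny instance):

    from itertools import combinations

    def is_to_vc(n, edges, k):
        """G has an independent set of size >= k iff G has a vertex cover
        of size <= n - k, so the reduction keeps G and flips the budget."""
        return n, edges, n - k

    def has_independent_set(n, edges, k):
        return any(all(u not in S or v not in S for u, v in edges)
                   for S in map(set, combinations(range(n), k)))

    def has_vertex_cover(n, edges, k):
        return any(all(u in C or v in C for u, v in edges)
                   for C in map(set, combinations(range(n), k)))

    n, edges, k = 4, [(0, 1), (1, 2), (2, 3)], 2     # a path on 4 nodes
    print(has_independent_set(n, edges, k))          # True: {0, 2}
    print(has_vertex_cover(*is_to_vc(n, edges, k)))  # True, as the reduction predicts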