CS 4440 Chapter 10 Notes


Chapter 10

Probabilistic Algorithms

10.1 Introduction

"When an algorithm is confronted by a choice, it is sometimes
preferable to choose a course of action at random, rather than
to spend time working out which alternative is best.  Such a
situation arises when the time required to determine the optimal
choice is prohibitive, compared to the time saved on the
average by making this optimal choice."

* A probabilistic algorithm is non-deterministic.  What it does
  next is not necessarily determined by the current values of
  the data, inputs, and the current instruction.  The program
  may decide "by chance" which instruction to execute next.

* Non-determinism can be a good thing.  If a program messes up
  once in a certain situation, it will not necessarily mess up
  again if it gets into the same situation again.  Perhaps all
  we have to do is run the program many times on the same input
  to get sufficiently high confidence that we have a correct
  output.

10.2 Probabilistic Does Not Imply Uncertain

* Which is better?  A totally deterministic and correct
  algorithm that runs a long time -- so long that with
  probability p there will be a hardware error that will cause
  the program to give the wrong output?  Or a probabilistic
  algorithm that gives the right answer only with probability q
  (which includes the probability of a hardware error and the
  probability that the algorithm makes a mistake)?  Answer: it
  depends on which is bigger, p or q.

* What if no known deterministic or probabilistic algorithm can
  give a precise answer to a problem?  Suppose there is a
  probabilistic algorithm that can solve the problem with an
  error probability "as small as you like"?  Example:
  determining whether a 1000-digit number is prime.

* Numerical Algorithms yield an answer and a confidence
  interval:  "With probability 90% the correct answer is 59
  plus or minus 3."

* Monte Carlo algorithms give a specific answer that has a high
  probability of being correct, but which could be way off (No
  confidence interval provided.)

* Some probabilistic algorithms only use the random choices to
  prioritize the order in which tasks are performed.  Therefore
  they always get the right answer.  The random choices are
  made because they tend to improve the time it will take for
  the algorithm to get the answer.

* Las Vegas algorithms are probabilistic algorithms that never
  get a wrong answer.  However, they can just fail, in which
  case they give no answer at all.  (This is not hard to
  implement if there is an easy way to check answers for
  correctness once they have been obtained.)
  
  When did Columbus discover America?

  Look on page 330 to see how three kinds of algorithm respond.

10.3 Expected versus average time

* Intriguing mention here of a Las Vegas algorithm for finding
  the median of an array.  Very good expected performance.

  + implications for quicksort
  + hashing
  
10.4 Pseudorandom generation

* Examples of functions used to generate pseudo-random
  sequences of numbers.
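
  The notes don't reproduce the book's specific generators, but
  the classic example is the linear congruential generator.  A
  minimal sketch (the constants a, c, m here are illustrative,
  not taken from the book):

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Linear congruential generator: x_{k+1} = (a*x_k + c) mod m.
    Yields a deterministic, seemingly random sequence in [0, m)."""
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x

# Same seed -> same "random" sequence, which is what makes the
# sequence pseudo-random rather than truly random.
gen = lcg(12345)
first_three = [next(gen) for _ in range(3)]
```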

10.5 Numerical probabilistic algorithms

* "The answer obtained ... is always approximate, but its
  expected precision improves as the time available to the
  algorithm increases."

10.5.1 Buffon's Needle (l'Aiguille de Buffon)

* "... if you throw a needle at random ... on a floor made of
  planks of constant width, if the needle is exactly half as
  long as the planks in the floor are wide, and if the width of
  the cracks between the planks is zero, the probability that
  the needle will fall across a crack is 1/π."  In theory this
  fact could be used to calculate the value of π to any
  desired accuracy.  This particular idea for finding a value
  indirectly through a series of "experiments" is not of any
  known practical value, but it is a simple example of a
  technique with potential to be exploited usefully.
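
  The experiment is easy to simulate.  A sketch, assuming planks
  of width 2 and needles of length 1 (so the needle is half as
  long as the planks are wide), that inverts the observed
  crossing frequency to estimate π:

```python
import math
import random

def buffon_estimate_pi(n, seed=None):
    """Drop n needles of length 1 on planks of width 2.  The
    needle's centre lies a uniform distance in [0, 1] from the
    nearest crack, and the needle crosses that crack iff the
    distance is at most (1/2)*sin(theta).  P(cross) = 1/pi,
    so pi is estimated by n / (number of crossings)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n):
        distance = rng.uniform(0.0, 1.0)      # centre to nearest crack
        theta = rng.uniform(0.0, math.pi)     # angle with the cracks
        if distance <= 0.5 * math.sin(theta):
            crossings += 1
    return n / crossings
```

  As the text says, this is not a practical way to compute π --
  the precision improves only with the square root of the number
  of throws -- but it shows the "estimate a value through random
  experiments" pattern.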

10.5.2 Numerical Integration

* Estimate the area under a curve f(x) on [a,b] by picking a
  point t at random in the interval [a,b] and estimating the
  area as f(t)(b-a).  For better precision, do this many times
  and average the results.
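
  The averaging scheme above can be sketched directly:

```python
import random

def mc_integrate(f, a, b, n=10000, seed=None):
    """Monte Carlo estimate of the integral of f over [a, b]:
    average f(t) over n uniform random points t in [a, b],
    then scale by the interval length (b - a)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    return (b - a) * total / n
```

  The expected precision improves as n grows (the error shrinks
  like 1/sqrt(n)), matching the quote at the start of 10.5.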

10.5.3 Probabilistic Counting

* Using 8 bits to count up to 2^255 - 1 things instead of just
  255 things!  The idea is to keep a register c and to generate
  a random number each time we are asked to increment the
  counter.  If the random number is less than 1/2^c, then
  increment c, else do not.  When asked for the count, return
  2^c - 1.  This is an unbiased estimate of the actual number
  of ticks requested.  The expected error of this particular
  method is too high, but very good results can be obtained by
  using a different base for the logarithm and settling for a
  lower maximum value of the counter.  For example: "Using
  base = 1/30 ... allows counting up to more than 125,000 in
  an 8-bit register with less than 24% relative error 95% of
  the time."
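
  The base-2 scheme described above (a Morris counter) can be
  sketched as:

```python
import random

class MorrisCounter:
    """Approximate counter kept in one small register c.
    tick(): increment c with probability 1/2**c.
    read(): return 2**c - 1, an unbiased estimate of the number
    of ticks seen so far.  The register fits in 8 bits because
    c only reaches about log2 of the true count; the price is
    high variance, which the base-(1/30) variant in the text
    trades against a lower maximum count."""
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.c = 0
    def tick(self):
        if self.rng.random() < 1.0 / (1 << self.c):
            self.c += 1
    def read(self):
        return (1 << self.c) - 1
```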

10.6 Monte Carlo Algorithms

* A Monte Carlo algorithm is said to be p-correct if it returns
  a correct answer with probability at least p.

* Verifying matrix multiplication
  + not considered a practical application, but a good example
    for introducing concepts.
  + The basic idea is that you want to know whether the matrix
    equation AB = C is true, where the matrices are n x n.
  + You can do a test involving O(n^2) work.  It involves
    flipping a coin n times to generate a random binary-valued
    vector.
  + If the test returns "false", you know for certain that
    AB != C, but the test can only return "false" or "I'm not
    sure."  However, if you perform the test enough times
    without getting anything but "I'm not sure", then the
    probability is high that AB = C.  You can do this all in
    O(n^2 log(1/ε)) time, where ε is the small error
    probability you want.  This is asymptotically better than
    any known method involving multiplying out the matrix
    product AB.
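
  The O(n^2) test described above is commonly known as
  Freivalds' test.  A sketch, assuming integer matrices given as
  lists of lists: each round multiplies by a random 0/1 vector
  instead of multiplying the matrices themselves.

```python
import random

def freivalds(A, B, C, k=20, seed=None):
    """Return False if AB != C is detected, True if every round
    said "I'm not sure" (so AB = C with error prob <= 2**-k).
    Each round costs O(n^2): three matrix-vector products."""
    rng = random.Random(seed)
    n = len(A)
    for _ in range(k):
        r = [rng.randrange(2) for _ in range(n)]   # n coin flips
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False            # certain: AB != C
    return True                     # probably AB = C
```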

* Primality Testing
  + It is essential to the RSA cryptographic system that it be
    easy to find large primes and difficult to factor large
    composite numbers.
  + This is how things are in the current state of our (public)
    knowledge.
  + If we can find large primes easily it means we can implement
    RSA encryption easily. (We need primes to make the codes.)
  + If we can factor large composites easily, we can crack the
    RSA encryption easily.
  + Use Fermat's little theorem with a randomly generated
    number to generate a possible demonstration that x is not
    prime.  Try this out enough times and if you always fail,
    then x is probably prime.
  + A (considerably) more sophisticated version of this
    algorithm (the Miller-Rabin test) can "decide the primality
    of n with error probability bounded by ε ... in
    O(log^3(n) * lg(1/ε)).  This is entirely reasonable in
    practice for thousand-digit numbers and error probability
    less than 10^(-100)."
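
  A minimal sketch of the Fermat test described above.  (Note
  that Carmichael numbers can fool it; the Miller-Rabin test
  mentioned above is the practical choice.)

```python
import random

def fermat_probably_prime(n, k=20, seed=None):
    """Fermat test: by Fermat's little theorem, if n is prime
    then a**(n-1) mod n == 1 for every a in [2, n-2].  Any a
    violating this is a demonstration that n is composite.
    If k random choices of a all fail to demonstrate
    compositeness, report "probably prime"."""
    if n < 4:
        return n in (2, 3)
    rng = random.Random(seed)
    for _ in range(k):
        a = rng.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False            # certain: n is composite
    return True                     # probably prime
```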

10.6.3  Can a number be probably prime?

* It is true of primality testing that the probabilistic
  algorithms are more reliable than the deterministic
  algorithms.  This is true because the deterministic
  algorithms run so long that a hardware error or a bug in the
  program is more likely to cause an error in the output than
  the probabilistic algorithm is to err.

10.6.4  Amplification of stochastic advantage

* If a test is unbiased -- if it returns true or false not with
  complete certainty, but gives each answer with some
  probability of error -- can we amplify the stochastic
  advantage to get more certainty?

* The answer is "yes" if we have a Monte Carlo algorithm that
  is p-correct with p > 1/2.  However, in these situations, the
  number of required repetitions of the test may be very large.
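
  Majority voting is the standard amplification trick for a
  p-correct test with p > 1/2.  A sketch, with a hypothetical
  75%-correct test standing in for a real Monte Carlo algorithm:

```python
import random

def majority_vote(test, runs=101, seed=0):
    """Run a p-correct (p > 1/2) true/false Monte Carlo test an
    odd number of times and return the majority answer.  The
    error probability shrinks exponentially with the number of
    runs, but the closer p is to 1/2, the more runs are needed."""
    rng = random.Random(seed)
    trues = sum(test(rng) for _ in range(runs))
    return 2 * trues > runs

# Hypothetical stand-in: a test whose true answer is True but
# which answers correctly only 75% of the time.
def noisy_test(rng):
    return rng.random() < 0.75
```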

10.7 Las Vegas Algorithms

* "Las Vegas algorithms make probabilistic choices to help
  guide them more quickly to a correct solution."

* "Unlike Monte Carlo algorithms, they never return a wrong
  answer"  -- only a correct answer, or "no answer."

* Some of these algorithms always return a correct answer.

* Some can make a "bad turn" and get into a state where no
  answer can be given.  Typically we can run the algorithm
  again if this happens, and have a high probability of success
  after a small number of runs.

10.7.1  The Eight Queens Problem

* Example of a Las Vegas algorithm that "sometimes fails"

* The basic idea is just to randomly place the queens in each
  successive column of the board, taking care only not to place
  a queen in a location where it is challenged by a queen that
  has already been placed. "The algorithm either ends
  successfully if it manages to place all the queens on the
  board or fails if there is no square in which the next queen
  can be added."
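
  The random placement scheme above, with restarts on failure,
  can be sketched as:

```python
import random

def random_queens(n=8, seed=None):
    """Place n queens column by column, choosing each row
    uniformly at random among the squares not attacked by the
    queens already placed.  Returns the list of rows on
    success, or None if some column has no safe square."""
    rng = random.Random(seed)
    rows = []
    for col in range(n):
        safe = [r for r in range(n)
                if all(r != q and abs(r - q) != col - c
                       for c, q in enumerate(rows))]
        if not safe:
            return None             # dead end: the run "fails"
        rows.append(rng.choice(safe))
    return rows

def queens_with_restarts(n=8, seed=0):
    """Rerun the Las Vegas algorithm until one run succeeds."""
    rng = random.Random(seed)
    while True:
        sol = random_queens(n, seed=rng.random())
        if sol is not None:
            return sol
```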

* The expected number of nodes explored by the method above is
  "less than half the number of nodes explored by a systematic
  backtracking."

* A hybrid method involving both random choice and backtracking
  can be developed.  Its expected time is smaller still.

10.7.2 Probabilistic Selection And Sorting

* Use a version of the selection algorithm where the pivot is
  randomly chosen from the sub-array that is to be partitioned
  next.

* This version of selection has linear expected run time.
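
  A sketch of selection with a random pivot (quickselect, here
  with Lomuto-style partitioning):

```python
import random

def select(items, k, seed=None):
    """Return the k-th smallest element of items (k = 0 is the
    minimum).  Each step picks the pivot uniformly at random
    from the sub-array being partitioned, which makes the
    expected running time linear regardless of the input
    order -- no adversarial input is bad for every seed."""
    rng = random.Random(seed)
    arr = list(items)
    lo, hi = 0, len(arr) - 1
    while True:
        if lo == hi:
            return arr[lo]
        p = rng.randint(lo, hi)             # random pivot
        arr[p], arr[hi] = arr[hi], arr[p]
        pivot, i = arr[hi], lo
        for j in range(lo, hi):             # partition around pivot
            if arr[j] < pivot:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
        arr[i], arr[hi] = arr[hi], arr[i]
        if k == i:
            return arr[i]
        elif k < i:
            hi = i - 1
        else:
            lo = i + 1
```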

10.7.3 Universal Hashing

* "The basic idea of Las Vegas hashing is for the compiler to
  choose the hash function randomly at the beginning of each
  compilation and again whenever re-hashing becomes necessary."
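
  The notes don't name a particular family of hash functions to
  draw from, but the classic universal family is Carter and
  Wegman's h(x) = ((a*x + b) mod p) mod m with a and b chosen
  at random.  A sketch for integer keys (the prime p here is an
  arbitrary choice, just large enough for the keys we expect):

```python
import random

def make_universal_hash(m, seed=None):
    """Draw h(x) = ((a*x + b) mod p) mod m at random from the
    Carter-Wegman universal family.  For any two distinct keys,
    the probability (over the choice of a, b) that they collide
    is at most about 1/m, so no fixed set of identifiers is bad
    for every compilation run."""
    p = 2_147_483_647               # Mersenne prime 2**31 - 1
    rng = random.Random(seed)
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m
```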

10.7.4 Factorizing Large Integers