CS 4440 Chapter 10 Notes
Chapter 10
Probabilistic Algorithms
10.1 Introduction
"When an algorithm is confronted by a choice, it is sometimes
preferable to choose a course of action at random, rather than
to spend time working out which alternative is best. Such a
situation arises when the time required to determine the optimal
choice is prohibitive, compared to the time saved on the
average by making this optimal choice."
* A probabilistic algorithm is non-deterministic. What it does
  next is not necessarily determined by the current values of
  the data, inputs, and the current instruction. The program
  may decide "by chance" which instruction to execute next.
* Non-determinism can be a good thing. If a program fails once
  in a certain situation, it will not necessarily fail again
  when it reaches the same situation. Perhaps all we have to do
  is run the program many times on the same input to gain
  sufficiently high confidence that we have a correct output.
10.2 Probabilistic Does Not Imply Uncertain
* Which is better: a totally deterministic and correct
  algorithm that runs so long that with probability p a
  hardware error will cause it to give the wrong output, or a
  probabilistic algorithm that gives the right answer only with
  probability q (which includes both the probability of a
  hardware error and the probability that the algorithm itself
  makes a mistake)? Answer: it depends on which is bigger, p
  or q.
* What if no known deterministic or probabilistic algorithm can
  give a precise answer to a problem? Suppose there is a
  probabilistic algorithm that can solve the problem with an
  error probability "as small as you like"? Example:
  determining whether a 1000-digit number is prime.
* Numerical Algorithms yield an answer and a confidence
interval: "With probability 90% the correct answer is 59
plus or minus 3."
* Monte Carlo algorithms give a specific answer that has a high
  probability of being correct, but which could be way off.
  (No confidence interval is provided.)
* Some probabilistic algorithms use the random choices only to
  prioritize the order in which tasks are performed, so they
  always get the right answer. The random choices are made
  because they tend to improve the time the algorithm takes to
  reach the answer.
* Las Vegas algorithms are probabilistic algorithms that never
  give a wrong answer. However, they can simply fail, in which
  case they give no answer at all. (This is not hard to
  implement if there is an easy way to check answers for
  correctness once they have been obtained.)
When did Columbus discover America?
Look on page 330 of the text to see how the three kinds of
algorithms respond.
10.3 Expected versus average time
* Intriguing mention here of a Las Vegas algorithm for finding
  the median of an array, with very good expected performance.
  + implications for quicksort
  + hashing
10.4 Pseudorandom generation
* Examples of functions used to generate pseudo-random
sequences of numbers.
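  To make this concrete, here is a minimal sketch (mine, not
  the text's) of a linear congruential generator, a classic
  family of such functions; the multiplier and increment are
  the well-known "Numerical Recipes" constants, used purely as
  an example:

    # Linear congruential generator: x_{n+1} = (a*x_n + c) mod m.
    # The constants are the classic "Numerical Recipes" choices,
    # used here only for illustration.
    class LCG:
        def __init__(self, seed=12345):
            self.m = 2 ** 32
            self.a = 1664525
            self.c = 1013904223
            self.state = seed % self.m

        def next_int(self):
            self.state = (self.a * self.state + self.c) % self.m
            return self.state

        def next_float(self):  # uniform in [0, 1)
            return self.next_int() / self.m

    gen = LCG(seed=42)
    print([round(gen.next_float(), 3) for _ in range(5)])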
10.5 Numerical probabilistic algorithms
* "The answer obtained ... is always approximate, but its
expected precision improves as the time available to the
algorithm increases."
10.5.1 Buffon's Needle (l'Aiguille de Buffon)
* "... if you throw a needle at random ... on a floor made of
planks of constant width, if the needle is exactly half as
long as the planks in the floor are wide, and if the width of
the cracks between the planks is zero, the probability that
the needle will fall across a crack is 1/π." In theory this
fact could be used to calculate the value of π to any
desired accuracy. This particular idea for finding a value
indirectly through a series of "experiments" is not of any
known practical value, but it is a simple example of a
technique with potential to be exploited usefully.
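  A small simulation sketch of the experiment (my own, assuming
  a plank width of 1). Note that sampling the needle's angle
  already uses the value of pi, so this is circular in
  principle, but it illustrates the technique:

    import math
    import random

    def estimate_pi(drops=1_000_000, plank_width=1.0):
        needle = plank_width / 2  # half as long as planks are wide
        hits = 0
        for _ in range(drops):
            # distance from the needle's center to the nearest crack
            d = random.uniform(0, plank_width / 2)
            # angle between the needle and the cracks; sampling it
            # uses math.pi, which is circular in principle but
            # fine for an illustration
            theta = random.uniform(0, math.pi / 2)
            if d <= (needle / 2) * math.sin(theta):
                hits += 1
        return drops / hits  # hits/drops estimates 1/pi

    print(estimate_pi())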
10.5.2 Numerical Integration
* Estimate the area under a curve f(x) on [a,b] by picking a
point t at random in the interval [a,b] and estimating the
area as f(t)(b-a). For better precision, do this many times
and average the results.
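  A minimal sketch of this estimator (the names are mine, not
  the text's):

    import random

    def mc_integral(f, a, b, n=100_000):
        """Estimate the integral of f over [a, b] by averaging
        f(t)*(b - a) over n random points t in [a, b]."""
        total = 0.0
        for _ in range(n):
            t = random.uniform(a, b)
            total += f(t)
        return (b - a) * total / n

    # Example: the integral of x^2 on [0, 1] is 1/3.
    print(mc_integral(lambda x: x * x, 0.0, 1.0))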
10.5.3 Probabilistic Counting
     * Using 8 bits to count about 2^255 - 1 things instead
       of 255 things! The idea is to keep a register c and
       generate a random number each time we are asked to
       increment the counter. If the random number is less than
       1/2^c, then increment c, else do not. When asked for the
       count, return 2^c - 1. This is a good estimate of the
       actual number of ticks requested. The expected error of
       this particular method is too high, but very good results
       can be obtained by using a different base for the
       logarithm and settling for a lower maximum value of the
       counter. For example: "Using base 1 + 1/30 ... allows
       counting up to more than 125,000 in an 8-bit register with
       less than 24% relative error 95% of the time."
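       A sketch of the base-2 version described above (often
       called a Morris counter; the class name is mine):

         import random

         class MorrisCounter:
             """Stores c and represents about 2^c - 1 events;
             with an 8-bit register (c <= 255) this can count
             up to about 2^255 - 1."""
             def __init__(self):
                 self.c = 0

             def tick(self):
                 # increment c with probability 1/2^c
                 if random.random() < 1.0 / (1 << self.c):
                     self.c += 1

             def estimate(self):
                 return (1 << self.c) - 1

         counter = MorrisCounter()
         for _ in range(100_000):
             counter.tick()
         print(counter.c, counter.estimate())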
10.6 Monte Carlo Algorithms
* A Monte Carlo algorithm is said to be p-correct if it returns
a correct answer with probability at least p.
* Verifying matrix multiplication
  + not considered a practical application, but a good example
    for introducing concepts.
  + The basic idea is that you want to know whether the matrix
    equation AB = C is true, where the matrices are n x n.
  + You can do a test involving O(n^2) work. It involves
    flipping a coin n times to generate a random binary-valued
    vector.
  + If the test returns "false", you know for certain that
    AB != C, but the test can only return "false" or "I'm not
    sure." However, if you perform the test enough times
    without getting anything but "I'm not sure", then the
    probability is high that AB = C. You can do this all in
    O(n^2 log(1/ε)) time, where ε is the small error
    probability you want. This is asymptotically better than
    any known method involving multiplying out the matrix
    product AB. (A sketch follows.)
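  A sketch of this test (commonly known as Freivalds'
  algorithm; the function name is mine). Each trial costs only
  O(n^2) because it multiplies matrices by a vector, never by
  each other:

    import random

    def freivalds_test(A, B, C, trials=20):
        """Return False if AB != C for certain; True means
        'probably equal' (each trial misses a discrepancy with
        probability at most 1/2, so error <= (1/2)^trials)."""
        n = len(A)
        for _ in range(trials):
            # random 0/1 column vector, from n coin flips
            x = [random.randint(0, 1) for _ in range(n)]
            Bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
            ABx = [sum(A[i][j] * Bx[j] for j in range(n)) for i in range(n)]
            Cx = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
            if ABx != Cx:
                return False  # certain: AB != C
        return True           # probably AB = C

    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[19, 22], [43, 50]]  # C == A*B
    print(freivalds_test(A, B, C))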
* Primality Testing
+ It is essential to the RSA cryptographic system that it be
easy to find large primes and difficult to factor large
composite numbers.
+ This is how things are in the current state of our (public)
knowledge.
+ If we can find large primes easily it means we can implement
RSA encryption easily. (We need primes to make the codes.)
+ If we can factor large composites easily, we can crack the
RSA encryption easily.
  + Use Fermat's little theorem with a randomly generated
    number to generate a possible demonstration that x is not
    prime. Try this out enough times, and if you always fail,
    then x is probably prime. (See the sketch after this list.)
  + A (considerably) more sophisticated version of this
    algorithm (the Miller-Rabin test) can "decide the primality
    of n with error probability bounded by ε ... in
    O(log^3(n) * log(1/ε)) time. This is entirely reasonable in
    practice for thousand-digit numbers and error probability
    less than 10^(-100)."
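  A sketch of the Fermat test mentioned above (the function
  name is mine; note that Carmichael numbers can fool the
  a^(n-1) check whenever a is coprime to n, one reason
  Miller-Rabin is preferred in practice):

    import random

    def fermat_probably_prime(n, trials=20):
        """Fermat's little theorem: if n is prime, then
        a^(n-1) mod n == 1 for every a in 2..n-2. A failure
        proves n composite; surviving all trials means
        'probably prime'."""
        if n < 4:
            return n in (2, 3)
        for _ in range(trials):
            a = random.randrange(2, n - 1)
            if pow(a, n - 1, n) != 1:
                return False  # certain: n is composite
        return True           # n is probably prime

    print(fermat_probably_prime(97))  # prime -> True
    print(fermat_probably_prime(91))  # 7 * 13 -> almost surely False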
10.6.3 Can a number be probably prime?
* It is true of primality testing that the probabilistic
  algorithms are more reliable than the deterministic
  algorithms. This is because the deterministic algorithms run
  so long that a hardware error or a bug in the program is more
  likely to corrupt the output than the probabilistic algorithm
  is to err.
10.6.4 Amplification of stochastic advantage
* If a test is unbiased -- if neither a "true" nor a "false"
  answer is completely certain, but each is correct only with
  some probability -- can we amplify the stochastic advantage
  to get more certainty?
* The answer is "yes" if we have a Monte Carlo algorithm that
  is p-correct with p > 1/2: repeat the algorithm and take the
  majority answer (sketched below). However, the closer p is to
  1/2, the larger the number of required repetitions, and it
  may be very large.
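  A generic majority-vote sketch (entirely mine; the
  60%-correct test is a made-up stand-in for any p-correct
  Monte Carlo algorithm):

    import random
    from collections import Counter

    def amplify(monte_carlo, x, repetitions):
        # run an odd number of times and return the majority
        # answer; error drops exponentially in repetitions
        answers = Counter(monte_carlo(x) for _ in range(repetitions))
        return answers.most_common(1)[0][0]

    def noisy_is_even(x):
        # hypothetical 0.6-correct test of "is x even?"
        truth = (x % 2 == 0)
        return truth if random.random() < 0.60 else not truth

    print(amplify(noisy_is_even, 42, repetitions=101))  # almost surely True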
10.7 Las Vegas Algorithms
* "Las Vegas algorithms make probabalistic choices to help
guide them more quickly to a correct solution."
* "Unlike Monte Carlo algorithms, they never return a wrong
answer" -- only a correct answer, or "no answer."
* Some of these algorithms always return a correct answer.
* Some can make a "bad turn" and get into a state where no
answer can be given. Typically we can run the algorithm
again if this happens, and have a high probability of success
after a small number of runs.
10.7.1 The Eight Queens Problem
* Example of a Las Vegas algorithm that "sometimes fails"
* The basic idea is just to randomly place the queens in each
successive column of the board, taking care only not to place
a queen in a location where it is challenged by a queen that
has already been placed. "The algorithm either ends
successfully if it manages to place all the queens on the
board or fails if there is no square in which the next queen
can be added."
* The expected number of nodes explored by the method above is
"less than half the number of nodes explored by a systematic
backtracking."
* A hybrid method involving both random choice and backtracking
  can be developed; its expected time is lower still. (A sketch
  of the purely random placement follows.)
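  A sketch of the greedy random placement (names mine); the
  caller simply reruns it after a failure:

    import random

    def queens_las_vegas(n=8):
        rows = []  # rows[c] = row of the queen in column c
        for col in range(n):
            # rows where the next queen is not challenged
            safe = [r for r in range(n)
                    if all(r != rr and abs(r - rr) != col - cc
                           for cc, rr in enumerate(rows))]
            if not safe:
                return None  # fail: give no answer
            rows.append(random.choice(safe))
        return rows

    runs, solution = 0, None
    while solution is None:  # just rerun on failure
        solution = queens_las_vegas()
        runs += 1
    print(runs, solution)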
10.7.2 Probabilistic Selection And Sorting
* Use a version of the selection algorithm where the pivot is
  randomly chosen from the sub-array that is to be partitioned
  next.
* This version of selection has linear expected run time. (A
  sketch follows.)
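  A sketch of selection with a random pivot (names mine); the
  random choice defeats any fixed worst-case input:

    import random

    def random_select(arr, k):
        """Return the k-th smallest element (k = 1 is the
        minimum), choosing the pivot at random from the current
        sub-array; linear expected time."""
        pivot = random.choice(arr)
        smaller = [x for x in arr if x < pivot]
        equal = [x for x in arr if x == pivot]
        larger = [x for x in arr if x > pivot]
        if k <= len(smaller):
            return random_select(smaller, k)
        if k <= len(smaller) + len(equal):
            return pivot
        return random_select(larger, k - len(smaller) - len(equal))

    data = [7, 1, 9, 4, 4, 8, 2]
    print(random_select(data, 4))  # median: 4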
10.7.3 Universal Hashing
* "The basic idea of Las Vegas hashing is for the compiler to
choose the hash function randomly at the beginning of each
compilation and again whenever re-hashing becomes necessary."
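  A sketch of one standard universal family, the Carter-Wegman
  h(x) = ((a*x + b) mod p) mod m (the text does not specify a
  family; this choice and the names are mine):

    import random

    def make_hash(m, p=2_147_483_647):  # p = 2^31 - 1, a prime > any key
        a = random.randrange(1, p)  # random a in 1..p-1
        b = random.randrange(0, p)  # random b in 0..p-1
        return lambda x: ((a * x + b) % p) % m

    h = make_hash(m=101)  # freshly chosen function, table size 101
    print(h(12345), h(67890))
    # Re-hashing = simply calling make_hash(m) again to draw a
    # new random function from the family.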
10.7.4 Factorizing Large Integers