Open Addressing
Problems with Linear Probing with Open Addressing
- Primary Clustering: occurs when the probe sequences from
different indices merge.
- Secondary Clustering: occurs when multiple (or all) keys that
hash to the same index follow the same probe sequence.
This causes the length of the maximum probe chain at an index
to be at least as large as the number of previous collisions
there. (Probe sequences that originate at the same
index merge.)
- Note: Some authors define primary and secondary clustering differently.
- Snowball Effect: A growing cluster presents a larger
"target," and tends to grow larger at an accelerating rate.
Measure of Hashing Performance
The expected numbers of probes for successful search and
unsuccessful search are appropriate measures of the efficiency of
a hashing scheme.
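These two measures can be estimated empirically. The following is a minimal sketch, assuming non-negative integer keys, H1(key) = key mod tableSize, linear probing with step 1, and a table that is never completely full. The function names are illustrative, and averaging unsuccessful searches over every possible home address is an assumption of this sketch.

```python
def build_linear_table(keys, table_size):
    """Insert distinct keys using linear probing with step 1."""
    table = [None] * table_size
    for key in keys:
        i = key % table_size
        while table[i] is not None:      # collision: try the next slot
            i = (i + 1) % table_size
        table[i] = key
    return table


def probes_successful(table, key):
    """Slots examined to find a key that is known to be in the table."""
    size = len(table)
    i = key % size
    count = 1
    while table[i] != key:
        i = (i + 1) % size
        count += 1
    return count


def probes_unsuccessful(table, home):
    """Slots examined, starting at `home`, up to and including the first empty slot."""
    size = len(table)
    i = home
    count = 1
    while table[i] is not None:
        i = (i + 1) % size
        count += 1
    return count


def average_probes(keys, table_size):
    """Return (average probes per successful search, per unsuccessful search)."""
    table = build_linear_table(keys, table_size)
    hit = sum(probes_successful(table, k) for k in keys) / len(keys)
    miss = sum(probes_unsuccessful(table, h) for h in range(table_size)) / table_size
    return hit, miss
```

For the keys 0, 7, 14, 1 in a 7-slot table, the first three keys share home address 0 and form one cluster, and average_probes returns 2.25 for successful search and 17/7 (about 2.43) for unsuccessful search.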
Changing the step size used for linear probing cannot ameliorate
secondary clustering, as long as
H1(key1) = H1(key2) ==> H2(key1) = H2(key2),
that is, as long as keys with the same home address also get the same probe step.
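A small sketch of this point, assuming H1(key) = key mod tableSize and one fixed step shared by every key (the function name is illustrative): two keys with the same home address trace identical probe sequences whatever step is chosen.

```python
def fixed_step_sequence(key, table_size, step, length):
    """First `length` probe addresses when every key uses the same step."""
    home = key % table_size
    return [(home + j * step) % table_size for j in range(length)]


# Keys 3 and 14 both have home address 3 in a table of size 11.
same_with_step_1 = fixed_step_sequence(3, 11, 1, 4) == fixed_step_sequence(14, 11, 1, 4)
same_with_step_3 = fixed_step_sequence(3, 11, 3, 4) == fixed_step_sequence(14, 11, 3, 4)
```

Both comparisons are true: a larger step changes which slots are visited, but not the fact that colliding keys visit them in the same order.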
Solutions
Quadratic Rehashing: k = (home address +/- j^2) mod tableSize
e.g. Hm(key) = +/- m^2
Note (m+1)^2 = m^2 + (2m + 1), so each offset can be computed from the
previous one by addition alone.
Primary clustering is virtually eliminated. Probe sequences starting
at different addresses may intersect, but they do not merge afterwards;
they 'rebound' instead.
Choosing a prime tableSize of the form 4k+3 gives zero repetition from the
quadratic rehash: the probe sequence visits every slot before repeating one.
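The coverage claim can be checked directly. This is a minimal sketch, assuming the probe order home, home + 1, home - 1, home + 4, home - 4, and so on for j = 1 up to (tableSize - 1)/2; the function name is illustrative.

```python
def quadratic_probe_sequence(home, table_size):
    """Addresses home, home + j^2, home - j^2 (mod table_size) for j = 1, 2, ..."""
    sequence = [home % table_size]
    for j in range(1, (table_size - 1) // 2 + 1):
        # Python's % always returns a non-negative result, so the
        # subtraction needs no special handling.
        sequence.append((home + j * j) % table_size)
        sequence.append((home - j * j) % table_size)
    return sequence


def covers_table(table_size):
    """True if the sequence from home address 0 reaches every slot."""
    return len(set(quadratic_probe_sequence(0, table_size))) == table_size
```

With tableSize = 7 (prime, 4k+3) the sequence from home address 0 is 0, 1, 6, 4, 3, 2, 5, which covers the table. With tableSize = 13 (prime, but of the form 4k+1) only 7 of the 13 slots are ever reached.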
Double Hash Techniques
Reduces Primary and Secondary Clustering to "acceptable" levels.
The rehash is [H1(key) + s*H2(key)] mod tableSize, where s = 1, 2, 3, ...
is the probe number.
H2(key) is chosen to be "random" for keys having the
same H1 value.
Example:
H1(key) = key mod tableSize ;
H2(key) = key mod (tableSize - 2) + 1
(adding 1 prevents the possibility that the probe step might be
zero!)
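The example functions can be sketched as follows, assuming non-negative integer keys and tableSize > 2. The function names and the choice of tableSize = 13 are illustrative.

```python
def h1(key, table_size):
    """Home address."""
    return key % table_size


def h2(key, table_size):
    """Probe step, always in the range 1 .. table_size - 2, so never zero."""
    return key % (table_size - 2) + 1


def double_hash_sequence(key, table_size, length):
    """First `length` addresses: [H1(key) + s*H2(key)] mod table_size, s = 0, 1, ..."""
    home = h1(key, table_size)
    step = h2(key, table_size)
    return [(home + s * step) % table_size for s in range(length)]
```

In a table of size 13, keys 5 and 18 share home address 5 but get steps 6 and 8, so their sequences start 5, 11, 4, 10 and 5, 0, 8, 3 respectively and separate after the first probe.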
What is required is a "randomized" probe step that is a function
of the key value, and is relatively prime to the tableSize.
Also, the tableSize itself should have no factors that are often
found in keys.
i.e. when H1(key) + m*Hp(key) is used, where Hp is the probe-step function
(H2 in the example above), Hp(key) must be relatively prime to the tableSize
in order to guarantee exhaustion of table by the probe sequence.
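The effect of a common factor can be seen by counting the distinct addresses a step reaches. This is a short sketch with illustrative function names; the sequence home + m*step (mod tableSize) reaches exactly tableSize / gcd(step, tableSize) slots.

```python
from math import gcd


def slots_visited(home, step, table_size):
    """Distinct addresses reached by home + m*step (mod table_size)."""
    return {(home + m * step) % table_size for m in range(table_size)}


def expected_slot_count(step, table_size):
    """Number of slots the sequence can reach, by the gcd rule."""
    return table_size // gcd(step, table_size)
```

With tableSize = 12, a step of 4 reaches only the three slots 0, 4, 8 from home address 0, while a step of 5 reaches all twelve. With a prime tableSize such as 13, every step from 1 to 12 reaches the whole table.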
You can arrange it so that both the tableSize and tableSize-2 are prime. A
prime tableSize assures that every probe sequence exhausts the table. When
tableSize-2 is prime it can't have any small factors in common with the keys.
This lowers the probability that two different keys will be assigned the same
probe sequence - the same value of Hp(key).
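One way to pick such a size is a simple search. This is a sketch using trial division, which is adequate for typical table sizes; both function names are illustrative.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_twin_prime_size(minimum):
    """Smallest n >= minimum (and >= 5) with n and n - 2 both prime."""
    n = max(minimum, 5)
    while not (is_prime(n) and is_prime(n - 2)):
        n += 1
    return n
```

For example, next_twin_prime_size(100) gives 103, since 103 and 101 are both prime.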