Open Addressing
Problems with Linear Probing in Open Addressing
- Primary Clustering: all keys that hash to the same
index follow the same probe sequence. As a result, the longest probe
chain starting at an index is at least as long as the number of
earlier collisions at that index. (Probe sequences that originate at
the same index merge.)
- Secondary Clustering: the merging of two or more
probe sequences that start from different indexes.
- Snowball Effect: a growing cluster presents a larger
"target" for new keys, so the bigger a cluster gets, the faster it grows.
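The clustering problems above can be seen in a minimal linear-probing sketch. It is written in Python for illustration only; the names `insert_linear` and `longest_cluster` are hypothetical, keys are assumed to be non-negative integers, H1(key) = key mod tableSize, and None marks an empty slot.

```python
def insert_linear(table, key):
    """Insert key using linear probing (step 1); return the number of probes used."""
    size = len(table)
    home = key % size                      # H1(key) = key mod tableSize
    for probes in range(1, size + 1):
        slot = (home + probes - 1) % size  # home, home+1, home+2, ... (wrapping)
        if table[slot] is None:
            table[slot] = key
            return probes
    raise RuntimeError("table is full")


def longest_cluster(table):
    """Length of the longest run of occupied slots, counting runs that wrap around."""
    size = len(table)
    if all(slot is not None for slot in table):
        return size
    best = run = 0
    # Walk twice around the table so a run crossing the end is measured whole;
    # the empty slot guaranteed above resets the count, so nothing is over-counted.
    for i in range(2 * size):
        if table[i % size] is not None:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


table = [None] * 11
probes = [insert_linear(table, k) for k in [3, 14, 4, 25, 5]]
print(probes)                  # [1, 2, 2, 4, 3]
print(longest_cluster(table))  # 5
```

Keys 3, 14 and 25 share home slot 3 and follow the same probe path; keys 4 and 5, whose home slots lie inside the growing run, are absorbed into it, so a single cluster of five slots forms.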
Measure of Hashing Performance
The expected numbers of probes for a successful search and for an
unsuccessful search are appropriate measures of the efficiency of
a hashing scheme.
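As a sketch of how these two measures can be computed for a concrete table, the Python below averages the probes over the keys actually stored (successful search) and over every possible home slot (unsuccessful search). It assumes linear probing, H1(key) = key mod tableSize, that every home slot is equally likely for an absent key, and that the probe which finds the empty slot is counted; all function names are hypothetical.

```python
def build_linear(keys, size):
    """Build a linear-probing table (assumes fewer keys than slots)."""
    table = [None] * size
    for key in keys:
        slot = key % size
        while table[slot] is not None:
            slot = (slot + 1) % size
        table[slot] = key
    return table


def probes_successful(table, key):
    """Probes a search makes to find a key that is present."""
    size = len(table)
    home = key % size
    for probes in range(1, size + 1):
        if table[(home + probes - 1) % size] == key:
            return probes
    raise KeyError(key)


def probes_unsuccessful(table, home):
    """Probes until an empty slot proves that a key with this home slot is absent."""
    size = len(table)
    for probes in range(1, size + 1):
        if table[(home + probes - 1) % size] is None:
            return probes
    return size  # full table: every slot had to be examined


def average_probes(table):
    """Return (successful, unsuccessful) averages; assumes at least one stored key."""
    keys = [k for k in table if k is not None]
    succ = sum(probes_successful(table, k) for k in keys) / len(keys)
    unsucc = sum(probes_unsuccessful(table, h) for h in range(len(table))) / len(table)
    return succ, unsucc


table = build_linear([3, 14, 4, 25, 5], 11)
print(average_probes(table))  # successful 12/5 = 2.4, unsuccessful 26/11
```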
Changing the step size used for linear probing cannot ameliorate
primary or secondary clustering as long as the step depends only on
the home address, that is, as long as H1(key1) = H1(key2) ==>
H2(key1) = H2(key2), where H2 gives the step size.
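A small Python sketch of why a different constant step does not help, assuming a fixed step c = 3 in a table of size 11 (the function name `fixed_step_sequence` is hypothetical): every key with the same home address still gets the same sequence, and sequences from different home addresses still merge.

```python
def fixed_step_sequence(home, step, size, length):
    """First `length` slots probed from `home` when the step is a constant."""
    return [(home + i * step) % size for i in range(length)]


from_0 = fixed_step_sequence(0, 3, 11, 6)  # [0, 3, 6, 9, 1, 4]
from_3 = fixed_step_sequence(3, 3, 11, 5)  # [3, 6, 9, 1, 4]
# From its second probe on, the path starting at slot 0 is exactly the path
# starting at slot 3, so the two probe sequences have merged.
print(from_0[1:] == from_3)  # True
```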
Solutions
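Quadratic Rehashing: k = (home address +/- j^2) mod tableSize,
for j = 1, 2, 3, ...
e.g. Hm(key) = +/- m^2 is the offset from the home address on the m-th rehash.
Note (m+1)^2 = m^2 + (2m + 1), so each new offset can be obtained from the
previous one by adding 2m + 1, without any multiplication.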
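Secondary clustering is eliminated. When two probe paths from different home
addresses cross, they "recoil" from one another: each path is at a different
value of m, so each takes a different next offset. Keys with the same home
address, however, still follow the same sequence, so primary clustering (as
defined above) remains.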
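Making tableSize a prime of the form 4i + 3 gives zero repetition from the
quadratic rehash: the slots home, home +/- 1^2, ..., home +/- ((tableSize - 1)/2)^2
visit every slot of the table exactly once.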
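A minimal Python sketch of the quadratic rehash, assuming the probe order home, home + 1, home - 1, home + 4, home - 4, ... (one common way to alternate the +/- signs); `quadratic_sequence` is a hypothetical name. The squares are built incrementally with (m+1)^2 = m^2 + (2m + 1), and the usage lines check the no-repetition claim for a few primes of the form 4i + 3.

```python
def quadratic_sequence(home, size):
    """Slots home, home+1, home-1, home+4, home-4, ... (mod size), up to (size-1)/2."""
    slots = [home % size]
    square = 0
    for m in range((size - 1) // 2):
        square += 2 * m + 1                  # square is now (m+1)^2
        slots.append((home + square) % size)
        slots.append((home - square) % size)  # Python's % is never negative here
    return slots


for p in (7, 11, 19, 23):                    # primes with p mod 4 == 3
    assert all(sorted(quadratic_sequence(h, p)) == list(range(p)) for h in range(p))

print(quadratic_sequence(0, 7))              # [0, 1, 6, 4, 3, 2, 5]
print(len(set(quadratic_sequence(0, 13))))   # 7: prime 13 has the form 4i + 1
```

For 13 the "+" and "-" offsets land on the same slots, so only 7 of the 13 slots are ever reached, which is why the 4i + 3 condition matters.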
Double Hash Techniques
Reduces Primary and Secondary Clustering to "acceptable" levels.
The rehash is [H1(key) + s*H2(key)] mod tableSize, where s = 0, 1, 2, ...
counts the probes.
H2(key) is chosen to behave "randomly" for keys having the same H1 value,
so that keys with the same home address follow different probe sequences.
Example:
H1(key) = key mod tableSize;
H2(key) = (key mod (tableSize - 2)) + 1
(adding 1 prevents the probe step from ever being zero, which would leave the
probe stuck at the home address!)
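The example hash functions can be put together into a small double-hashing table. This is a Python sketch under stated assumptions, not a complete implementation: `DoubleHashTable` and its methods are hypothetical names, keys are non-negative integers, and deletion is not supported.

```python
class DoubleHashTable:
    """Open addressing with H1 = key mod size and H2 = (key mod (size - 2)) + 1."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size

    def h1(self, key):
        return key % self.size

    def h2(self, key):
        return key % (self.size - 2) + 1     # in 1..size-2, never zero

    def _probe(self, key):
        """Yield [H1(key) + s*H2(key)] mod size for s = 0, 1, ..., size - 1."""
        home, step = self.h1(key), self.h2(key)
        for s in range(self.size):
            yield (home + s * step) % self.size

    def insert(self, key):
        """Store key (or find it if already present); return its slot."""
        for slot in self._probe(key):
            if self.slots[slot] is None or self.slots[slot] == key:
                self.slots[slot] = key
                return slot
        raise RuntimeError("no free slot on this probe sequence")

    def contains(self, key):
        """An empty slot on the probe sequence proves the key is absent."""
        for slot in self._probe(key):
            if self.slots[slot] is None:
                return False
            if self.slots[slot] == key:
                return True
        return False


t = DoubleHashTable(13)
print([t.insert(k) for k in (3, 16, 29)])  # [3, 9, 11]
print(t.contains(16), t.contains(42))      # True False
```

Keys 3, 16 and 29 all have home slot 3, but their steps are 4, 6 and 8, so they scatter to slots 3, 9 and 11 instead of piling up in slots 3, 4 and 5 as they would under linear probing.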
What is required is a "randomized" probe step that is a function of the key
value and is relatively prime to the tableSize.
Also, the tableSize itself should have no factors that are often found in keys;
otherwise keys sharing such a factor land on only a fraction of the home slots.
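To illustrate the point about factors, the Python sketch below counts how keys spread over home slots under H1(key) = key mod tableSize. The key set (multiples of 10) is a hypothetical example chosen to share a factor with one table size, and `home_slot_usage` is a hypothetical name.

```python
from collections import Counter


def home_slot_usage(keys, size):
    """How many keys land on each home slot under H1(key) = key mod size."""
    return Counter(key % size for key in keys)


keys = range(0, 1000, 10)               # 100 keys, all multiples of 10
print(home_slot_usage(keys, 20))        # only slots 0 and 10 are used, 50 keys each
print(len(home_slot_usage(keys, 23)))   # 23: a prime size uses every slot
```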
i.e. when the probe sequence [H1(key) + m*Hp(key)] mod tableSize is used,
the step Hp(key) must be relatively prime to the tableSize in order to
guarantee that the probe sequence exhausts the table (visits every slot).
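A quick Python check of this requirement: the probe sequence (home + m*step) mod tableSize reaches exactly tableSize / gcd(step, tableSize) distinct slots, so it exhausts the table only when the gcd is 1. `slots_visited` is a hypothetical helper name.

```python
from math import gcd


def slots_visited(home, step, size):
    """Distinct slots reached by (home + m*step) mod size for m = 0, 1, 2, ..."""
    seen = set()
    slot = home % size
    while slot not in seen:          # adding `step` is a bijection, so the first
        seen.add(slot)               # repeated slot closes the cycle
        slot = (slot + step) % size
    return len(seen)


print(slots_visited(0, 4, 12))       # 3: gcd(4, 12) = 4, so only 12 / 4 slots
print(slots_visited(5, 5, 12))       # 12: 5 and 12 are relatively prime
print(all(slots_visited(0, s, 13) == 13 for s in range(1, 13)))  # True: 13 is prime
```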
You can arrange it so that both tableSize and tableSize-2 are prime. A prime
tableSize ensures that every probe sequence exhausts the table, since every
step from 1 to tableSize-1 is then relatively prime to it. When tableSize-2 is
prime, it can't have any small factors in common with the keys. This lowers the
probability that two different keys with the same home address are also
assigned the same step Hp(key), and hence the same probe sequence.
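One way to pick such a size is to search upward for a pair of twin primes. The Python sketch below uses simple trial division, which is adequate for table-sized numbers; `is_prime` and `twin_prime_table_size` are hypothetical names.

```python
def is_prime(n):
    """Trial division; fine for the sizes a hash table uses."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def twin_prime_table_size(min_size):
    """Smallest tableSize >= min_size with tableSize and tableSize - 2 both prime."""
    size = max(min_size, 5)          # 5 and 3 form the smallest usable pair
    while not (is_prime(size) and is_prime(size - 2)):
        size += 1
    return size


print(twin_prime_table_size(100))    # 103 (101 and 103 are both prime)
print(twin_prime_table_size(14))     # 19 (17 and 19)
```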