Open Addressing

Problems with Linear Probing in Open Addressing

Primary Clustering: occurs when probe sequences that start at different indices merge and then follow the same path.
Secondary Clustering: occurs when multiple (or all) keys that hash to the same index follow the same probe sequence. This causes the length of the maximum probe chain at an index to be at least as large as the number of previous collisions there. (Probe sequences that originate at the same index merge.)
Note: Some authors define primary and secondary clustering differently.
Snowball Effect: A growing cluster presents a larger "target," and tends to grow larger at an accelerating rate.
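As a rough illustration of how a cluster forms and grows, here is a minimal sketch of linear-probing insertion. The function name `linear_probe_insert`, the table size 11, and the hash H(key) = key mod tableSize are assumptions chosen for the example, not part of these notes.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing; return the number of probes used."""
    size = len(table)
    home = key % size  # assumed hash: key mod tableSize
    for step in range(size):
        slot = (home + step) % size
        if table[slot] is None:
            table[slot] = key
            return step + 1
    raise RuntimeError("table is full")


table = [None] * 11
# Keys 3 and 14 share home 3; keys 4 and 15 share home 4; key 5 has home 5.
probes = [linear_probe_insert(table, key) for key in (3, 14, 4, 15, 5)]
print(probes)      # probes needed by each successive insertion
print(table[3:8])  # the contiguous cluster occupying slots 3..7
```

Keys with home addresses 3, 4 and 5 all end up walking the same run of occupied slots, so sequences from different indices have merged, and each new arrival extends the run.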

Measure of Hashing Performance

The expected number of probes for a successful search and for an unsuccessful search are appropriate measures of the efficiency of a hashing scheme.
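These two measures can be computed directly for a small table. The sketch below assumes a linear-probing table of size 11 with H(key) = key mod tableSize that already holds the keys 3, 14, 4, 15, 5 (inserted in that order); the helper names are hypothetical. Successful search is averaged over the stored keys, and unsuccessful search over all home addresses, assuming each is equally likely.

```python
def probes_to_find(table, key):
    """Probes examined until key is found (linear probing, key assumed present)."""
    size = len(table)
    home = key % size
    for step in range(size):
        if table[(home + step) % size] == key:
            return step + 1
    raise KeyError(key)


def probes_until_empty(table, home):
    """Probes examined by an unsuccessful search that starts at address home."""
    size = len(table)
    for step in range(size):
        if table[(home + step) % size] is None:
            return step + 1
    return size  # table completely full


# State of the table after inserting 3, 14, 4, 15, 5 with linear probing.
table = [None, None, None, 3, 14, 4, 15, 5, None, None, None]
stored = [key for key in table if key is not None]

successful = [probes_to_find(table, key) for key in stored]
unsuccessful = [probes_until_empty(table, home) for home in range(len(table))]

avg_successful = sum(successful) / len(successful)
avg_unsuccessful = sum(unsuccessful) / len(unsuccessful)
print(avg_successful, avg_unsuccessful)
```

An unsuccessful search that lands at the start of the cluster must walk the whole cluster before it reaches an empty slot, which is why that average is the more sensitive of the two.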

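Changing the step size used for linear probing cannot ameliorate secondary clustering, because a fixed step does not depend on the key. As long as H₁(key1) = H₁(key2) ==> H₂(key1) = H₂(key2), two keys with the same home address follow exactly the same probe sequence.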
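A small check of this point, written as a sketch: with a constant step, the probe sequence depends only on the home address. The function `probe_sequence`, the table size 11, and the keys 3 and 14 (which share home address 3) are assumptions for illustration.

```python
def probe_sequence(key, size, step):
    """Addresses visited by linear probing with a fixed step size."""
    home = key % size  # assumed H1: key mod tableSize
    return [(home + j * step) % size for j in range(size)]


# Keys 3 and 14 collide at home address 3 in a table of size 11.
same_for_every_step = all(
    probe_sequence(3, 11, step) == probe_sequence(14, 11, step)
    for step in range(1, 11)
)
print(same_for_every_step)
print(probe_sequence(3, 11, 3)[:4])
```

Whatever step is chosen, the two keys never separate, so secondary clustering is untouched.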

Solutions

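Quadratic Rehashing

k = (home address +/- j²) mod tableSize, where k is the address tried on the j-th rehash (j = 1, 2, 3, ...)

e.g. the offset applied on the m-th rehash is H_m(key) = +/- m²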

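Note: (m+1)² = m² + (2m + 1), so each successive square can be obtained from the previous one by adding the next odd number, with no multiplication.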
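The identity can be used to generate the quadratic offsets with additions only. This is a minimal sketch; the generator name `quadratic_offsets` is hypothetical.

```python
def quadratic_offsets(count):
    """Yield 0, 1, 4, 9, ... using (m+1)^2 = m^2 + (2m + 1), additions only."""
    square = 0  # m^2 for m = 0
    odd = 1     # 2m + 1 for m = 0
    for _ in range(count):
        yield square
        square += odd  # advance m^2 to (m+1)^2
        odd += 2       # advance 2m + 1 to 2(m+1) + 1


offsets = list(quadratic_offsets(6))
print(offsets)
```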

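Primary clustering is virtually eliminated. Probe sequences starting at different addresses may intersect, but they do not merge afterwards; they diverge again ('rebound'). Secondary clustering remains, because keys with the same home address still share one probe sequence.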

If tableSize is a prime of the form 4k+3, the quadratic rehash sequence produces no repeated address until every address in the table has been tried (that is, until the table is exhausted).
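The effect of the 4k+3 condition can be checked by brute force. The sketch below assumes the probe order home, home+1, home-1, home+4, home-4, ... and compares a prime of the form 4k+3 (11) with a prime of the form 4k+1 (13); the function name is hypothetical.

```python
def quadratic_probe_slots(home, size):
    """Slots visited by home, home+1, home-1, home+4, home-4, ... (mod size)."""
    slots = [home % size]
    for j in range(1, (size - 1) // 2 + 1):
        slots.append((home + j * j) % size)
        slots.append((home - j * j) % size)
    return slots


slots_11 = quadratic_probe_slots(5, 11)  # 11 = 4*2 + 3, prime
slots_13 = quadratic_probe_slots(5, 13)  # 13 = 4*3 + 1, prime
print(len(slots_11), len(set(slots_11)))  # probes made, distinct slots reached
print(len(slots_13), len(set(slots_13)))
```

With size 11 every one of the 11 probes lands on a fresh slot. With size 13 the +j² and -j² offsets coincide, so the same 13 probes reach only 7 distinct slots.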

Double Hash Techniques

Double hashing reduces primary and secondary clustering to "acceptable" levels.

The s-th rehash is [H₁(key) + s*H₂(key)] mod tableSize, for s = 1, 2, 3, ...

H₂(key) is chosen to be "random" for keys having the same H₁ value.

Example:

H₁(key) = key mod tableSize

H₂(key) = key mod (tableSize - 2) + 1 (adding 1 prevents the possibility that the probe step might be zero!)
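The example hash pair can be tried out as follows. This is a sketch that assumes non-negative integer keys and a table of size 13 (so that tableSize - 2 = 11); the function names are hypothetical.

```python
def h1(key, size):
    """Home address: key mod tableSize."""
    return key % size


def h2(key, size):
    """Probe step in the range 1 .. tableSize - 2, never zero."""
    return key % (size - 2) + 1


def double_hash_sequence(key, size):
    """Addresses [H1(key) + s*H2(key)] mod tableSize for s = 0, 1, 2, ..."""
    return [(h1(key, size) + s * h2(key, size)) % size for s in range(size)]


# Keys 5 and 18 share home address 5 but receive different probe steps.
seq_5 = double_hash_sequence(5, 13)
seq_18 = double_hash_sequence(18, 13)
print(h2(5, 13), seq_5[:4])
print(h2(18, 13), seq_18[:4])
```

The two keys collide at their home address and then separate immediately, which is exactly what a fixed step cannot achieve.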

What is required is a "randomized" probe step that is a function of the key value, and is relatively prime to the tableSize. Also, the tableSize itself should have no factors that are often found in keys.

i.e. when the probe sequence [H₁(key) + m*H_p(key)] mod tableSize is used, where H_p is the probe-step function (H₂ above), H_p(key) must be relatively prime to tableSize in order to guarantee that the probe sequence exhausts the table.
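The reason is that a step sharing a factor with the table size cycles back early: the sequence reaches only tableSize / gcd(step, tableSize) distinct addresses. A minimal sketch, assuming a deliberately non-prime table size of 12 and a hypothetical helper `slots_reached`:

```python
from math import gcd


def slots_reached(home, step, size):
    """Number of distinct addresses visited by (home + m*step) mod size."""
    return len({(home + m * step) % size for m in range(size)})


reached_step_4 = slots_reached(0, 4, 12)  # gcd(4, 12) = 4
reached_step_5 = slots_reached(0, 5, 12)  # gcd(5, 12) = 1
print(reached_step_4, 12 // gcd(4, 12))
print(reached_step_5, 12 // gcd(5, 12))
```

With step 4 an insertion could fail while nine of the twelve slots are still empty; with step 5 every slot is eventually tried.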

You can arrange it so that both tableSize and tableSize-2 are prime. A prime tableSize ensures that every probe sequence exhausts the table, since every step from 1 to tableSize-1 is relatively prime to it. When tableSize-2 is prime, it can't have any small factors in common with the keys. This lowers the probability that two different keys with the same home address are also assigned the same probe step H_p(key), and hence the same probe sequence.
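One way to pick such a size is to search upward for the first value where both tableSize and tableSize - 2 are prime. The sketch below uses simple trial division, which is adequate for typical table sizes; both function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def next_twin_prime_size(minimum):
    """Smallest size >= minimum such that size and size - 2 are both prime."""
    size = max(minimum, 5)  # 5 is the smallest size with size - 2 also prime
    while not (is_prime(size) and is_prime(size - 2)):
        size += 1
    return size


print(next_twin_prime_size(10))
print(next_twin_prime_size(100))
```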