Open Addressing

Problems with Linear Probing with Open Addressing

Measure of Hashing Performance

The expected numbers of probes for a successful search and for an unsuccessful search are appropriate measures of the efficiency of a hashing scheme.

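Changing the step size used for linear probing cannot ameliorate secondary clustering as long as keys with the same home address also get the same step, i.e. H1(key1) = H1(key2) ==> H2(key1) = H2(key2). Such keys follow identical probe sequences and keep colliding with one another.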
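The point can be checked with a small sketch. It assumes H1(key) = key mod tableSize and a constant step; the function name fixed_step_probes is made up for illustration and is not from these notes.

```python
def fixed_step_probes(key, table_size, step, count):
    """First `count` addresses probed when every key uses the same step.

    Assumption: H1(key) = key mod table_size, keys are non-negative integers.
    """
    home = key % table_size
    return [(home + m * step) % table_size for m in range(count)]


# 14 and 27 share the home address 1 in a table of size 13, so with any
# fixed step they probe exactly the same addresses in the same order.
print(fixed_step_probes(14, 13, 3, 4))  # [1, 4, 7, 10]
print(fixed_step_probes(27, 13, 3, 4))  # [1, 4, 7, 10]
```

Whatever constant step is chosen, the two sequences stay identical; only a step that depends on the key can separate them.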

Solutions

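Quadratic Rehashing: k = (home address +/- j^2) mod tableSize, j = 1, 2, 3, ...

e.g. Hm(key) = +/- m^2

Note (m+1)^2 = m^2 + (2m + 1), so each offset can be obtained from the previous one by adding the next odd number; no multiplication is needed.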
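A minimal sketch of that incremental computation follows. It covers only the "+" branch of the rehash, and the name quadratic_probes is an assumption made for this example.

```python
def quadratic_probes(home, table_size, count):
    """First `count` addresses (home + m^2) mod table_size for m = 0, 1, 2, ...

    Uses (m+1)^2 = m^2 + (2m + 1): each step adds the next odd number
    to the previous address instead of squaring m.
    """
    addresses = []
    address = home % table_size
    increment = 1  # 2m + 1 for m = 0
    for _ in range(count):
        addresses.append(address)
        address = (address + increment) % table_size
        increment += 2  # next odd number
    return addresses


print(quadratic_probes(3, 11, 5))  # [3, 4, 7, 1, 8]
```

The result agrees with computing (home + m*m) mod tableSize directly for each m.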

Primary clustering is virtually eliminated. Probe sequences starting at different addresses may intersect, but they do not merge after that; they 'rebound' and diverge again.

If tableSize is chosen to be a prime of the form 4k+3, the quadratic rehash (using both the + and - offsets) never repeats an address until the whole table has been probed.
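The claim can be checked by brute force for small tables. The sketch below assumes the probe order home, home+1, home-1, home+4, home-4, ... with j running up to (tableSize - 1)/2; the function names are hypothetical.

```python
def plus_minus_quadratic_sequence(home, table_size):
    """Addresses home, home+1, home-1, home+4, home-4, ... (mod table_size),
    for j = 1 .. (table_size - 1) // 2."""
    seq = [home % table_size]
    for j in range(1, (table_size - 1) // 2 + 1):
        seq.append((home + j * j) % table_size)
        seq.append((home - j * j) % table_size)
    return seq


def visits_every_slot_once(home, table_size):
    """True if the sequence hits each address exactly once."""
    seq = plus_minus_quadratic_sequence(home, table_size)
    return len(seq) == table_size and len(set(seq)) == table_size


print(visits_every_slot_once(0, 11))  # True:  11 = 4*2 + 3 and prime
print(visits_every_slot_once(0, 13))  # False: 13 is prime but of the form 4k + 1
```

For tableSize = 13 the offsets +j^2 and -j^2 land on the same addresses (for example 1 and -25 are both 1 mod 13), which is why the 4k+3 form matters.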

Double Hash Techniques

Double hashing reduces primary and secondary clustering to "acceptable" levels.

The address of the s-th rehash is [H1(key) + s*H2(key)] mod tableSize, s = 1, 2, 3, ...

H2(key) is chosen to be "random" for keys having the same H1 value.

Example:

H1(key) = key mod tableSize

H2(key) = (key mod (tableSize - 2)) + 1 (adding 1 ensures that the probe step is never zero, so H2 takes values from 1 to tableSize - 2)
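The example pair of hash functions can be sketched as follows. This assumes non-negative integer keys and a table stored as a Python list with None marking an empty slot; probe_sequence and insert are illustrative names, not part of these notes.

```python
def h1(key, table_size):
    """Home address."""
    return key % table_size


def h2(key, table_size):
    """Probe step in the range 1 .. table_size - 2 (never zero)."""
    return key % (table_size - 2) + 1


def probe_sequence(key, table_size):
    """Home address followed by the rehash addresses for s = 1, 2, ..."""
    home = h1(key, table_size)
    step = h2(key, table_size)
    return [(home + s * step) % table_size for s in range(table_size)]


def insert(table, key):
    """Place key in the first empty slot of its probe sequence; return the slot."""
    for address in probe_sequence(key, len(table)):
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("table is full")


# 14 and 27 share the home address 1 in a table of size 13,
# but their probe steps differ (4 and 6), so they part after the collision.
print(probe_sequence(14, 13)[:3])  # [1, 5, 9]
print(probe_sequence(27, 13)[:3])  # [1, 7, 0]
```

Because the step depends on the key, keys that collide at the home address usually diverge on the first rehash.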

What is required is a "randomized" probe step that is a function of the key value, and is relatively prime to the tableSize. Also, the tableSize itself should have no factors that are often found in keys.

i.e. when the probe sequence is [H1(key) + m*Hp(key)] mod tableSize, the probe step Hp(key) (written H2(key) in the example above) must be relatively prime to tableSize in order to guarantee that the sequence visits every address in the table.
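A short sketch of why the step must be relatively prime to tableSize: a step that shares a factor with tableSize reaches only tableSize / gcd(step, tableSize) addresses. The function name slots_reached is an assumption made for this illustration.

```python
from math import gcd


def slots_reached(home, step, table_size):
    """Set of addresses visited by home + m*step (mod table_size)."""
    return {(home + m * step) % table_size for m in range(table_size)}


# Step 4 shares the factor 4 with table size 12: only 3 addresses are reached.
print(len(slots_reached(0, 4, 12)))  # 3
# Step 5 is relatively prime to 12: all 12 addresses are reached.
print(len(slots_reached(0, 5, 12)))  # 12
```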

You can arrange it so that both tableSize and tableSize-2 are prime. A prime tableSize assures that every probe sequence exhausts the table, because every step from 1 to tableSize-1 is relatively prime to it. When tableSize-2 is prime, it can't have any small factors in common with the keys. This lowers the probability that two different keys will be assigned the same value of Hp(key), and with it the same probe sequence.
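One way to pick such a size is a simple search, sketched below under the assumption that trial division is fast enough for the table sizes in question. The helper names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test; adequate for modest table sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_twin_prime_table_size(minimum):
    """Smallest size >= minimum such that size and size - 2 are both prime."""
    size = max(minimum, 5)  # 5 is the smallest size with size - 2 also prime
    while not (is_prime(size) and is_prime(size - 2)):
        size += 1
    return size


print(next_twin_prime_table_size(100))  # 103 (101 is also prime)
```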