Collision Resolution Strategies
- Open Address Methods ('search paths' in secondary table positions)
- External (Separate) Chaining (a linked list on each table address)
- Coalesced Chaining (a hybrid method)
Open Address Method: The general approach is to follow a
'probe sequence' (H1(key) + Hm(key)) mod tablesize
for m = 1, 2, 3, 4, ...
H1(key) is the home address, and Hm tells you which slot
to try next if the previous slots were all in use.
Linear Rehashing: This is the open address method that
uses Hm(key) = m. The probe sequence is
H1(key) + m. One advantage of this is simplicity.
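As a rough illustration of that probe sequence, the sketch below lists the slots visited for a given home address. The function name linear_probe_sequence is hypothetical, and the sketch assumes the home slot itself is tried first (m = 0) before the offsets m = 1, 2, ...

```python
def linear_probe_sequence(home, table_size):
    """Yield the slots probed by linear rehashing, wrapping mod table_size.

    Assumption: the home slot is probed first (m = 0), then home + 1,
    home + 2, ... until every slot has been visited exactly once.
    """
    for m in range(table_size):
        yield (home + m) % table_size


# Home address 5 in a table of 7 slots wraps around past the last slot.
print(list(linear_probe_sequence(5, 7)))  # [5, 6, 0, 1, 2, 3, 4]
```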
- Search under linear rehashing: assuming table slots are numbered
0 through tableSize-1, start at location loc = H1(key) and do this:
while (the key is not found and the current location is not empty
and we have not probed all the slots yet) set
loc = (loc+1) % tableSize, and probe at location loc.
- Insertion under linear rehashing: first search, then
if not found and table not full, place element in first
location in probe sequence marked "empty" or "deleted."
- Deletion under linear rehashing: first search, then
if found mark location "deleted" (do not mark "empty" -
subsequent searching must not terminate prematurely at a
location where an element has been deleted.)
Proliferation of "deleted" cells can clutter the table and slow
searches. One partial solution is to move up elements after
deletions, or do such compaction from time to time.
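The search, insertion, and deletion rules above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name LinearTable, the EMPTY and DELETED markers, and the choice H1(key) = key mod tableSize for integer keys are all assumptions made for the example.

```python
EMPTY = object()    # marker: slot has never held an element
DELETED = object()  # marker: slot held an element that was removed


class LinearTable:
    def __init__(self, size):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Assumption: division hashing, H1(key) = key mod tableSize.
        n = len(self.slots)
        home = key % n
        for m in range(n):
            yield (home + m) % n

    def search(self, key):
        """Return the slot index holding key, or None if absent."""
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return None          # an empty slot ends the search
            if self.slots[i] is not DELETED and self.slots[i] == key:
                return i             # "deleted" slots are skipped, not stops
        return None                  # probed every slot without success

    def insert(self, key):
        """First search; if absent, use the first empty or deleted slot."""
        if self.search(key) is not None:
            return False
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key
                return True
        return False                 # table is full

    def delete(self, key):
        """First search; if found, mark the slot "deleted", not "empty"."""
        i = self.search(key)
        if i is None:
            return False
        self.slots[i] = DELETED
        return True


t = LinearTable(7)
for k in (10, 17, 24):               # all three hash to slot 3
    t.insert(k)
t.delete(17)                         # slot 4 becomes "deleted"
print(t.search(24))                  # 5: the search passes over slot 4
```

Marking slot 4 "empty" instead would make the search for 24 stop at slot 4 and wrongly report the key as missing.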
External Chaining: Each table slot contains a linked list
which can grow dynamically.
Advantages of External Chaining
- Simple and efficient search, insertion, and deletion
- More dynamic allocation than with open addressing
- It is possible to have more records in the table than array
slots. (load factors α > 1)
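A small sketch of external chaining, under the assumption of integer keys and division hashing; the names Node, ChainedTable, and load_factor are hypothetical and chosen only for this example. It also shows a load factor greater than 1.

```python
class Node:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next


class ChainedTable:
    def __init__(self, size):
        self.heads = [None] * size   # one linked list per table address
        self.count = 0

    def _home(self, key):
        # Assumption: division hashing on integer keys.
        return key % len(self.heads)

    def search(self, key):
        node = self.heads[self._home(key)]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def insert(self, key):
        if self.search(key):
            return False
        h = self._home(key)
        self.heads[h] = Node(key, self.heads[h])  # push on front of the list
        self.count += 1
        return True

    def delete(self, key):
        h = self._home(key)
        prev, node = None, self.heads[h]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.heads[h] = node.next     # unlink the list head
                else:
                    prev.next = node.next         # unlink an interior node
                self.count -= 1
                return True
            prev, node = node, node.next
        return False

    def load_factor(self):
        return self.count / len(self.heads)


t = ChainedTable(5)
for k in (27, 29, 32, 34, 37, 47, 53):
    t.insert(k)
print(t.load_factor())               # 1.4: seven records in five slots
```

Deletion simply unlinks a node, so no "deleted" markers are needed.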
Coalesced Chaining: The table contains link fields and a
cellar for overflow.
0
1
2 address region
3
4
----------------------------
5
6 cellar
H maps into the address region only (using division in this
example: H(key) = key mod 5). The cellar is for keys that need
to be rehashed from the address region.
The rule is to place a colliding key into the empty place with
the largest address (epla).
The example below illustrates what happens when the key sequence
is 27, 29, 32, 34, 37, 47, and 53.
Note that the probe sequence for overflows from cell 3 'coalesces'
in the end with the probe sequence for overflows from cell 2.
table      table       table
address    contents    contents
           (data)      (link)
   0         53         nil
   1         47          0     <-- coalesced link
   2         27          6
   3         37          1
   4         29          5
--------------------------------------------
   5         34         nil
   6         32          3
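The table above can be reproduced with a short sketch of coalesced insertion and search. The class name CoalescedTable and its method names are hypothetical, and the sketch assumes integer keys, H(key) = key mod 5, and None standing for both an empty cell and a nil link.

```python
class CoalescedTable:
    def __init__(self, address_size, cellar_size):
        self.address_size = address_size
        total = address_size + cellar_size
        self.data = [None] * total   # None marks an empty cell
        self.link = [None] * total   # None plays the role of nil

    def _home(self, key):
        # H maps into the address region only (division hashing).
        return key % self.address_size

    def _epla(self):
        """Return the empty place with the largest address, or None."""
        for i in range(len(self.data) - 1, -1, -1):
            if self.data[i] is None:
                return i
        return None

    def insert(self, key):
        i = self._home(key)
        if self.data[i] is None:     # home cell is free: no collision
            self.data[i] = key
            return i
        while True:                  # walk to the end of the list
            if self.data[i] == key:
                return i             # key is already present
            if self.link[i] is None:
                break
            i = self.link[i]
        free = self._epla()
        if free is None:
            raise OverflowError("table is full")
        self.data[free] = key
        self.link[i] = free          # append the new cell to the list
        return free

    def search(self, key):
        i = self._home(key)
        while i is not None and self.data[i] is not None:
            if self.data[i] == key:
                return i
            i = self.link[i]
        return None


t = CoalescedTable(address_size=5, cellar_size=2)
for k in (27, 29, 32, 34, 37, 47, 53):
    t.insert(k)
print(t.data)  # [53, 47, 27, 37, 29, 34, 32]
print(t.link)  # [None, 0, 6, 1, 5, None, 3]
```

Searching for 53 starts at its home cell 3 and follows links 3, 1, 0, which is the tail of the list that began at cell 2.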
Empirical studies show that a cellar about 15% of the size of the
main table works well. The search effort is not much more than
with external chaining. Coalesced chaining is an overall better
performer. Deletion can be done without resorting to marking
records "deleted," but deletion is more complicated than in the
case of external chaining, because lists can coalesce. There is
a problem finding the predecessor of a search key. (Vitter wrote
extensively about coalesced chaining.)