Reading assignment for chapter 12 has changed
a little -- read about hashing first.
Finish discussion of chapter 11 material --
priority queues.
Array-based implementation of heap/priority
queue
Binary search tree implementation of
heap/priority queue
Balanced binary search tree implementation of
heap/priority queue
advantages/disadvantages
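As a sketch of the array-based approach above, here is a minimal min-heap priority queue in Python (class and method names are illustrative, not from the text):

```python
class MinHeapPQ:
    """Array-based min-heap: the parent of index i lives at index (i-1)//2."""

    def __init__(self):
        self.data = []

    def insert(self, item):
        # Append at the end, then sift up to restore the heap property.
        self.data.append(item)
        i = len(self.data) - 1
        while i > 0 and self.data[(i - 1) // 2] > self.data[i]:
            parent = (i - 1) // 2
            self.data[i], self.data[parent] = self.data[parent], self.data[i]
            i = parent

    def remove_min(self):
        # Move the last leaf to the root, then sift down.
        top = self.data[0]
        last = self.data.pop()
        if self.data:
            self.data[0] = last
            i, n = 0, len(self.data)
            while True:
                left, right = 2 * i + 1, 2 * i + 2
                smallest = i
                if left < n and self.data[left] < self.data[smallest]:
                    smallest = left
                if right < n and self.data[right] < self.data[smallest]:
                    smallest = right
                if smallest == i:
                    break
                self.data[i], self.data[smallest] = self.data[smallest], self.data[i]
                i = smallest
        return top
```

Both insert and remove_min are O(log N) because the sift paths follow the height of the implicit complete tree.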
Discussion of hashing
What is the problem we want to solve with
hashing?
What is the ADT hash table?
Hashing approximates the idea of having an
array indexed by the set of all possible keys.
When the idea above is not practical, we use
the hash function as a mapping key ---> index.
A good hash function is easy and fast to
calculate and produces an even distribution
of keys across the set of indices.
A perfect hash function is one-to-one:
different keys never hash to the same index in the
table. If all keys are known in advance it
is practical to create a perfect hash
function, otherwise not.
To increase the likelihood of even
distribution of keys:
calculation of hash function should
involve the entire key.
If the hash function uses modular
arithmetic, the base (the table size)
should have no factors less than 25 or
so. To be on the safe side, choose a
prime for the base.
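A small example of both guidelines -- involve the entire key, and reduce modulo a prime table size. This is one common scheme (a polynomial rolling hash); the function name and constants are mine, not from the text:

```python
def string_hash(key: str, table_size: int = 101) -> int:
    """Hash a string to an index in [0, table_size); 101 is prime."""
    h = 0
    for ch in key:                      # every character of the key participates
        h = (h * 31 + ord(ch)) % table_size
    return h
```

Because the whole key contributes, keys that differ only in character order usually hash differently.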
Collisions are likely
Wednesday, November 29
Take Roll.
Check out this week's schedule
Announcement(s)
Level three version of program #5 is due
Friday, Dec 01.
Load factor
alpha = (#items in table)/(#addresses in table)
Load factor alpha measures how full the table
is
Average work required to do a search in a
hash table increases with alpha.
Linear Probing for Collision Resolution
Average #comparisons for successful search:
(1/2){1+ [1/(1-alpha)]}
Average #comparisons for unsuccessful
search: (1/2){1+ [1/(1-alpha)^2]}
Both terms approach 1 as alpha-->0.
Both terms approach infinity as alpha-->1.
(These approximations lose accuracy as
alpha gets close to 1.)
When alpha=2/3, [1/(1-alpha)]=3 and
[1/(1-alpha)^2]=9, so a successful search averages
(1+3)/2=2 comparisons and an unsuccessful search
averages (1+9)/2=5 comparisons.
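As a sanity check, the two linear-probing formulas can be evaluated directly (function names are mine):

```python
def linear_probe_success(alpha: float) -> float:
    """Average comparisons for a successful search under linear probing."""
    return 0.5 * (1 + 1 / (1 - alpha))

def linear_probe_fail(alpha: float) -> float:
    """Average comparisons for an unsuccessful search under linear probing."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)
```

At alpha = 2/3 these give 2 and 5 comparisons respectively.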
Quadratic probing and double hashing for Collision Resolution
Average #comparisons for successful
search: -ln(1-alpha)/alpha
Average #comparisons for unsuccessful
search: 1/(1-alpha).
Both terms approach 1 as alpha-->0.
Both terms approach infinity as alpha-->1.
(These approximations lose accuracy as
alpha gets close to 1.)
These functions are smaller than the
corresponding work functions for linear
probing.
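The same check for the quadratic-probing/double-hashing formulas (again, function names are mine):

```python
import math

def double_hash_success(alpha: float) -> float:
    """Average comparisons for a successful search: -ln(1-alpha)/alpha."""
    return -math.log(1 - alpha) / alpha

def double_hash_fail(alpha: float) -> float:
    """Average comparisons for an unsuccessful search: 1/(1-alpha)."""
    return 1 / (1 - alpha)
```

At alpha = 2/3 the unsuccessful-search cost is 3 comparisons and the successful-search cost is about 1.65, both smaller than the corresponding linear-probing figures (5 and 2).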
The methods described above are open-addressing
methods. They are problematic when the number of
keys cannot be bounded in advance, because the
table can fill up.
Separate Chaining for Collision Resolution
alpha can be greater than 1.
alpha = (#items)/(#addresses) = average length
of a chain.
Average #comparisons for successful search:
1+(alpha/2) -- 1 probe plus the average
number of additional probes required in the
average non-empty list.
Average #comparisons for unsuccessful
search: alpha -- we have to probe until we
get to the end of the average list.
Advice: If you can afford a large table (small
alpha) and if you are willing to trade occasional
slowness of search for very fast average search,
then hashing may be the right choice.
Advice: If you are performing "life-and-death search
for your city's poison control center" a (balanced)
search-tree implementation will at least assure you
an O(log N) bound on the number of probes required.
Typically keys do not map into a hash table in any
semblance of key order and it is not possible to
efficiently implement a hash table operation that
traverses the table elements in key-order.
Practically speaking, you would have to sort the
items in a hash table to traverse the table in
sorted order. If that traversal operation is
necessary, think seriously about using a search tree
instead of a hash table.
When memory accesses are exclusively to primary
memory, the time savings gained by hashing is often
not significant. However, the savings may very well
be significant when accessing a very slow external
data device such as a disk.
Friday, December 01
Take Roll.
Check out this week's schedule
Announcement(s)
Discuss hashing assignment
If you are doing separate (external) chaining, the probe counter can simply be the length of the chain.
Calculating (alpha^k) / (k!) can be
"unstable" because (k!) gets very large very
quickly as k increases, and because (alpha^k)
can get either very large or very small very
quickly as k increases. (Which of the two
happens depends on whether the load factor
alpha is greater than or less than one.)
If alpha > 1, consider calculating it this way:
(alpha/1)*(alpha/2)* ... *(alpha/k). That way
the numerator and denominator tend to "cancel"
and there is less chance of overflow or underflow.
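The product form can be sketched in a few lines of Python (the function name is mine):

```python
def poisson_term(alpha: float, k: int) -> float:
    """Compute alpha^k / k! as the product (alpha/1)*(alpha/2)*...*(alpha/k).

    Each factor alpha/i stays moderate in size, so intermediate results
    never blow up the way a separate alpha**k or factorial(k) can.
    """
    term = 1.0
    for i in range(1, k + 1):
        term *= alpha / i
    return term
```

For small k this agrees with the direct calculation, e.g. poisson_term(0.8, 2) is 0.8 * 0.4 = 0.32; for large k the direct factorial blows up while the product form degrades gracefully.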
It is easy to make an error doing the math to
generate the information in the table. Test
with 800 keys and 1000 min table size to see if
you get the same output as in the assignment
sheet. The sample was actually generated by a
solution program.
Gotcha: remember that you need to do something
"fancier" than integer division to calculate
alpha.
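The gotcha in Python terms (in C or C++ the equivalent fix is casting one operand to double before dividing):

```python
items, addresses = 800, 1000

alpha_wrong = items // addresses   # integer division truncates to 0 -- wrong!
alpha = items / addresses          # floating-point division gives 0.8
```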
Discuss "Birthday Surprise/Paradox", collisions, and
value of statistics that the hashing program
calculates.