Homework 2: Hash functions
This assignment is to give you some practice with hash functions. You are
required to submit the programming part of the assignment in softcopy, and
the descriptive parts (Problem 1 and results of Problem 2) on paper.
Problem 1: Compression Maps (CLR 12.3-3)
Consider a version of the division method in which
h(k) = k mod m, where m = (2^p) - 1
and k is a character string interpreted in radix 2^p. Show that
if string x can be derived from string y by permuting its
characters, then x and y hash to the same value.
Problem 2: Expected Number of Collisions
If you have N buckets and a good (pseudorandom) hash function, the
probability of any two keys colliding is 1/N. So when you have k
keys in the table and insert key k + 1, the probability that the new
key does NOT collide with any old key is (1 - 1/N)^k. If you
insert n distinct items, the expected number that WON'T collide is:
sum (1 - 1/N)^k = N - N(1 - 1/N)^n
so the expected number of collisions is
n - N + N(1 - 1/N)^n.
We require you to implement a hash table (see support files). Your hashtable
should count the number of collisions for n elements. You should then
compare the number of collisions you get with the formula given above.
For example, if you have N=100 buckets
and n=100 keys, expect about 36.6 collisions.
In your written submission, report the number of collisions for the following
combinations, and compare them with the expected value. You should clearly
state the hash function you used. If you used different hash functions and
observed different behavior, report that too.
- N=100 buckets, n=100 keys
- N=100 buckets, n=50 keys
- N=100 buckets, n=200 keys
Compile outputs: outfile.compile
- words (input words : first 1k words from /usr/share/dict/words)
Run outputs: outfile.run