Hash Tables With Open Addressing
April 30, 2019
We have studied hash tables in several previous exercises, most of them of the “chaining” variety, where collisions are resolved by inserting each item in a linked list in the bucket to which it hashes. Today’s exercise looks at two different varieties of hash tables that use “open addressing,” in which all keys are stored in the buckets themselves, one key per bucket, with collisions resolved by moving them to a different bucket. We studied one type of open addressing in the exercise on cuckoo hashing, today we will see two others: linear probing and double hashing. Our hash tables consist of M memory addresses that hold N keys (or key/value pairs). Obviously, N must be less than M, and the load factor N M is critical to the performance of the algorithm; if the table becomes too loaded, search times increase dramatically. Open address hash tables work best when N can be predicted with some reliability, so an appropriate M can be chosen.
The linear probing algorithm hashes the key onto the range 0 .. M−1 and looks first in the indicated bucket. If the key is there, it is returned. If the bucket is empty, the key is not present in the table. If the bucket is non-empty, but the key doesn’t match, the search goes to the next bucket in order, wrapping around at the end of the table. Double hashing is similar, except that a second hash function determines the increment between probes. When double hashing, we must arrange that the increment is non-zero (otherwise the search will never visit any bucket except the first) and is co-prime to M (otherwise some buckets will never be visited), which is most easily arranged by making M prime.
Deletions require some care. We can’t just mark a bucket available when its key is deleted, because subsequent searches will miss keys that are beyond the newly-empty bucket. Instead, we mark the bucket with a special value showing that it is deleted, and consider that bucket to be non-empty but with a key that never matches.
Your task is to implement hash tables using linear probing and double hashing. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
In Python. The full code is on ideone.