Hash Tables With Open Addressing
August 24, 2012
One of the classic data structures of computer science is the hash table, which provides constant-time access to key/value data in the average case. The idea is to store a key/value pair in a location that is computed based on the value of the key, which works fine when all the hash values are unique; the problem arises when two keys hash to the same value. The hash tables of the Standard Prelude use a method called chaining to resolve collisions; today’s exercise uses a different method called open addressing.
All hash tables work by computing an address at which to store an item based on its key. Sometimes two keys hash to the same value, which causes a collision. Chaining, as used in the Standard Prelude, resolves collisions by storing multiple items at the same location, forming a list of items. Open addressing instead computes a second address, or if necessary a third, or fourth, or …, continuing until it finds an empty spot. It is necessary that the computation of secondary addresses eventually visits every possible storage location; a simple approach, which we will adopt, is called linear probing, in which storage locations are accessed in increasing order until an empty location is found. It is possible, of course, for the hash table to become completely filled, in which case an error is reported when a new item cannot be inserted.
The tricky part of hashing with open addressing is deletions, because it’s not possible to simply delete an item because some other item may rely on the the fact that its storage location was filled when the item was inserted. The solution is to have three types of items in a storage location: nil, which indicates that the storage location has never been used; deleted, which indicates that the storage location is currently empty but has been used in the past, and in use, for those storage locations that are currently occupied.
Your task is to write functions that maintain a hash table with open addressing. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.