February 1, 2011
Hashing is a way of maintaining a dictionary data structure that provides insert, delete and lookup operations. The most common hashing method, called chained hashing, uses a single hash function to provide an offset into an array; each bucket contains a linked list, which is searched to find the desired item. This method is fast if the hash function does a good job of distributing the keys, with average access in constant time, but has an annoying linear-time worst case if all the keys hash to the same value.
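For comparison with what follows, here is a minimal sketch of chained hashing in Python. The class name `ChainedTable` and its method names are made up for illustration, Python's built-in `hash` stands in for the single hash function, and a Python list stands in for each bucket's linked list.

```python
class ChainedTable:
    """Chained hashing sketch: an array of buckets, each a list of (key, value) pairs."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        # The single hash function provides the offset into the array.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: replace its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def lookup(self, key, default=None):
        # Linear search of one bucket: fast on average, slow if the bucket is long.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key):
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]
```

With eight buckets, the integer keys 0, 8, 16 and 24 all land in bucket 0 in CPython, which is the linear-time worst case in miniature.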
An alternative hashing method, called cuckoo hashing, was invented by Rasmus Pagh and Flemming Friche Rodler. The advantage of cuckoo hashing is that it guarantees constant-time lookups even in the worst case, while insertions take amortized constant time in expectation, which is within a constant factor of optimal. The code that implements cuckoo hashing is simple, and cuckoo hashing performs very well in practice.
In cuckoo hashing, instead of storing a list of key/value pairs in a single bucket, each “nest” position in an array either has a single key/value pair, or is empty. And instead of a single hash function, there are two; any key, if present, must be found at one of its two nests, guaranteeing constant-time lookup. Insertion works by looking at the two possible nests; if either is empty, the new key/value pair is placed there, but if both are occupied, the item being inserted is placed in one of them and the displaced item is then inserted recursively in its other nest. If insertion ever encounters a cycle, the whole data structure is rehashed into a new one using two new hash functions; it is possible that rehashing encounters a new cycle, and so on, ad infinitum, but in practice that almost never happens, and if it does, it can be fixed by increasing the size of the array. The diagram at right shows a chain of key/value pairs; note the cycle between H and W. Cuckoo hashing derives its name from the behavior of the cuckoo bird that makes its nest by finding another bird’s nest and driving away its original occupant.
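The scheme just described can be sketched in Python as follows. This is one possible reading, not the suggested solution: the names `CuckooTable`, `MAX_KICKS`, `_place` and `_rehash` are invented here, salting Python's built-in `hash` with random integers is an assumed stand-in for choosing two new hash functions, a bound on the number of evictions replaces true cycle detection, and the array is doubled after every failed rehash as a simplification.

```python
import random

class CuckooTable:
    """Cuckoo hashing sketch: one array of nests, two salted hash functions."""

    MAX_KICKS = 32  # assumption: give up and rehash after this many evictions

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.nests = [None] * capacity   # each nest is None or a (key, value) pair
        self.size = 0
        self._new_salts()

    def _new_salts(self):
        # Two fresh random salts play the role of two new hash functions.
        self.salts = (random.getrandbits(64), random.getrandbits(64))

    def _positions(self, key):
        return [hash((salt, key)) % self.capacity for salt in self.salts]

    def lookup(self, key, default=None):
        for pos in self._positions(key):         # at most two probes
            entry = self.nests[pos]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def delete(self, key):
        for pos in self._positions(key):
            entry = self.nests[pos]
            if entry is not None and entry[0] == key:
                self.nests[pos] = None
                self.size -= 1
                return True
        return False

    def _place(self, entry):
        """Place entry, evicting as needed; return the pair left homeless, or None."""
        a, b = self._positions(entry[0])
        # Prefer whichever of the two nests is empty; otherwise evict from the first.
        pos = b if (self.nests[a] is not None and self.nests[b] is None) else a
        for _ in range(self.MAX_KICKS):
            if self.nests[pos] is None:
                self.nests[pos] = entry
                return None
            entry, self.nests[pos] = self.nests[pos], entry   # drive out the occupant
            a, b = self._positions(entry[0])
            pos = b if pos == a else a                        # its other nest
        return entry

    def _rehash(self, capacity, pending):
        entries = [e for e in self.nests if e is not None] + pending
        while True:
            self.capacity = capacity
            self.nests = [None] * capacity
            self._new_salts()
            if all(self._place(e) is None for e in entries):
                return
            capacity *= 2   # rehash failed too: try again with a larger array

    def insert(self, key, value):
        for pos in self._positions(key):
            entry = self.nests[pos]
            if entry is not None and entry[0] == key:
                self.nests[pos] = (key, value)               # update in place
                return
        self.size += 1
        if 2 * self.size > self.capacity:                    # keep at most half full
            self._rehash(2 * self.capacity, [(key, value)])
            return
        homeless = self._place((key, value))
        if homeless is not None:
            self._rehash(self.capacity, [homeless])

    def items(self):
        return [e for e in self.nests if e is not None]


table = CuckooTable()
table.insert("H", 8)
table.insert("W", 23)
print(table.lookup("W"))
print(sorted(table.items()))
```

Note that `lookup` and `delete` never touch more than two nests, whatever the table holds; all of the uncertainty is pushed into `insert`, which may trigger a rehash.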
Several variants of cuckoo hashing have been described in the academic literature. The biggest disadvantage of cuckoo hashing is wasted space; the version described above requires that no more than 50% of the nests be occupied by key/value pairs in order to guarantee amortized-constant-time insertions. Adding a third hash function increases the average fill rate from 50% to 91%, and adding further hash functions increases that rate even more. Another possibility is to allow a fixed number of items, greater than one, in each slot of the hash table; increasing from one item to two increases the average fill rate to 80%. Of course, these ideas can be used together; using three “hatch” functions and two birds per nest increases the average fill rate to 97%, and even higher fill rates are possible, though at the cost of more work per item (but the big-oh time guarantees remain unchanged). Most real implementations of cuckoo hashing provide for the number of nests to change dynamically, growing and shrinking as the number of items in the table changes.
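To make the combined variant concrete, here is a rough Python sketch with a configurable number of hash functions and items per nest. Everything about it is an assumption made for illustration: the name `GeneralCuckoo`, the salted built-in `hash`, and the random choice of which occupant to evict when every candidate nest is full. It does not rehash or resize, and it assumes each key is inserted only once; `insert` simply hands back whichever pair ended up without a nest.

```python
import random

class GeneralCuckoo:
    """Sketch: `hashes` hash functions, up to `slots` key/value pairs per nest."""

    def __init__(self, nests=64, hashes=3, slots=2, max_kicks=200, seed=None):
        self.rng = random.Random(seed)
        self.slots = slots
        self.max_kicks = max_kicks
        self.salts = [self.rng.getrandbits(64) for _ in range(hashes)]
        self.table = [[] for _ in range(nests)]

    def _nests(self, key):
        return [hash((salt, key)) % len(self.table) for salt in self.salts]

    def lookup(self, key, default=None):
        # Examines at most hashes * slots pairs: more work per item, still constant.
        for n in self._nests(key):
            for k, v in self.table[n]:
                if k == key:
                    return v
        return default

    def insert(self, key, value):
        """Return None on success, or the (key, value) pair left without a nest."""
        entry = (key, value)
        for _ in range(self.max_kicks):
            candidates = self._nests(entry[0])
            for n in candidates:
                if len(self.table[n]) < self.slots:
                    self.table[n].append(entry)
                    return None
            # Every candidate nest is full: evict a randomly chosen occupant.
            n = self.rng.choice(candidates)
            i = self.rng.randrange(self.slots)
            entry, self.table[n][i] = self.table[n][i], entry
        return entry   # a real table would rehash or grow here
```

Setting `hashes=2` and `slots=1` gives back the basic layout described earlier, so the same sketch can be used to compare how full the table gets under different settings before `insert` starts returning homeless pairs.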
Your task is to write a library that maintains a dictionary of key/value pairs using cuckoo hashing; you should provide functions for the three basic operations (lookup, insert and delete) plus a function to extract the key/value pairs to a list. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.