Cuckoo Hashing
February 1, 2011
To allow us to focus on the cuckoo algorithm, we restrict ourselves to only string-valued keys. Here is our hash function:
(define (string-hash str x)
(let loop ((cs (string->list str)) (h 0))
(if (null? cs) h
(loop (cdr cs) (+ (* h x)
(char->integer (car cs)))))))
This is the same as the string-hash
function in the Standard Prelude, except that it uses a variable multiplier x instead of a constant multiplier 31. To make a family of hash functions we will store the two multipliers in global variables x1 and x2, initialize them to 31 and 37, and increase them to the next-larger prime number when necessary:
(define (reset-multipliers)
(set! x1 (next-prime x2))
(set! x2 (next-prime x1)))
Lookup is simple: calculate the first hash, and return the requested item if it is found, otherwise calculate the second hash, and return the requested item if it is found, otherwise return failure. We store the key/value pairs as cons pairs and return the null list for failure:
(define (lookup table key)
(let ((h (modulo (string-hash key x1) table-size)))
(if (and (pair? (vector-ref table h))
(string=? (car (vector-ref table h)) key))
(vector-ref table h)
(let ((h (modulo (string-hash key x2) table-size)))
(if (and (pair? (vector-ref table h))
(string=? (car (vector-ref table h)) key))
(vector-ref table h)
'())))))
Insertion works as described above, except that if both possible nests are occupied, we always throw out the contents of the first nest; doing so makes it somewhat more likely that only one hash function will need to be calculated during lookup. Instead of checking for cycles, as suggested above, we simply stop and rehash if we have made more than max-probes steps:
(define (insert table key value)
(define (hash key x) (modulo (string-hash key x) table-size))
(if (pair? (lookup table key))
(let ((h1 (hash key x1)) (h2 (hash key x2)))
(if (= (car (vector-ref table h1)) key)
(begin (vector-set! table h1 (cons key value)) table)
(begin (vector-set! table h2 (cons key value)) table)))
(let loop ((key key) (value value) (count max-probes))
(let ((h1 (hash key x1)) (h2 (hash key x2)))
(cond ((null? (vector-ref table h1))
(vector-set! table h1 (cons key value)) table)
((null? (vector-ref table h2))
(vector-set! table h2 (cons key value)) table)
((zero? count)
(set! table (rehash table))
(insert table key value))
(else (let ((t (vector-ref table h1)))
(vector-set! table h1 (cons key value))
(loop (car t) (cdr t) (- count 1)))))))))
Rehashing is simply a matter of re-inserting all the key/value pairs in one table in a newly-allocated table using a new pair of hash functions:
(define (rehash table)
(reset-multipliers)
(let loop ((new-table (make-vector table-size '())) (i 0))
(if (= i table-size) new-table
(let ((t (vector-ref table i)))
(if (pair? t)
(loop (insert new-table (car t) (cdr t)) (+ i 1))
(loop new-table (+ i 1)))))))
Deletion is simple; just lookup the requested item, then replace it with the null list:
(define (delete table key)
(let ((h (modulo (string-hash key x1) table-size)))
(if (and (pair? (vector-ref table h))
(string=? (car (vector-ref table h)) key))
(vector-set! table h '())
(let ((h (modulo (string-hash key x2) table-size)))
(if (and (pair? (vector-ref table h))
(string=? (car (vector-ref table h)) key))
(vector-set! table h '())))))
table)
Now that we know how the functions work, we can write the make-hash
function that creates a new hash table; it’s often better to work this way, so that you know the steady-state working of the code before you do the initialization. The sole argument is max-size, which is the maximum number of key/value pairs permitted in the table; the size of the underlying array is the smallest prime number more than twice the requested maximum size, so the array never gets too full. We make the table size and multipliers prime to reduce the number of collisions. X1 and x2 are initialized to 31 and 37, max-probes is initialized based on the size of the table, and empty items are initialized to the null list:
(define x1 #f)
(define x2 #f)
(define table-size #f)
(define max-probes #f)
(define (make-hash max-size)
(set! x1 31) (set! x2 37)
(set! table-size (next-prime (* 2 max-size)))
(set! max-probes (max (* (ilog 2 max-size) 2) 20))
(make-vector table-size '()))
We collect the contents of the hash table in a list by indexing through the table taking all non-null items:
(define (enlist table)
(let loop ((i 0) (xs '()))
(if (= i table-size) xs
(let ((t (vector-ref table i)))
(loop (+ i 1) (if (pair? t) (cons t xs) xs))))))
We test cuckoo hashing using a phonetic alphabet:
(define words '("alpha" "bravo" "charlie" "delta" "echo"
"foxtrot" "golf" "hotel" "india" "juliet" "kilo" "lima"
"mike" "november" "oscar" "papa" "quebec" "romeo" "sierra"
"tango" "uniform" "victor" "whiskey" "xray" "yankee" "zulu"))
(define (cuckoo-test)
(let ((t (make-hash 25)))
(assert (lookup t "praxis") '())
(let loop ((words words) (val 1))
(when (pair? words)
(set! t (insert t (car words) val))
(loop (cdr words) (+ val 1))))
(assert (lookup t "praxis") '())
(assert (cdr (lookup t "papa")) 16)
(set! t (delete t "papa"))
(assert (lookup t "papa") '())))
With a max-size of 25, we rehashed the table once when inserting “lima” and twice when inserting “whiskey;” that’s what happens when the table gets full. With max-size of 30, no rehashes are required.
We used ilog
and assert
from the Standard Prelude and prime?
from a previous exercise, and wrote a simple next-prime
function that appears in the collected code at http://programmingpraxis.codepad.org/LIYEw8xo.
Took me a bit, but I think I cracked this one finally.
In order to make it really solid, I should go back to previous exercises and rework my prime stuff.
Of course, it’s not entirely true to the spirit of the exercise since I used a dictionary to implement it…
It would have been better to build a list of
None
s of the proper length and go from there.I’ve changed my answer, building on lists instead of dictionaries; the code is
posted on codepad.
I also modified my
ilog2()
so it uses only integer operationsinstead of just taking the log base 2 and rounding down to an integer.