Hash Tables With Open Addressing
April 30, 2019
We start with a hashing function on strings, which are commonly used for symbol tables and other applications. We parameterize the function so it can be used for both hashes of the double hash algorithm:
(define (hash parm key) (let loop ((ks (string->list key)) (h 0)) (if (null? ks) h (loop (cdr ks) (+ (* h parm) (char->integer (car ks)))))))
We will use 31 and 37 as the parameters for the two hash functions. Thus:
> (hash 31 "hello") 99162322 > (hash 37 "hello") 200180656
A hash table is just a vector, with each item initially 'empty
:
(define m 7) ; small for testing (define ht (make-vector m 'empty))
Now we look at linear probing:
(define (lookup ht eql? k) (let loop ((h (modulo (hash 31 k) m))) (let ((bucket (vector-ref ht h))) (cond ((eq? bucket 'empty) (list)) ((and (pair? bucket) (eql? (car bucket) k)) bucket) (else (loop (modulo (+ h 1) m)))))))
(define (insert! ht eql? k v) (let loop ((h (modulo (hash 31 k) m))) (let ((bucket (vector-ref ht h))) (cond ((or (eq? bucket 'empty) (eq? bucket 'deleted) (eql? (car bucket) k)) (vector-set! ht h (cons k v)) ht) (else (loop (modulo (+ h 1) m)))))))
(define (delete! ht eql? k) (let loop ((h (modulo (hash 31 k) m))) (let ((bucket (vector-ref ht h))) (cond ((eq? bucket 'empty) ht) ((and (pair? bucket) (eql? (car bucket) k)) (vector-set! ht h 'deleted) ht) (else (loop (modulo (+ h 1) m)))))))
(define (enlist ht) (let loop ((h 0) (xs (list))) (if (= h m) xs (let ((bucket (vector-ref ht h))) (if (pair? bucket) (loop (+ h 1) (cons bucket xs)) (loop (+ h 1) xs))))))
The code is straight forward but interesting. Notice how 'empty
and 'delete
appear or fail to appear in the various functions: lookup
looks for 'empty
or a key/value pair, insert!
looks for 'empty
, 'delete
or a key/value pair, delete!
looks for 'empty
or a key/value pair, and enlist
looks only for a key/value pair. Note also that insert!
treats 'empty
, 'deleted
and a matching key/value pair as the same thing; all mark a place where the requested key/value pair is to appear.
Changing these functions to use double hashing is easy. The variable we have called h becomes h1, we add another call the hash
to compute h2, which is the gap between successive probes, and the computation (+ h 1)
becomes (+ h1 h2)
; the computation of h2 must be modified so it never returns zero. We won’t show the code here, but you can see it at https://ideone.com/6xFGsc. Here are some examples; they work the same way using either linear probing or double hashing:
> (set! ht (insert! ht string=? "a" 1)) > (set! ht (insert! ht string=? "c" 3)) > (set! ht (insert! ht string=? "e" 5)) > (set! ht (insert! ht string=? "f" 6)) > (set! ht (insert! ht string=? "g" 7)) > (set! ht (insert! ht string=? "h" 8)) > (length (enlist ht)) 6 > (set! ht (delete! ht string=? "c")) > (set! ht (delete! ht string=? "g")) > (lookup ht string=? "a") ("a" . 1) > (lookup ht string=? "c") () > ht #(("h" . 8) deleted empty ("e" . 5) ("f" . 6) deleted ("a" . 1)) > (enlist ht) (("a" . 1) ("f" . 6) ("e" . 5) ("h" . 8))
You can run the program at https://ideone.com/6xFGsc.
In general, double hashing is preferable to linear probing, which has the problem that multiple keys that hash to the same bucket become clustered; double hashing does a better job of spreading them out, so it permits a higher load factor.
In Python. The full code is on ideone.