## Hash Tables With Open Addressing

### April 30, 2019

We start with a hashing function on strings, which are commonly used for symbol tables and other applications. We parameterize the function so it can be used for both hashes of the double hash algorithm:

(define (hash parm key) (let loop ((ks (string->list key)) (h 0)) (if (null? ks) h (loop (cdr ks) (+ (* h parm) (char->integer (car ks)))))))

We will use 31 and 37 as the parameters for the two hash functions. Thus:

> (hash 31 "hello") 99162322 > (hash 37 "hello") 200180656

A hash table is just a vector, with each item initially `'empty`

:

(define m 7) ; small for testing (define ht (make-vector m 'empty))

Now we look at linear probing:

(define (lookup ht eql? k) (let loop ((h (modulo (hash 31 k) m))) (let ((bucket (vector-ref ht h))) (cond ((eq? bucket 'empty) (list)) ((and (pair? bucket) (eql? (car bucket) k)) bucket) (else (loop (modulo (+ h 1) m)))))))

(define (insert! ht eql? k v) (let loop ((h (modulo (hash 31 k) m))) (let ((bucket (vector-ref ht h))) (cond ((or (eq? bucket 'empty) (eq? bucket 'deleted) (eql? (car bucket) k)) (vector-set! ht h (cons k v)) ht) (else (loop (modulo (+ h 1) m)))))))

(define (delete! ht eql? k) (let loop ((h (modulo (hash 31 k) m))) (let ((bucket (vector-ref ht h))) (cond ((eq? bucket 'empty) ht) ((and (pair? bucket) (eql? (car bucket) k)) (vector-set! ht h 'deleted) ht) (else (loop (modulo (+ h 1) m)))))))

(define (enlist ht) (let loop ((h 0) (xs (list))) (if (= h m) xs (let ((bucket (vector-ref ht h))) (if (pair? bucket) (loop (+ h 1) (cons bucket xs)) (loop (+ h 1) xs))))))

The code is straight forward but interesting. Notice how `'empty`

and `'delete`

appear or fail to appear in the various functions: `lookup`

looks for `'empty`

or a key/value pair, `insert!`

looks for `'empty`

, `'delete`

or a key/value pair, `delete!`

looks for `'empty`

or a key/value pair, and `enlist`

looks only for a key/value pair. Note also that `insert!`

treats `'empty`

, `'deleted`

and a matching key/value pair as the same thing; all mark a place where the requested key/value pair is to appear.

Changing these functions to use double hashing is easy. The variable we have called *h* becomes *h1*, we add another call the `hash`

to compute *h2*, which is the gap between successive probes, and the computation `(+ h 1)`

becomes `(+ h1 h2)`

; the computation of *h2* must be modified so it never returns zero. We won’t show the code here, but you can see it at https://ideone.com/6xFGsc. Here are some examples; they work the same way using either linear probing or double hashing:

> (set! ht (insert! ht string=? "a" 1)) > (set! ht (insert! ht string=? "c" 3)) > (set! ht (insert! ht string=? "e" 5)) > (set! ht (insert! ht string=? "f" 6)) > (set! ht (insert! ht string=? "g" 7)) > (set! ht (insert! ht string=? "h" 8)) > (length (enlist ht)) 6 > (set! ht (delete! ht string=? "c")) > (set! ht (delete! ht string=? "g")) > (lookup ht string=? "a") ("a" . 1) > (lookup ht string=? "c") () > ht #(("h" . 8) deleted empty ("e" . 5) ("f" . 6) deleted ("a" . 1)) > (enlist ht) (("a" . 1) ("f" . 6) ("e" . 5) ("h" . 8))

You can run the program at https://ideone.com/6xFGsc.

In general, double hashing is preferable to linear probing, which has the problem that multiple keys that hash to the same bucket become clustered; double hashing does a better job of spreading them out, so it permits a higher load factor.

In Python. The full code is on ideone.