## Ordered Hash Tables

### August 2, 2013

We assume two functions `hash`

and `lt?`

are provided:

`(define (hash k) k)`

`(define (lt? k1 k2) (< k1 k2))`

A hash table is a vector of lists, all initially empty; the length of the vector is chosen based on the expected number of records to be stored, and is best prime (we make it small for testing):

`(define len 5)`

`(define ht (make-vector len (list)))`

The easiest place to start is the lookup function. The function performs linear search through the appropriate bucket and returns a null list to indicate failure in two places: if the bucket is exhausted, or if the key being sought is greater than the current key:

`(define (lookup k)`

(let* ((index (modulo (hash k) len))

(bucket (vector-ref ht index)))

(let loop ((bucket bucket))

(cond ((null? bucket) (list))

((lt? k (caar bucket)) (list))

((lt? (caar bucket) k) (loop (cdr bucket)))

(else (car bucket))))))

Insertion has a structure similar to lookup. The added loop parameter, `xs`

, collects bucket items as they are seen so they can be included in the final output:

`(define (insert k v)`

(let* ((index (modulo (hash k) len))

(bucket (vector-ref ht index)))

(let loop ((bucket bucket) (xs (list)))

(cond ((null? bucket)

(vector-set! ht index

(append (reverse xs) (list (cons k v)))))

((lt? k (caar bucket))

(vector-set! ht index

(append (reverse xs) (list (cons k v)) bucket)))

((lt? (caar bucket) k)

(loop (cdr bucket) (cons (car bucket) xs)))

(else (vector-set! ht index

(append (reverse xs) (list (cons k v)) (cdr bucket))))))))

Deletion uses the same four-way switch as lookup and insert;

in the first two cases, when the key is not present in the

hash table, it does nothing, leaving the hash table unchanged:

`(define (delete k)`

(let* ((index (modulo (hash k) len))

(bucket (vector-ref ht index)))

(let loop ((bucket bucket) (xs (list)))

(cond ((null? bucket))

((lt? k (caar bucket)))

((lt? (caar bucket) k)

(loop (cdr bucket) (cons (car bucket) xs)))

(else (vector-set! ht index

(append (reverse xs) (cdr bucket))))))))

We also provide a function to create a list of the key/value

pairs in the hash table:

`(define (enlist)`

(let ((xs (list)))

(do ((i 0 (+ i 1))) ((= i len) xs)

(set! xs (append (vector-ref ht i) xs)))))

We test the hash table functions by building a small database,

asserting various facts about it, deleting all the entries in

the database, then checking that it is empty; note the

difference in the two `range`

s, by which we test

deletion of a non-existent key:

`(define alphabet '(alpha bravo charlie delta echo foxtrot golf`

hotel india juliet kilo lima mike november oscar papa quebec

romeo sierra tango uniform victor whiskey xray yankee zulu))

`(define (test)`

(let ((xs (shuffle (map cons (range 1 27) alphabet))))

(do ((xs xs (cdr xs))) ((null? xs))

(insert (caar xs) (cdar xs))))

(assert (length (enlist)) 26)

(assert (cdr (lookup 3)) 'charlie)

(assert (cdr (lookup 17)) 'quebec)

(assert (lookup 43) '())

(let ((xs (shuffle (range 27))))

(do ((xs xs (cdr xs))) ((null? xs))

(delete (car xs))))

(assert (length (enlist)) 0)

(assert (lookup 3) '())

(assert (lookup 17) '())

(assert (lookup 43) '()))

`> (test)`

; no news is good news

We used `range`

, `rand`

, `randint`

, `shuffle`

and `assert`

from the Standard Prelude, but only for testing; the hash-table functions are self-contained. You can run the program at http://programmingpraxis.codepad.org/ae17jWwR.

Pages: 1 2

Since you have a comparison function, you could go further and have each bucket be an auto-growing array; this would allow lookup to use a binary search within the bucket. But at that point it might be better to re-evaluate the hash function.

A simple Haskell solution: http://hamberg.no/erlend/posts/2013-08-02-ordered-hash-table.html

Can we implement the solution in Java?

Here’s a solution in Common Lisp. The operations on the sorted alist could be improved by using tail recursion.

An argument can be used to specify the number of buckets. Dynamic resizing is not implemented.

This solution uses separate chaining with linked lists, although a search tree would seemingly improve performance.