Ordered Maps
June 12, 2012
Make-dict
returns a newly-allocated, empty dictionary. It’s sole argument, the ordering predicate lt?
, is saved and applied to all operations made using the dictionary. Dictionaries are perfectly functional; the result of an insert
or other operation that modifies the dictionary is a new dictionary, and the old dictionary remains available (unless the new dictionary has been rebound over the top of it):
(define (make-dict lt?)
; define-generator syntax
; lightly modified avl-tree library
(define-generator (make-gen t)
(avl-for-each (lambda (k v) (yield (cons k v))) t)
(do () (#f) (yield #f)))
(define (new dict)
(lambda (message . args) (dispatch dict message args)))
(define (dispatch dict message args)
(define (arity n)
(if (not (= (length args) n)) (error 'dict "incorrect arity")))
(case message
((empty? nil?) (arity 0) (nil? dict))
((lookup fetch get) (arity 1) (apply lookup dict args))
((insert store put) (arity 2) (new (apply insert dict args)))
((update) (arity 3) (new (apply update dict args)))
((delete remove) (arity 1) (new (apply delete dict args)))
((size count length) (arity 0) (size dict))
((nth) (arity 1) (apply nth dict args))
((rank) (arity 1) (apply rank dict args))
((map) (arity 1) (new (avl-map (car args) dict)))
((fold) (arity 2) (avl-fold (car args) (cadr args) dict))
((for-each) (arity 1) (avl-for-each (car args) dict))
((to-list enlist) (arity 0) (to-list dict))
((from-list) (arity 1) (new (apply from-list dict args)))
((make-gen gen) (arity 0) (make-gen dict))
(else (error 'dict "invalid message"))))
(vector-set! nil 2 nil) (vector-set! nil 3 nil) (new nil))
Our implementation of make-dict
is easy because most of it is stolen. The define-generator
syntax is copied unchanged from the earlier exercise on generators. The avl tree functions are from two previous exercises; the light modifications referred to remove the lt?
predicate as an argument from each of the functions (because it is bound at the outer level of the dictionary and need not be carried forward with each function) and add an update
function that was not provided previously. The other addition to the avl library is the make-gen
function, which first generates the key/value pairs of the dictionary in order using avl-for-each
then generates #f
on each subsequent call; the odd-looking do
is the standard Scheme idiom for a loop that runs forever, like while (1)
in C.
The way that Scheme makes a data type abstract is to encapsulate it in a function. Thus, where insert
and delete
and the other functions return an avl-tree, new
encloses that tree in a function. The dispatch
function is called each time the dictionary is called, selects the requested operation based on the message, then either returns a simple value (for operations like size
or fold
) or a new
ly allocated dictionary (for operations like insert
or delete
). The function call (new nil)
on the last line returns the initial nil tree. A sample sequence of operations is shown below:
> (define d (make-dict <))
> (set! d (d 'insert 4 4))
> (set! d (d 'from-list '((1 . 1) (3 . 3) (7 . 7) (6 . 6) (2 . 2) (5 . 5))))
> (d 'size)
7
> (d 'fold (lambda (k v b) (+ v b)) 0)
28
> (d 'nth 4)
(5 . 5)
> (d 'rank 2)
1
> (set! d (d 'delete 6))
> (set! d (d 'map (lambda (k v) (integer->char (+ v 64)))))
> (d 'enlist)
((1 . A) (2 . B) (3 . C) (4 . D) (5 . E) (7 . G))
As another example, we reprise the word-frequency problem that we previously solved using a hash table. We reuse the read-word
function from that exercise and use take from the Standard Prelude, and write a function word-freq
that returns the n words that occur most frequently in file-name in descending order:
(define (word-freq n file-name)
(define (freq-gt? a b) (> (cdr a) (cdr b)))
(with-input-from-file file-name (lambda ()
(let ((freqs (make-dict string<?)))
(do ((word (read-word) (read-word)))
((eof-object? word) (take n (sort freq-gt? (freqs 'enlist))))
(freqs 'update word (lambda (k v) (+ v 1)) 1))))))
You can run the code at http://programmingpraxis.codepad.org/mn02cMq9, or see the code in the Standard Prelude. The Standard Prelude now provides two type of dictionaries, based on hash tables and avl trees. In cases that don’t required ordered keys, hash tables and avl trees can be used interchangeably. So which should you prefer? You might prefer the O(1) expected access time of hash tables to the O(log n) worst-case access time of avl trees, or you may fear the O(n) worst-case access time of hash tables, but in practice the two data types are equivalent. My own preference is generally for avl trees, because there is no nagging worry about a worst-case time complexity, because there is no annoyance at the delay when a small hash table is rehashed into a larger arena, and because more that once I have found, during the maintenance life of a program, that a new feature not foreseen when the program was first written requires the data in order, and it’s easier to start with the data in order than to retrofit order onto an existing hash table.