Ordered Maps

June 12, 2012

Make-dict returns a newly-allocated, empty dictionary. It’s sole argument, the ordering predicate lt?, is saved and applied to all operations made using the dictionary. Dictionaries are perfectly functional; the result of an insert or other operation that modifies the dictionary is a new dictionary, and the old dictionary remains available (unless the new dictionary has been rebound over the top of it):

(define (make-dict lt?)

  ; define-generator syntax

  ; lightly modified avl-tree library

  (define-generator (make-gen t)
    (avl-for-each (lambda (k v) (yield (cons k v))) t)
    (do () (#f) (yield #f)))

  (define (new dict)
    (lambda (message . args) (dispatch dict message args)))

  (define (dispatch dict message args)
    (define (arity n)
      (if (not (= (length args) n)) (error 'dict "incorrect arity")))
    (case message
      ((empty? nil?) (arity 0) (nil? dict))
      ((lookup fetch get) (arity 1) (apply lookup dict args))
      ((insert store put) (arity 2) (new (apply insert dict args)))
      ((update) (arity 3) (new (apply update dict args)))
      ((delete remove) (arity 1) (new (apply delete dict args)))
      ((size count length) (arity 0) (size dict))
      ((nth) (arity 1) (apply nth dict args))
      ((rank) (arity 1) (apply rank dict args))
      ((map) (arity 1) (new (avl-map (car args) dict)))
      ((fold) (arity 2) (avl-fold (car args) (cadr args) dict))
      ((for-each) (arity 1) (avl-for-each (car args) dict))
      ((to-list enlist) (arity 0) (to-list dict))
      ((from-list) (arity 1) (new (apply from-list dict args)))
      ((make-gen gen) (arity 0) (make-gen dict))
      (else (error 'dict "invalid message"))))

  (vector-set! nil 2 nil) (vector-set! nil 3 nil) (new nil))

Our implementation of make-dict is easy because most of it is stolen. The define-generator syntax is copied unchanged from the earlier exercise on generators. The avl tree functions are from two previous exercises; the light modifications referred to remove the lt? predicate as an argument from each of the functions (because it is bound at the outer level of the dictionary and need not be carried forward with each function) and add an update function that was not provided previously. The other addition to the avl library is the make-gen function, which first generates the key/value pairs of the dictionary in order using avl-for-each then generates #f on each subsequent call; the odd-looking do is the standard Scheme idiom for a loop that runs forever, like while (1) in C.

The way that Scheme makes a data type abstract is to encapsulate it in a function. Thus, where insert and delete and the other functions return an avl-tree, new encloses that tree in a function. The dispatch function is called each time the dictionary is called, selects the requested operation based on the message, then either returns a simple value (for operations like size or fold) or a newly allocated dictionary (for operations like insert or delete). The function call (new nil) on the last line returns the initial nil tree. A sample sequence of operations is shown below:

> (define d (make-dict <))
> (set! d (d 'insert 4 4))
> (set! d (d 'from-list '((1 . 1) (3 . 3) (7 . 7) (6 . 6) (2 . 2) (5 . 5))))
> (d 'size)
> (d 'fold (lambda (k v b) (+ v b)) 0)
> (d 'nth 4)
(5 . 5)
> (d 'rank 2)
> (set! d (d 'delete 6))
> (set! d (d 'map (lambda (k v) (integer->char (+ v 64)))))
> (d 'enlist)
((1 . A) (2 . B) (3 . C) (4 . D) (5 . E) (7 . G))

As another example, we reprise the word-frequency problem that we previously solved using a hash table. We reuse the read-word function from that exercise and use take from the Standard Prelude, and write a function word-freq that returns the n words that occur most frequently in file-name in descending order:

(define (word-freq n file-name)
  (define (freq-gt? a b) (> (cdr a) (cdr b)))
  (with-input-from-file file-name (lambda ()
    (let ((freqs (make-dict string<?)))
      (do ((word (read-word) (read-word)))
          ((eof-object? word) (take n (sort freq-gt? (freqs 'enlist))))
        (freqs 'update word (lambda (k v) (+ v 1)) 1))))))

You can run the code at http://programmingpraxis.codepad.org/mn02cMq9, or see the code in the Standard Prelude. The Standard Prelude now provides two type of dictionaries, based on hash tables and avl trees. In cases that don’t required ordered keys, hash tables and avl trees can be used interchangeably. So which should you prefer? You might prefer the O(1) expected access time of hash tables to the O(log n) worst-case access time of avl trees, or you may fear the O(n) worst-case access time of hash tables, but in practice the two data types are equivalent. My own preference is generally for avl trees, because there is no nagging worry about a worst-case time complexity, because there is no annoyance at the delay when a small hash table is rehashed into a larger arena, and because more that once I have found, during the maintenance life of a program, that a new feature not foreseen when the program was first written requires the data in order, and it’s easier to start with the data in order than to retrofit order onto an existing hash table.

About these ads

Pages: 1 2

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.

Join 576 other followers

%d bloggers like this: