## AVL Trees

### November 25, 2011

We begin with some simple access functions. Note that the height of a node is always calculated automatically whenever a node is created, which greatly simplifies all of the other functions:

`(define (tree k v l r) (vector k v l r (+ (max (ht l) (ht r)) 1)))`

(define (key t) (vector-ref t 0))

(define (val t) (vector-ref t 1))

(define (lkid t) (vector-ref t 2))

(define (rkid t) (vector-ref t 3))

(define (ht t) (vector-ref t 4))

(define (bal t) (- (ht (lkid t)) (ht (rkid t))))

(define nil (vector 'nil 'nil 'nil 'nil 0))

(define (nil? t) (eq? t nil))

We have discussed rotations in several previous exercises. Note that a rotation has no effect on the binary search tree property, but since it changes the heights of the various nodes it does have an effect on the AVL balance property:

`(define (rot-left t)`

(if (nil? t) t

(tree (key (rkid t))

(val (rkid t))

(tree (key t) (val t) (lkid t) (lkid (rkid t)))

(rkid (rkid t)))))

`(define (rot-right t)`

(if (nil? t) t

(tree (key (lkid t))

(val (lkid t))

(lkid (lkid t))

(tree (key t) (val t) (rkid (lkid t)) (rkid t)))))

The key to the AVL tree implementation is the `balance`

function, which takes a node in an AVL tree that is possibly imbalanced, though its children are known to be balanced, and returns a new tree that is balanced by performing the needed rotations. The only trees that need action are those with a balance with absolute value greater than 1; all others are simply returned unchanged. There are two symmetrical cases, each with two possibilities, which are fixed by either a single rotation or a double rotation:

`(define (balance t)`

(let ((b (bal t)))

(cond ((and (= b 2) (= (bal (lkid t)) -1))

(rot-right (tree (key t) (val t) (rot-left (lkid t)) (rkid t))))

((= b 2) (rot-right t))

((and (= b -2) (= (bal (rkid t)) 1))

(rot-left (tree (key t) (val t) (lkid t) (rot-right (rkid t)))))

((= b -2) (rot-left t))

(else t))))

`Lookup`

is exactly the same as the corresponding function on regular (unbalanced) binary search trees:

`(define (lookup lt? t k)`

(cond ((nil? t) #f)

((lt? k (key t)) (lookup lt? (lkid t) k))

((lt? (key t) k) (lookup lt? (rkid t) k))

(else (cons k (val t)))))

`Insert`

searches a tree recursively in the same way as `lookup`

, building a new tree as it goes. There are two base cases: if the search finds the key already in the tree, it simply replaces the existing value, and if the search reaches a `nil`

tree, it simply inserts a new node. Otherwise, insertion branches either left or right, inserting the new key/value pair recursively in the proper subtree. `Balance`

is called at each step, but since the existing tree already possesses the AVL balance property, each of those calls to balance returns immediately without performing any rotations, so that only one single- or double-rotation is performed, if necessary, at the parent of the new node in the tree:

`(define (insert lt? t k v)`

(cond ((nil? t) (tree k v nil nil))

((lt? k (key t))

(balance (tree (key t) (val t) (insert lt? (lkid t) k v) (rkid t))))

((lt? (key t) k)

(balance (tree (key t) (val t) (lkid t) (insert lt? (rkid t) k v))))

(else (tree k v (lkid t) (rkid t)))))

`Delete`

performs a similar search to `lookup`

and `insert`

, and also builds a new tree as it goes; a consequence of this arrangement is that if the key is not present in the tree, new nodes will be created along the search path, though they will be identical to the original nodes. If it reaches a nil tree without finding a match, it stops. If the key being deleted is in one of the children of the current tree, delete calls itself recursively, using `balance`

but no rotations. The interesting case occurs when `delete`

finds a matching key, when it uses the auxiliary function `delete-successor`

to delete the current node and replace it with its immediate successor. `Delete-successor`

plunges down the left spine of the right subtree, stacking up each node, and rebalancing as it unwinds the stack after it reaches bottom; the `balance`

function only needs to perform rotations until it reaches a balanced node:

`(define (delete lt? t k)`

(define (delete-successor t)

(cond ((nil? (lkid t)) (values (rkid t) (key t) (val t)))

(else (call-with-values

(lambda () (delete-successor (lkid t)))

(lambda (l k v)

(values (balance (tree (key t) (val t) l (rkid t))) k v))))))

(let delete ((t t))

(cond ((nil? t) nil)

((lt? k (key t))

(balance (tree (key t) (val t) (delete (lkid t)) (rkid t))))

((lt? (key t) k)

(balance (tree (key t) (val t) (lkid t) (delete (rkid t)))))

((nil? (lkid t)) (rkid t))

((nil? (rkid t)) (lkid t))

(else (call-with-values

(lambda () (delete-successor (rkid t)))

(lambda (r k v) (balance (tree k v (lkid t) r))))))))

`Enlist`

performs in-order traversal of the tree:

`(define (enlist t)`

(cond ((nil? t) (list))

((and (nil? (lkid t)) (nil? (rkid t)))

(list (cons (key t) (val t))))

(else (append (enlist (lkid t))

(list (cons (key t) (val t)))

(enlist (rkid t))))))

And that’s it. About a page of code. And if it’s not simple, at least it’s not too complex. The balancing code makes AVL trees about half-again as long as simple (unbalanced) binary search trees, which is a small cost compared to the benefit of balance. Here’s an example:

`> (define t (insert < nil 4 4))`

> (set! t (insert < t 1 1))

> (set! t (insert < t 3 3))

> (set! t (insert < t 5 5))

> (set! t (insert < t 2 2))

> (enlist t)

((1 . 1) (2 . 2) (3 . 3) (4 . 4) (5 . 5))

> (lookup < t 3)

(3 . 3)

> (lookup < t 9)

#f

> (set! t (delete < t 3))

> (set! t (delete < t 2))

> (set! t (delete < t 5))

> (set! t (delete < t 9))

> (set! t (delete < t 4))

> (set! t (delete < t 1))

> (enlist t)

()

Let’s move on to testing the AVL tree library. We first provide a function that displays a tree in human-readable form; it’s not needed for testing, but it is surely handy when examining small trees during manual testing:

`(define (display-avl t . indent)`

(let ((indent (if (null? indent) "" (car indent))))

(if (nil? t)

(begin (display indent) (display "nil nil 0") (newline))

(begin (display indent) (display (key t))

(display " ") (display (val t))

(display " ") (display (ht t)) (newline)

(display-avl (lkid t) (string-append " " indent))

(display-avl (rkid t) (string-append " " indent))))))

The `check?`

function takes a tree and checks that it is a valid AVL tree. A `nil`

tree is obviously valid. The `and`

has five clauses: the first two check the binary search tree property that the left child is less than the current node and the right child is greater than the current node; the third clause checks the AVL balance condition that the heights of the two child trees differ by no more than one, and the last two clauses check that the two child trees are valid AVL trees:

`(define (check? lt? t)`

(if (nil? t) #t

(and (or (nil? (lkid t)) (lt? (key (lkid t)) (key t)))

(or (nil? (rkid t)) (lt? (key t) (key (rkid t))))

(< (abs (bal t)) 2)

(check? lt? (lkid t))

(check? lt? (rkid t)))))

Our automated test first inserts the non-negative integers less than *n* into a nil tree in random order, calling `check?`

at each step to ensure the validity of the tree. Once all the integers have been inserted, the height is checked against the theoretical maximum, and the `enlist`

function checks that the tree contains all and only the non-negative integers less than *n*. Then each non-negative integer less than *n* is looked up, to ensure it is in the tree, and *n* is looked up to ensure that it is not. Next, each non-negative integer less than *n* is deleted, again in random order, calling `check?`

at each step to ensure the validity of the tree. Finally, the tree is checked to ensure it is nil. Here’s the code:

`(define (test n)`

(let ((t nil) (phi (/ (+ 1 (sqrt 5)) 2)))

(do ((xs (shuffle (range n)) (cdr xs))) ((null? xs))

(set! t (insert < t (car xs) (car xs)))

(assert (check? < t) #t))

(assert (< (ht t) (- (/ (log (+ n 2)) (log phi)) 1)) #t)

(assert (enlist t) (map (lambda (x) (cons x x)) (range n)))

(do ((xs (range n) (cdr xs))) ((null? xs))

(assert (car (lookup < t (car xs))) (car xs))

(assert (cdr (lookup < t (car xs))) (car xs)))

(assert (lookup < t n) #f)

(do ((xs (shuffle (range n)) (cdr xs))) ((null? xs))

(set! t (delete < t (car xs)))

(assert (check? < t) #t))

(assert t nil)))

Note that our implementation is purely functional. No nodes are overwritten. Instead, new nodes are created, sharing structure with unchanged nodes. You can run the program at http://programmingpraxis.codepad.org/SSrdpsMw.