Most Common Item In A Binary Search Tree
July 18, 2017
This is a trick question: there is no O(n) solution. There can’t be because we need to look at all n keys in the tree, and accessing each key requires log n time in the best case (assuming the tree is balanced), so the best time complexity is O(n log n). So that’s what we implement.
We begin with a binary search tree that admits duplicate keys; it is a variant on the binary search tree of a previous exercise that did not admit duplicate keys:
(define (tree key lkid rkid) (vector key lkid rkid)) (define (key tree) (vector-ref tree 0)) (define (lkid tree) (vector-ref tree 1)) (define (rkid tree) (vector-ref tree 2)) (define nil (vector 'nil 'nil 'nil)) (vector-set! nil 1 nil) (vector-set! nil 2 nil) (define (nil? tree) (eqv? tree nil)) (define (insert lt? t k) (cond ((nil? t) (tree k nil nil)) ((lt? k (key t)) (tree (key t) (insert lt? (lkid t) k) (rkid t))) ((lt? (key t) k) (tree (key t) (lkid t) (insert lt? (rkid t) k))) (else (tree (key t) (insert lt? (lkid t) k) (rkid t)))))
We build a tree for testing:
> (define t (insert < nil 1)) > (set! t (insert < t 2)) > (set! t (insert < t 3)) > (set! t (insert < t 3)) > (set! t (insert < t 3)) > (set! t (insert < t 4)) > (set! t (insert < t 5)) > (set! t (insert < t 5)) > (set! t (insert < t 5)) > (set! t (insert < t 5)) > (set! t (insert < t 5)) > (set! t (insert < t 5)) > (set! t (insert < t 6)) > (set! t (insert < t 6)) > (set! t (insert < t 7)) > (set! t (insert < t 7)) > (set! t (insert < t 7))
Once the tree is built, we convert it to a list, make a frequency count with uniq-c
, and find the largest count with maximum-by
:
(define (enlist t) (if (nil? t) (list) (append (enlist (lkid t)) (list (key t)) (enlist (rkid t)))))
(define (uniq-c eql? xs) (if (null? xs) xs (let loop ((xs (cdr xs)) (prev (car xs)) (k 1) (result '())) (cond ((null? xs) (reverse (cons (cons prev k) result))) ((eql? (car xs) prev) (loop (cdr xs) prev (+ k 1) result)) (else (loop (cdr xs) (car xs) 1 (cons (cons prev k) result)))))))
(define (maximum-by lt? . xs) (let loop ((xs (cdr xs)) (current-max (car xs))) (cond ((null? xs) current-max) ((lt? current-max (car xs)) (loop (cdr xs) (car xs))) (else (loop (cdr xs) current-max)))))
Now we can find the most common item in a tree:
> (apply maximum-by (lambda (x y) (< (cdr x) (cdr y))) (uniq-c = (enlist t))) (5 . 6)
So the most common item is 5, which appears 6 times.
That takes O(n) space instead of O(1). We could use a generator instead, but having already broken the time bounds, we may as well take the easy approach of breaking the space bounds, too.
You can run the program at http://ideone.com/vGoAeF.
Assuming that the keys are ordered in such a way that equal keys will all be consecutive in an inorder traversal (as they would be for integers), doesn’t that suggest “inorder traversal -> find longest run of consecutive identical items”?
I think that would be something like pseudo-Haskell “maximumBy length . group . inorder” (tweaked to use O(1) space if necessary).
It seems to me that O(n) is possible. Walk the tree (O(n)) and count the occurences of all values (using a hash table).
We can certainly compute the desired result in O(n) time with a simple in-order traversal (or treefoldl), and a fixed set of variables (current,best) x (value,count). Where that falls down is in requiring O(depth(tree)) stack space. I suspect that we can get that down to O(1) space if we are allowed to smash the tree (flattening it to a list on the fly), but I am not sure. There is certainly no need for a hash table.
[…] written about binary search trees on several occasions, most recently here. Binary search trees are a fruitful source of exercises, and we have another one […]
Here’s a solution in Java. Morris traversal is used for in-order traversal (a recursive solution would require more than O(1) space). The sorted order of values from the in-order traversal is relied upon for tracking the mode.
Example output follows the code in my last post.
My implementation with O(n) time and O(1) space in common lisp: