## Ternary Search Tries

### June 5, 2009

We represent a ternary search trie as a collection of nodes, each a six-slot vector: a boolean indicator whether or not the node has a value, the value itself, the *split character* that splits the trie into less-than, equal-to, and greater-than parts, and three child nodes:

`(define (node v? v s l e h) (vector v? v s l e h))`

(define nil (vector #f #f (integer->char 0) (vector) (vector) (vector)))

(define (nil? tst) (eqv? tst nil))

(define (val? tst) (vector-ref tst 0))

(define (val tst) (vector-ref tst 1))

(define (split tst) (vector-ref tst 2))

(define (lokid tst) (vector-ref tst 3))

(define (eqkid tst) (vector-ref tst 4))

(define (hikid tst) (vector-ref tst 5))

We begin with `lookup`

, which searches for an item in a ternary search trie. `Lookup`

is recursive, with two base cases — a nil trie (failure) and a null key (success, if the current node contains a value) — and three recursive cases, one each for less-than, greater-than, and equal-to:

`(define (lookup t k)`

(cond ((nil? t) #f)

((null? k) (if (val? t) (val t) #f))

((char<? (car k) (split t)) (lookup (lokid t) k))

((char<? (split t) (car k)) (lookup (hikid t) k))

(else (lookup (eqkid t) (cdr k)))))

`Insert`

builds a new trie, so it is fully applicative; the cases are the same as for `lookup`

. The nil case is interesting; `insert`

builds a new node, then calls itself recursively without changing any of its arguments, so that at the next call the trie will have the node it needs to continue:

`(define (insert t k v)`

(cond ((nil? t) (insert (node #f #f (if (null? k) (integer->char 0) (car k)) nil nil nil) k v))

((null? k) (node #t v (split t) (lokid t) (eqkid t) (hikid t)))

((char<? (car k) (split t)) (node (val? t) (val t) (split t) (insert (lokid t) k v) (eqkid t) (hikid t)))

((char<? (split t) (car k)) (node (val? t) (val t) (split t) (lokid t) (eqkid t) (insert (hikid t) k v)))

(else (node (val? t) (val t) (split t) (lokid t) (insert (eqkid t) (cdr k) v) (hikid t)))))

`Update`

is a simple variant on `insert`

:

`(define (update t k p v)`

(cond ((nil? t) (update (node #f #f (if (null? k) (integer->char 0) (car k)) nil nil nil) k p v))

((null? k) (if (val? t) (node #t (p k (val t)) (split t) (lokid t) (eqkid t) (hikid t))

(node #t v (split t) (lokid t) (eqkid t) (hikid t))))

((char<? (car k) (split t)) (node (val? t) (val t) (split t) (update (lokid t) k p v) (eqkid t) (hikid t)))

((char<? (split t) (car k)) (node (val? t) (val t) (split t) (lokid t) (eqkid t) (update (hikid t) k p v)))

(else (node (val? t) (val t) (split t) (lokid t) (update (eqkid t) (cdr k) p v) (hikid t)))))

`Delete`

is interesting. If the key isn’t already in the trie, the nil case will eventually stop the recursion without changing the trie. If the key is in the trie, the null case sets the value indicator to `#f`

. This means that an `insert`

followed immediately by a `delete`

of the just-inserted key will potentially leave the trie in a different state than when it started, as the “ghost” structure of the deleted nodes remains. It is possible, though at considerable cost, to remove the ghost nodes; we don’t bother:

`(define (delete t k)`

(cond ((nil? t) t)

((null? k) (node #f #f (split t) (lokid t) (eqkid t) (hikid t)))

((char<? (car k) (split t)) (node (val? t) (val t) (split t) (delete (lokid t) k) (eqkid t) (hikid t)))

((char<? (split t) (car k)) (node (val? t) (val t) (split t) (lokid t) (eqkid t) (delete (hikid t) k)))

(else (node (val? t) (val t) (split t) (lokid t) (delete (eqkid t) (cdr k)) (hikid t)))))

`Enlist`

traverses the trie in order:

`(define (enlist t)`

(let enlist ((t t) (k '()))

(if (nil? t) '()

(append (enlist (lokid t) k)

(if (val? t)

(cons (cons (list->string (reverse k)) (val t))

(enlist (eqkid t) (cons (split t) k)))

(enlist (eqkid t) (cons (split t) k)))

(enlist (hikid t) k)))))

An example of the use of ternary search tries is given by the earlier word-frequencies exercise. Here is a restatement of the central function of that exercise:

`(define (word-freq n file-name)`

(define (freq-gt? a b) (> (cdr a) (cdr b)))

(with-input-from-file file-name

(lambda ()

(let loop ((word (read-word)) (freqs nil))

(if (eof-object? word)

(take n (sort freq-gt? (enlist freqs)))

(loop (read-word) (update freqs (string->list word) (lambda (k v) (+ v 1)) 1)))))))

Take is from the Standard Prelude, read-word is from the earlier exercise. When called with all its declarations, `word-freq`

produces the same output as the earlier exercise:

`> (word-freq 25 "bible.txt")`

(("the" . 62588) ("and" . 30875) ("of" . 30183)

("to" . 23023) ("you" . 14887) ("in" . 13357) ("he" . 10495)

("a" . 10150) ("i" . 9078) ("for" . 8983) ("his" . 8424)

("lord" . 8129) ("your" . 7398) ("with" . 7259)

("that" . 7187) ("is" . 7143) ("they" . 7005) ("not" . 6484)

("him" . 6140) ("will" . 6093) ("them" . 5831) ("be" . 5668)

("who" . 5611) ("from" . 5476) ("it" . 5395))

You can see all the code at http://programmingpraxis.codepad.org/UnTljjnx.

[…] Praxis – Ternary Search Tries By Remco Niemeijer Today’s Programming Praxis problem is about Ternary search tries, which are basically hashmaps of strings […]

My Haskell solution (see http://bonsaicode.wordpress.com/2009/06/05/programming-praxis-ternary-search-tries/ for a version with comments):

import Data.Char

import qualified Data.List.Key as K

import Prelude hiding (lookup)

data TernaryTrie k v = Empty | Node { val :: Maybe v,

split :: [k], lb :: !(TernaryTrie k v),

eb :: !(TernaryTrie k v), gb :: !(TernaryTrie k v) }

lookup :: Ord k => [k] -> TernaryTrie k v -> Maybe v

lookup _ Empty = Nothing

lookup [] t = val t

lookup (x:xs) t = case compare [x] $ split t of

GT -> lookup (x:xs) $ gb t

LT -> lookup (x:xs) $ lb t

EQ -> lookup xs $ eb t

insert :: Ord k => [k] -> v -> TernaryTrie k v -> TernaryTrie k v

insert k v Empty = insert k v $

Node Nothing (take 1 k) Empty Empty Empty

insert [] v t = t { val = Just v }

insert k v t = modify (flip insert v) k t

update :: Ord k => [k] -> v -> (v -> v) ->

TernaryTrie k v -> TernaryTrie k v

update k v _ Empty = insert k v Empty

update [] v p t = t { val = Just . maybe v p $ val t }

update k v p t = modify (\x -> update x v p) k t

delete :: Ord k => [k] -> TernaryTrie k v -> TernaryTrie k v

delete _ Empty = Empty

delete [] t = t { val = Nothing }

delete k t = modify delete k t

modify :: Ord k => ([k] -> TernaryTrie k v -> TernaryTrie k v) ->

[k] -> TernaryTrie k v -> TernaryTrie k v

modify f k t = case compare (take 1 k) (split t) of

LT -> t { lb = f (drop 0 k) $ lb t }

EQ -> t { eb = f (drop 1 k) $ eb t }

GT -> t { gb = f (drop 0 k) $ gb t }

enlist :: TernaryTrie k v -> [([k], v)]

enlist = enlist’ [] where

enlist’ _ Empty = []

enlist’ k t =

maybe [] (\v -> [(k, v)]) (val t) ++ enlist’ k (lb t) ++

enlist’ (k ++ split t) (eb t) ++ enlist’ k (gb t)

main :: IO ()

main = print . take 25 . reverse . K.sort snd . enlist .

foldl (\t k -> update k 1 succ t) Empty .

map (map toLower . filter isAlpha) . words =<< readFile "bible.txt" [/sourcecode]

implemented in C

You can save a third of the storage in Scheme (or any dynamically typed language) by imposing a trivial constraint: the value can’t be a character. By using a unique object to represent “no value” (I personally like to use (string-copy “something meaningful”), which is guaranteed to generate a new string object)), then you can use one slot for the Boolean, the value, and the split character.