## Ternary Search Tries

### June 5, 2009

Several of our exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams) have used hash tables based on string keys. When the keys are strings, a useful alternative to hash tables is a data structure called a ternary search trie, which can be faster than a hash table because it takes advantage of the fact that keys are strings, and also provides in-order access to the keys. Ternary search tries were introduced by Jon Bentley and Robert Sedgewick at the 1997 SODA conference.

As the name implies, ternary search tries are a cross between binary search trees and radix tries. Ternary search tries consist of recursive nodes and leaves, just like binary search trees and radix tries. As the name implies, a node of a ternary search trie has three children, one for lesser children, one for greater children, and a third for equal children. Searches proceed character-by-character, as with a trie. At each level of the trie, the search compares the current character of the search string with the character stored in the node. If the search character is less, the search continues at the less-than child, and if the search character is greater, the search continues at the greater-than child. However, when the two characters are equal, the search continues recursively at the equal-to child, proceeding to the next character in the search string.

The picture below, taken from the Bentley/Sedgewick SODA paper, shows a ternary search trie containing twelve two-character words:

Your task is to implement the lookup, insert, update, delete and enlist operations on ternary search tries. When you are finished, you are welcome to read or run a suggested solution, or post your solution or discuss the exercise in the comments below.

Pages: 1 2

[…] Praxis – Ternary Search Tries By Remco Niemeijer Today’s Programming Praxis problem is about Ternary search tries, which are basically hashmaps of strings […]

My Haskell solution (see http://bonsaicode.wordpress.com/2009/06/05/programming-praxis-ternary-search-tries/ for a version with comments):

import Data.Char

import qualified Data.List.Key as K

import Prelude hiding (lookup)

data TernaryTrie k v = Empty | Node { val :: Maybe v,

split :: [k], lb :: !(TernaryTrie k v),

eb :: !(TernaryTrie k v), gb :: !(TernaryTrie k v) }

lookup :: Ord k => [k] -> TernaryTrie k v -> Maybe v

lookup _ Empty = Nothing

lookup [] t = val t

lookup (x:xs) t = case compare [x] $ split t of

GT -> lookup (x:xs) $ gb t

LT -> lookup (x:xs) $ lb t

EQ -> lookup xs $ eb t

insert :: Ord k => [k] -> v -> TernaryTrie k v -> TernaryTrie k v

insert k v Empty = insert k v $

Node Nothing (take 1 k) Empty Empty Empty

insert [] v t = t { val = Just v }

insert k v t = modify (flip insert v) k t

update :: Ord k => [k] -> v -> (v -> v) ->

TernaryTrie k v -> TernaryTrie k v

update k v _ Empty = insert k v Empty

update [] v p t = t { val = Just . maybe v p $ val t }

update k v p t = modify (\x -> update x v p) k t

delete :: Ord k => [k] -> TernaryTrie k v -> TernaryTrie k v

delete _ Empty = Empty

delete [] t = t { val = Nothing }

delete k t = modify delete k t

modify :: Ord k => ([k] -> TernaryTrie k v -> TernaryTrie k v) ->

[k] -> TernaryTrie k v -> TernaryTrie k v

modify f k t = case compare (take 1 k) (split t) of

LT -> t { lb = f (drop 0 k) $ lb t }

EQ -> t { eb = f (drop 1 k) $ eb t }

GT -> t { gb = f (drop 0 k) $ gb t }

enlist :: TernaryTrie k v -> [([k], v)]

enlist = enlist’ [] where

enlist’ _ Empty = []

enlist’ k t =

maybe [] (\v -> [(k, v)]) (val t) ++ enlist’ k (lb t) ++

enlist’ (k ++ split t) (eb t) ++ enlist’ k (gb t)

main :: IO ()

main = print . take 25 . reverse . K.sort snd . enlist .

foldl (\t k -> update k 1 succ t) Empty .

map (map toLower . filter isAlpha) . words =<< readFile "bible.txt" [/sourcecode]

implemented in C

You can save a third of the storage in Scheme (or any dynamically typed language) by imposing a trivial constraint: the value can’t be a character. By using a unique object to represent “no value” (I personally like to use (string-copy “something meaningful”), which is guaranteed to generate a new string object)), then you can use one slot for the Boolean, the value, and the split character.