Ternary Search Tries

June 5, 2009

Several of our exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams) have used hash tables based on string keys. When the keys are strings, a useful alternative to hash tables is a data structure called a ternary search trie, which can be faster than a hash table because it takes advantage of the fact that keys are strings, and also provides in-order access to the keys. Ternary search tries were introduced by Jon Bentley and Robert Sedgewick at the 1997 SODA conference.

As the name implies, ternary search tries are a cross between binary search trees and radix tries. Ternary search tries consist of recursive nodes and leaves, just like binary search trees and radix tries. As the name implies, a node of a ternary search trie has three children, one for lesser children, one for greater children, and a third for equal children. Searches proceed character-by-character, as with a trie. At each level of the trie, the search compares the current character of the search string with the character stored in the node. If the search character is less, the search continues at the less-than child, and if the search character is greater, the search continues at the greater-than child. However, when the two characters are equal, the search continues recursively at the equal-to child, proceeding to the next character in the search string.

The picture below, taken from the Bentley/Sedgewick SODA paper, shows a ternary search trie containing twelve two-character words:

Ternary Search Trie

Your task is to implement the lookup, insert, update, delete and enlist operations on ternary search tries. When you are finished, you are welcome to read or run a suggested solution, or post your solution or discuss the exercise in the comments below.

Pages: 1 2

4 Responses to “Ternary Search Tries”

  1. […] Praxis – Ternary Search Tries By Remco Niemeijer Today’s Programming Praxis problem is about Ternary search tries, which are basically hashmaps of strings […]

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/06/05/programming-praxis-ternary-search-tries/ for a version with comments):

    import Data.Char
    import qualified Data.List.Key as K
    import Prelude hiding (lookup)

    data TernaryTrie k v = Empty | Node { val :: Maybe v,
    split :: [k], lb :: !(TernaryTrie k v),
    eb :: !(TernaryTrie k v), gb :: !(TernaryTrie k v) }

    lookup :: Ord k => [k] -> TernaryTrie k v -> Maybe v
    lookup _ Empty = Nothing
    lookup [] t = val t
    lookup (x:xs) t = case compare [x] $ split t of
    GT -> lookup (x:xs) $ gb t
    LT -> lookup (x:xs) $ lb t
    EQ -> lookup xs $ eb t

    insert :: Ord k => [k] -> v -> TernaryTrie k v -> TernaryTrie k v
    insert k v Empty = insert k v $
    Node Nothing (take 1 k) Empty Empty Empty
    insert [] v t = t { val = Just v }
    insert k v t = modify (flip insert v) k t

    update :: Ord k => [k] -> v -> (v -> v) ->
    TernaryTrie k v -> TernaryTrie k v
    update k v _ Empty = insert k v Empty
    update [] v p t = t { val = Just . maybe v p $ val t }
    update k v p t = modify (\x -> update x v p) k t

    delete :: Ord k => [k] -> TernaryTrie k v -> TernaryTrie k v
    delete _ Empty = Empty
    delete [] t = t { val = Nothing }
    delete k t = modify delete k t

    modify :: Ord k => ([k] -> TernaryTrie k v -> TernaryTrie k v) ->
    [k] -> TernaryTrie k v -> TernaryTrie k v
    modify f k t = case compare (take 1 k) (split t) of
    LT -> t { lb = f (drop 0 k) $ lb t }
    EQ -> t { eb = f (drop 1 k) $ eb t }
    GT -> t { gb = f (drop 0 k) $ gb t }

    enlist :: TernaryTrie k v -> [([k], v)]
    enlist = enlist’ [] where
    enlist’ _ Empty = []
    enlist’ k t =
    maybe [] (\v -> [(k, v)]) (val t) ++ enlist’ k (lb t) ++
    enlist’ (k ++ split t) (eb t) ++ enlist’ k (gb t)

    main :: IO ()
    main = print . take 25 . reverse . K.sort snd . enlist .
    foldl (\t k -> update k 1 succ t) Empty .
    map (map toLower . filter isAlpha) . words =<< readFile "bible.txt" [/sourcecode]

  3. Vikas Tandi said

    implemented in C

    #include <stdlib.h>
    
    typedef struct TernaryTrieNode
    {
    	char c;
    	struct TernaryTrieNode *left;
    	struct TernaryTrieNode *mid;
    	struct TernaryTrieNode *right;
    }TernaryTrie;
    
    static TernaryTrie* create_node(char c);
    static int TernaryTrie_search_imp(TernaryTrie *p, char *s, int pos);
    static TernaryTrie* TernaryTrie_insert_imp(TernaryTrie *p, char *s, int pos);
    
    TernaryTrie* TernaryTrie_init()
    {
    	TernaryTrie *head;
    
    	head = create_node(0);
    	if(head == NULL)
    		return NULL;
    
    	head->left = head->right = head;
    	return head;
    }
    
    int TernaryTrie_search(TernaryTrie *p, char *s)
    {
    	if(s == NULL || s[0] == '\0')
    		return 0;
    	return TernaryTrie_search_imp(p->mid, s, 0);
    }
    
    static int TernaryTrie_search_imp(TernaryTrie *p, char *s, int pos)
    {
    	if(p == NULL)
    		return 0;
    	if(s[pos] == '\0')
    		return 1;
    	if(s[pos] < p->c)
    		return TernaryTrie_search_imp(p->left, s, pos);
    	if(s[pos] == p->c)
    		return TernaryTrie_search_imp(p->mid, s, pos+1);
    	if(s[pos] > p->c)
    		return TernaryTrie_search_imp(p->right, s, pos);
    }
    
    TernaryTrie* TernaryTrie_insert(TernaryTrie *p, char *s)
    {
    	if(s == NULL || s[0] == '\0')
    		return p;
    	p->mid = TernaryTrie_insert_imp(p->mid, s, 0);
    	return p;
    }
    
    static TernaryTrie* TernaryTrie_insert_imp(TernaryTrie *p, char *s, int pos)
    {
    	if(p == NULL)
    	{
    		p = create_node(s[pos]);
    		if(p == NULL)
    			return NULL;
    	}
    	if(s[pos] == '\0')
    		return p;
    
    	if(s[pos] < p->c)
    		p->left = TernaryTrie_insert_imp(p->left, s, pos);
    	if(s[pos] == p->c)
    		p->mid = TernaryTrie_insert_imp(p->mid, s, pos+1);
    	if(s[pos] > p->c)
    		p->right = TernaryTrie_insert_imp(p->right, s, pos);
    
    	return p;
    }
    
    static TernaryTrie* create_node(char c)
    {
    	TernaryTrie *p;
    
    	p = (TernaryTrie*)malloc(sizeof(*p));
    	if(p == NULL)
    		return NULL;
    
    	p->c = c;
    	p->left = p->right = p->mid = NULL;
    	return p;
    }
    
  4. John Cowan said

    You can save a third of the storage in Scheme (or any dynamically typed language) by imposing a trivial constraint: the value can’t be a character. By using a unique object to represent “no value” (I personally like to use (string-copy “something meaningful”), which is guaranteed to generate a new string object)), then you can use one slot for the Boolean, the value, and the split character.

Leave a comment