Word Hy-phen-a-tion By Com-pu-ter
April 24, 2009
We will follow Liang’s algorithm closely, but use the tries of a previous exercise instead of Liang’s packed tries. We store the patterns and exceptions in lists (they are symbols, not strings, because that’s how Liang’s listing is given):
(define patterns '(.ach4 .ad4der .af1t ... zte4 4z1z2 z4zy))
(define exceptions '(as-so-ciate ... ta-ble))
Our first task is to split a pattern like hen5at
into the key henat
and rules 000500
; note the use of symbol->string
:
(define (split-pat pat)
(let loop ((ps (string->list (symbol->string pat))) (cs '()) (ns '()))
(cond ((null? ps)
(if (= (length cs) (length ns))
(values (reverse cs) (reverse (cons 0 ns)))
(values (reverse cs) (reverse ns))))
((and (null? (cdr ps)) (char-numeric? (car ps)))
(values (reverse cs)
(reverse (cons (- char->integer (car ps)) 48) ns))))
((not (char-numeric? (car ps)))
(loop (cdr ps) (cons (car ps) cs) (cons 0 ns)))
(else (loop (cddr ps)
(cons (cadr s) cs)
(cons (- (char->integer (car ps)) 48) ns))))))
Exceptions are handled by making them un-exceptional: an exception like ta-ble
is turned into a pattern like 6.6t6a7b6l6e6.6
and entered into the trie along with the regular patterns. Because 7 and 6 are greater than the other hyphenation and inhibition rules, and because the pattern is anchored with a dot at beginning and end, the exception rules will always trump any other rules that may exist in the word. Turning the exceptions into patterns means they don’t have to be handled specially, so the program can be simpler. Here is the code to split an exception into pattern and key:
(define (split-exn exn)
(let loop ((ps (append (list #\.) (string->list (symbol->string exn)) (list #\.)))
(cs '()) (ns '()))
(cond ((null? ps) (values (reverse cs) (reverse (cons 6 ns))))
((char=? (car ps) #\-)
(loop (cddr ps) (cons (cadr ps) cs) (cons 7 ns)))
(else (loop (cdr ps) (cons (car ps) cs) (cons 6 ns))))))
T-hyph
is the trie that stores all the key/rule pairs, built by looping through the lists of patterns and exceptions:
(define t-hyph
(let loop ((pats patterns) (exns exceptions) (t t-null))
(cond ((null? exns) t)
((null? pats)
(call-with-values (lambda () (split-exn (car exns)))
(lambda (cs ns) (loop pats (cdr exns) (t-bind cs ns t)))))
(else (call-with-values (lambda () (split-pat (car pats)))
(lambda (cs ns) (loop (cdr pats) exns (t-bind cs ns t))))))))
The standard trie returns the single rule associated with a key. We need instead to get a list of all the rules along the path from a substring; for instance, when examining the substring henation.
, we find the three patterns he2n
, hena4
, and hen5at
. T-looks
is similar to t-look
, but returns a list, possibly null, instead of a single value:
(define (t-looks ks t)
(if (null? ks)
(if (pair? (car t)) (list (caar t)) '())
(let ((x (a-look (car ks) (cdr t))))
(if x
(if (pair? (car t))
(cons (caar t) (t-looks (cdr ks) (cdr x)))
(t-looks (cdr ks) (cdr x)))
(if (pair? (car t)) (list (caar t)) '())))))
We are ready now to show the function that performs hyphenation. Hyphenate
takes a single word, as a string, and returns a list of substrings between hyphenation points; for instance, (hyphenate "praxis")
returns ("prax" "is")
.
The outer loop iterates over the letters of the input word, surrounded by dots. The rules are stored in a two-part list, the parts before the current letter and the part after the current letter. When the input word is exhausted, the substrings are collected in the first inner loop. Otherwise, the fold-left in the second inner loop saves the maximum rule at each position for each pattern returned by t-looks
.
Hyphenate
looks complicated, but examined in pieces it’s not too bad:
(define (hyphenate word)
(if (< (string-length word) 5) (list word)
(let loop ((ws (append (list #\.) (string->list word) (list #\.)))
(front (make-list (+ (string-length word) 3) 0)) (back '()))
(if (null? ws)
(let loop ((cs (string->list word)) (hs (fixup back)) (p '()) (ps '()))
(cond ((null? (cdr hs)) (reverse (cons (list->string (reverse p)) ps)))
((odd? (car hs))
(loop (cdr cs) (cdr hs) (list (car cs))
(cons (list->string (reverse p)) ps)))
(else (loop (cdr cs) (cdr hs) (cons (car cs) p) ps))))
(let ((new-front (fold-left max-rule front (t-looks ws t-hyph))))
(loop (cdr ws) (cdr new-front) (cons (car new-front) back)))))))
Hyphenate
calls two helper functions: max-rule
is called by fold-left
to loop through the rules for a particular pattern, saving only the maximum at each step, and fixup
implements the requirement that a word cannot begin or end with a single-letter substring:
(define (max-rule xs ys)
(let loop ((xs xs) (ys ys) (zs '()))
(if (or (null? xs) (null? ys))
(append (reverse zs) xs)
(loop (cdr xs) (cdr ys) (cons (max (car xs) (car ys)) zs)))))
(define (fixup xs)
(cons 0 (cons 0 (cdddr (reverse (cons 0 (cons 0 (cddr xs))))))))
Here is a peek at hyphenate
in action:
> (hyphenate "hyphenation")
("hy" "phen" "ation")
Fold-left
and make-list
come from the Standard Prelude. You can run this program at http://programmingpraxis.codepad.org/s0N597UX.
[…] Praxis – Word hypenation By Remco Niemeijer Today’s Programming Praxis problem is about word hyphenation. Let’s see what we can come up […]
My Haskell solution (see http://bonsaicode.wordpress.com/2009/04/24/programming-praxis-word-hypenation/ for the commented version):
The standard solution doesn’t follow the requirement that any word containing a non-alphabetic character is never hyphenated. Here is a version that does:
(define (new-hyphenate word)
(let ((ws (string->list word)))
(cond ((< (length ws) 5) (list word))
((any? (lambda (c) (not (char-alphabetic? c))) ws) (list word))
(else (let loop ((ws (append (list #\.) ws (list #\.)))
(front (make-list (+ (length ws) 3) 0)) (back '()))
(if (null? ws)
(let loop ((cs (string->list word)) (hs (fixup back)) (p '()) (ps '()))
(cond ((null? (cdr hs)) (reverse (cons (list->string (reverse p)) ps)))
((odd? (car hs))
(loop (cdr cs) (cdr hs) (list (car cs))
(cons (list->string (reverse p)) ps)))
(else (loop (cdr cs) (cdr hs) (cons (car cs) p) ps))))
(let ((new-front (fold-left max-rule front (t-looks ws t-hyph))))
(loop (cdr ws) (cdr new-front) (cons (car new-front) back))))))))))
This function requires
any?
, which will soon be added to the Standard Prelude:(define (any? pred? xs)
(cond ((null? xs) #f)
((pred? (car xs)) #t)
(else (any? pred? (cdr xs)))))