Beautiful Code

September 11, 2009

We find it more convenient in Scheme to work with lists of characters than strings, which require arithmetic. The top-level function trex converts the regular expression and search text to lists, then calls the matcher:

(define (trex regex text) (match (string->list regex) (string->list text)))

Matching begins at the match function. If the regular expression is null, all previous elements of the regular expression matched successfully, and match returns #t to indicate a successful match. If the search text is exhausted before the end of the regular expression, the match fails, and match returns #f. If the match is anchored at the beginning of the text, match calls match-here to est if the regular expression matches the beginning of the text. Otherwise, match calls itself recursively at each position through the search text until if finds a match or exhausts the text.

(define (match regex text) (cond ((null? regex) #t) ((null? text) #f) ((char=? (car regex) #\^) (match-here (cdr regex) text)) (else (or (match-here regex text) (match regex (cdr text))))))

Match-here tests if the regular expression matches at the current position in the input text. If the regular expression is exhausted, the match succeeds, and match-here returns #t. If the regular expression is at least two characters long, and the second character is an asterisk, match-here calls match-star to match the closure. If the regular expression ends with a dollar sign, match-here returns #t if the text is empty (the null list) and #f otherwise. If the input text is not exhausted and the next character in the regular expression is a dot, which matches any character, or is the same as the next character in the text, match-here advances both the regular expression and the text to the next character and calls itself recursively. Otherwise, the match fails, and match-here returns #f.

(define (match-here regex text) (cond ((null? regex) #t) ((and (pair? (cdr regex)) (char=? (cadr regex) #\*)) (match-star (car regex) (cddr regex) text)) ((and (char=? (car regex) #\$) (null? (cdr regex))) (null? text)) ((and (pair? text) (or (char=? (car regex) #\.) (char=? (car regex) (car text)))) (match-here (cdr regex) (cdr text))) (else #f)))

Match-star implements closure, which is repetition of a pattern element, using a lazy strategy; it matches the least possible number of repetitions of the pattern element that lead to an overall match of the regular expression. The first cond clause tests for a null match and returns #t immediately. If the input text is not exhausted and the next character in the regular expression is a dot, which matches any character, or is the same as the next character in the text, match-star calls itself recursively to match the remaining text. Otherwise, match-star returns #f to indicate failure.

(define (match-star c regex text) (cond ((match-here regex text) #t) ((and (pair? text) (or (char=? (car text) c) (char=? c #\.))) (match-star c regex (cdr text))) (else #f)))

As Kernighan and Pike point out, the match, match-here and match-star procedures are mutually recursive, and it is instructive to trace the recursion on complicated regular expressions.

You can run the code at http://programmingpraxis.codepad.org/vv72J3XE, where you will also find a comprehensive test suite.

Posted by programmingpraxis

Filed in Exercises

7 Comments »

7 Responses to “Beautiful Code”

Programming Praxis – Beautiful Code « Bonsai Code said
September 11, 2009 at 5:12 PM
[…] Praxis – Beautiful Code By Remco Niemeijer Today’s Programming Praxis is about beautiful code. Specifically, it concerns a bit of C code that can […]

Remco Niemeijer said

September 11, 2009 at 5:12 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2009/09/11/programming-praxis-beautiful-code/ for a version with comments):

import Data.List

match :: String -> String -> Bool
match ('^':r) = matchHere r
match r       = or . map (matchHere r) . tails

matchHere :: String -> String -> Bool
matchHere (c:'*':r) xs  = matchStar c r xs
matchHere "$"       xs  = null xs
matchHere (r:rs) (x:xs) = (r == '.' || r == x) && matchHere rs xs
matchHere r      _      = null r

matchStar :: Char -> String -> String -> Bool
matchStar _ r xs     | matchHere r xs = True
matchStar c r (x:xs) = (c == '.' || c == x) && matchStar c r xs
matchStar _ _ _      = False

Tordek said

September 12, 2009 at 6:10 AM

Assuming strings as lists of characters (They’re usually atoms, so they should be transformed):

Prolog solution.


match(['^'|Rs], T)          :- matchHere(Rs, T).
match(R, T)                 :- matchHere(R, T).
match(R, [_|Ts])            :- matchHere(R, Ts).

matchHere([], _).
matchHere(['$'], []).
matchHere([R,'*'|Rs], T)    :- matchStar(R, Rs, T).
matchHere([R|Rs], [R|Ts])   :- matchHere(Rs, Ts).
matchHere(['.'|Rs], [_|Ts]) :- matchHere(Rs, Ts).

matchStar(_, Rs, Ts)        :- matchHere(Rs, Ts).
matchStar(R, Rs, [R|Ts])    :- matchHere(Rs, Ts).
matchStar('.', Rs, [_|Ts])  :- matchHere(Rs, Ts).
matchStar(R, Rs, [R|Ts])    :- matchStar(R, Rs, Ts).
matchStar('.', Rs, [_|Ts])  :- matchStar('.', Rs, Ts).

Beautiful Code « Programming Praxis | what a beautiful day.. said
September 14, 2009 at 3:06 AM
[…] Beautiful Code « Programming Praxis. […]

Luis Sergio Oliveira said

October 22, 2009 at 6:50 AM

My Common Lisp solution:

(defun empty (seq)
  (declare (inline))
  (= 0 (length seq)))

(defun match (re text)
  "search for re anywhere in text"
  (if (and (not (empty re)) (char= #\^ (elt re 0)))
      (matchhere (subseq re 1) text)
      ;; must look at empty string
      (cond ((matchhere re text))
	    ((empty text)
	     nil)
	    ((match re (subseq text 1))))))

(defun matchhere (re text)
  "search for re at beginning of text"
  (cond ((empty re))
	((and (< 1 (length re)) (char= #\* (elt re 1)))
	 (matchstar (elt re 0) (subseq re 2) text))
	((and (char= #\$ (elt re 0)) (= 1 (length re)))
	 (empty text))
	((and (not (empty text))
	      (or (char= #\. (elt re 0)) (char= (elt re 0) (elt text 0))))
	 (matchhere (subseq re 1) (subseq text 1)))))

(defun matchstar (c re text)
  "search for c*re at beginning of text"
  (cond ((matchhere re text))
	((and (not (empty text))
	      (or (char= c (elt text 0)) (char= #\. c)))
	 (matchstar c re (subseq text 1)))))

Michel said
October 25, 2009 at 6:04 AM
My Clojure solution. Clojure uses Java strings, which are immutable, so taking substring would be too expensive. Instead, I made match-here and match-* internal functions, and give them indices rather than strings to work with.

Solution, on GitHub

Lautaro Pecile said

May 19, 2012 at 1:34 AM

from operator import truth as not_empty

is_empty = lambda s: not not_empty(s)

def iterate(f, string):
    s = string
    while not_empty(s):
        yield f(s)
        s = s[1:]

def match (re, text):    
    if re.startswith('^'):
        return matchhere(re[1:], text)
    return any(iterate(lambda t: matchhere(re, t), text))
        

def matchhere(re, text):
    if is_empty(re):
        return True
    if re.startswith('*', 1):
        return matchstar(re[0], re[2:], text)
    if re.startswith('$'):
        return is_empty(text)
    if (not_empty(text) and (re.startswith('.') or re[0] == text[0])):
        return matchhere(re[1:], text[1:])
    return False
    
def matchstar(c, re, text):
    while True:                
        if matchhere(re, text):
            return True
        text = text[1:]
        if is_empty(text) or not (text.startswith(c) or c == '.'):
            break
    return False

S	M	T	W	T	F	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Programming Praxis

Beautiful Code

September 11, 2009

7 Responses to “Beautiful Code”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Beautiful Code

September 11, 2009

Share this:

Related

7 Responses to “Beautiful Code”

Leave a comment

Categories

Archives

Archives