Beautiful Code

September 11, 2009

In their book The Practice of Programming, Brian Kernighan and Rob Pike present this code for matching simple regular expressions consisting of literal characters, a dot matching any character, a star consisting of zero or more repetitions of the preceding character, and a caret and a dollar sign representing the beginning or end of the search string:

/* match: search for re anywhere in text */ int match(char *re, char *text) { if (re[0] == '^') return matchhere(re+1, text); do { /* must look at empty string */ if (matchhere(re, text)) return 1; } while (*text++ != '\0'); return 0; }

/* matchhere: search for re at beginning of text */ int matchhere(char *re, char *text) { if (re[0] == '\0') return 1; if (re[1] == '*') return matchstar(re[0], re+2, text); if (re[0] == '$' && re[1] == '\0') return *text == '\0'; if (*text!='\0' && (re[0]=='.' || re[0]==*text)) return matchhere(re+1, text+1); return 0; }

/* matchstar: search for c*re at beginning of text */ int matchstar(int c, char *re, char *text) { do { /* a * matches zero or more instances */ if (matchhere(re, text)) return 1; } while (*text!='\0' && (*text++==c || c=='.')); return 0; }

Many readers commented on the beauty of the code, and in a later book, Beautiful Code by Andy Oram and Greg Wilson, Kernighan explained the history of the code.

Your task is to port the code to your favorite language. You should use the features and idioms of your language, while simultaneously preserving the beauty of Rob Pike’s regular expression matcher. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

7 Comments »

7 Responses to “Beautiful Code”

Programming Praxis – Beautiful Code « Bonsai Code said
September 11, 2009 at 5:12 PM
[…] Praxis – Beautiful Code By Remco Niemeijer Today’s Programming Praxis is about beautiful code. Specifically, it concerns a bit of C code that can […]

Remco Niemeijer said

September 11, 2009 at 5:12 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2009/09/11/programming-praxis-beautiful-code/ for a version with comments):

import Data.List

match :: String -> String -> Bool
match ('^':r) = matchHere r
match r       = or . map (matchHere r) . tails

matchHere :: String -> String -> Bool
matchHere (c:'*':r) xs  = matchStar c r xs
matchHere "$"       xs  = null xs
matchHere (r:rs) (x:xs) = (r == '.' || r == x) && matchHere rs xs
matchHere r      _      = null r

matchStar :: Char -> String -> String -> Bool
matchStar _ r xs     | matchHere r xs = True
matchStar c r (x:xs) = (c == '.' || c == x) && matchStar c r xs
matchStar _ _ _      = False

Tordek said

September 12, 2009 at 6:10 AM

Assuming strings as lists of characters (They’re usually atoms, so they should be transformed):

Prolog solution.


match(['^'|Rs], T)          :- matchHere(Rs, T).
match(R, T)                 :- matchHere(R, T).
match(R, [_|Ts])            :- matchHere(R, Ts).

matchHere([], _).
matchHere(['$'], []).
matchHere([R,'*'|Rs], T)    :- matchStar(R, Rs, T).
matchHere([R|Rs], [R|Ts])   :- matchHere(Rs, Ts).
matchHere(['.'|Rs], [_|Ts]) :- matchHere(Rs, Ts).

matchStar(_, Rs, Ts)        :- matchHere(Rs, Ts).
matchStar(R, Rs, [R|Ts])    :- matchHere(Rs, Ts).
matchStar('.', Rs, [_|Ts])  :- matchHere(Rs, Ts).
matchStar(R, Rs, [R|Ts])    :- matchStar(R, Rs, Ts).
matchStar('.', Rs, [_|Ts])  :- matchStar('.', Rs, Ts).

Beautiful Code « Programming Praxis | what a beautiful day.. said
September 14, 2009 at 3:06 AM
[…] Beautiful Code « Programming Praxis. […]

Luis Sergio Oliveira said

October 22, 2009 at 6:50 AM

My Common Lisp solution:

(defun empty (seq)
  (declare (inline))
  (= 0 (length seq)))

(defun match (re text)
  "search for re anywhere in text"
  (if (and (not (empty re)) (char= #\^ (elt re 0)))
      (matchhere (subseq re 1) text)
      ;; must look at empty string
      (cond ((matchhere re text))
	    ((empty text)
	     nil)
	    ((match re (subseq text 1))))))

(defun matchhere (re text)
  "search for re at beginning of text"
  (cond ((empty re))
	((and (< 1 (length re)) (char= #\* (elt re 1)))
	 (matchstar (elt re 0) (subseq re 2) text))
	((and (char= #\$ (elt re 0)) (= 1 (length re)))
	 (empty text))
	((and (not (empty text))
	      (or (char= #\. (elt re 0)) (char= (elt re 0) (elt text 0))))
	 (matchhere (subseq re 1) (subseq text 1)))))

(defun matchstar (c re text)
  "search for c*re at beginning of text"
  (cond ((matchhere re text))
	((and (not (empty text))
	      (or (char= c (elt text 0)) (char= #\. c)))
	 (matchstar c re (subseq text 1)))))

Michel said
October 25, 2009 at 6:04 AM
My Clojure solution. Clojure uses Java strings, which are immutable, so taking substring would be too expensive. Instead, I made match-here and match-* internal functions, and give them indices rather than strings to work with.

Solution, on GitHub

Lautaro Pecile said

May 19, 2012 at 1:34 AM

from operator import truth as not_empty

is_empty = lambda s: not not_empty(s)

def iterate(f, string):
    s = string
    while not_empty(s):
        yield f(s)
        s = s[1:]

def match (re, text):    
    if re.startswith('^'):
        return matchhere(re[1:], text)
    return any(iterate(lambda t: matchhere(re, t), text))
        

def matchhere(re, text):
    if is_empty(re):
        return True
    if re.startswith('*', 1):
        return matchstar(re[0], re[2:], text)
    if re.startswith('$'):
        return is_empty(text)
    if (not_empty(text) and (re.startswith('.') or re[0] == text[0])):
        return matchhere(re[1:], text[1:])
    return False
    
def matchstar(c, re, text):
    while True:                
        if matchhere(re, text):
            return True
        text = text[1:]
        if is_empty(text) or not (text.startswith(c) or c == '.'):
            break
    return False

S	M	T	W	T	F	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Programming Praxis

Beautiful Code

September 11, 2009

7 Responses to “Beautiful Code”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Beautiful Code

September 11, 2009

Share this:

Related

7 Responses to “Beautiful Code”

Leave a comment

Categories

Archives

Archives