Beautiful Code
September 11, 2009
In their book The Practice of Programming, Brian Kernighan and Rob Pike present this code for matching simple regular expressions consisting of literal characters, a dot matching any character, a star consisting of zero or more repetitions of the preceding character, and a caret and a dollar sign representing the beginning or end of the search string:
/* match: search for re anywhere in text */
int match(char *re, char *text)
{
if (re[0] == '^')
return matchhere(re+1, text);
do { /* must look at empty string */
if (matchhere(re, text))
return 1;
} while (*text++ != '\0');
return 0;
}
/* matchhere: search for re at beginning of text */
int matchhere(char *re, char *text)
{
if (re[0] == '\0')
return 1;
if (re[1] == '*')
return matchstar(re[0], re+2, text);
if (re[0] == '$' && re[1] == '\0')
return *text == '\0';
if (*text!='\0' && (re[0]=='.' || re[0]==*text))
return matchhere(re+1, text+1);
return 0;
}
/* matchstar: search for c*re at beginning of text */
int matchstar(int c, char *re, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(re, text))
return 1;
} while (*text!='\0' && (*text++==c || c=='.'));
return 0;
}
Many readers commented on the beauty of the code, and in a later book, Beautiful Code by Andy Oram and Greg Wilson, Kernighan explained the history of the code.
Your task is to port the code to your favorite language. You should use the features and idioms of your language, while simultaneously preserving the beauty of Rob Pike’s regular expression matcher. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.