Longest Common Subsequence
June 9, 2009
Our solution is a straight forward expression of the classic algorithm; the lcs matrix is calculated top-to-bottom left-to-right in the two nested for
s, then the longest common subsequence is recovered in the named-let
loop:
(define (lcs eql? xs ys)
(let* ((x-len (length xs)) (y-len (length ys))
(x1 (+ x-len 1)) (y1 (+ y-len 1))
(xv (list->vector xs)) (yv (list->vector ys))
(m (make-matrix x1 y1)))
(for (x 0 x1)
(for (y 0 y1)
(cond ((or (zero? x) (zero? y))
(matrix-set! m x y 0))
((eql? (vector-ref xv (- x 1))
(vector-ref yv (- y 1)))
(matrix-set! m x y
(+ 1 (matrix-ref m (- x 1) (- y 1)))))
(else (matrix-set! m x y
(max (matrix-ref m (- x 1) y)
(matrix-ref m x (- y 1))))))))
(let loop ((x x-len) (y y-len) (zs '()))
(cond ((or (zero? x) (zero? y)) zs)
((= (matrix-ref m x y) (matrix-ref m (- x 1) y))
(loop (- x 1) y zs))
((= (matrix-ref m x y) (matrix-ref m x (- y 1)))
(loop x (- y 1) zs))
(else (loop (- x 1) (- y 1) (cons (vector-ref xv (- x 1)) zs)))))))
The matrix operators and for
macro come from the Standard Prelude. The calculation of the longest common subsequence of the strings PROGRAMMING and PRAXIS is shown below:
> (lcs char=? (string->list "PROGRAMMING") (string->list "PRAXIS"))
(P R A I)
You can run this program at http://programmingpraxis.codepad.org/MQK6CN9q.
My Haskell solution (see http://bonsaicode.wordpress.com/2009/06/09/programming-praxis-longest-common-subsequence/ for a version with comments):
import Data.Array
at :: Int -> Int -> Array Int (Array Int e) -> e
at x y a = a ! y ! x
lcs :: Eq a => [a] -> [a] -> [a]
lcs xs ys = reverse $ walk imax jmax where
imax = length xs
jmax = length ys
ax = listArray (1, imax) xs
ay = listArray (1, jmax) ys
ls = listArray (0, jmax) [listArray (0, imax)
[lcs’ i j | i <- [0..imax]] | j <- [0..jmax]] lcs' 0 _ = 0 lcs' _ 0 = 0 lcs' i j | ax ! i == ay ! j = 1 + at (i-1) (j-1) ls | otherwise = max (at (i-1) j ls) (at i (j-1) ls) walk 0 _ = [] walk _ 0 = [] walk i j | at (i-1) j ls == at i j ls = walk (i-1) j | at i (j-1) ls < at i j ls = ax ! i : walk i (j-1) | otherwise = walk i (j-1) main :: IO () main = print $ lcs "programming" "praxis" [/sourcecode]
[…] Praxis – Longest Common Subsequence By Remco Niemeijer Today’s Programming Praxis problem is about finding the longest common subsequence of two sequences. Our […]
My Python version. Incorporates a couple optimizations:
1. identifies matching sequences at start and end of the two input sequences, and only runs LCS algorithm on the middle portions of the input sequences.
2. a row of the matrix in the LCS algorithm only depends on the previous row, so only keep one previous row instead of entire matrix.
The LCS algorithm accumulates the indices of the members of the LCS. The reduce call at the end, assembles the LCS from the indices. The result has the same type (i.e., string, list, tuple) as seq1.
Ooooops… memory management bits me in the backside.
Lines 6 & 7 should read
7 #define element(M, i, j) (*((M)->elt + \
8 ((M)->width * (i) + (j))))
Nasty bug there, the sizeof(int) in the macro I’d initially written was completely uncalled for: gcc knows perfectly well already that (M)->elt is an int*. As a result, my code was writing the matrix to an area 4 times the size of what I’d malloc’d in init_matrix.
I’ve worked on a generic version of this in preparation for the 08 June 2010 exercise. I’ve packed all the generic functions in a str_util.c file. The headers:
the lcs_util.c file itself:
…and (finally) the string-specific bits:
[…] Longest Common Subsequence Finding the longest common subsequence of two sequences is a classic computer science problem with an equally classic solution that dates to the folklore of computing. The longest common subsequence is the longest set of elements that appear in order (not necessarily contiguous) in two sequences, the solution can be used in matching DNA sequences and it is also the basis of Unix “DIFF” utility. […]
my implementation in c
#include
int max(int a, int b){
return a>b?a:b;
}
void printResult( char * str1 , char * str2 ){
int len1=0,len2=0;
int i,j,count=1;
int a[20][20];
// find out the lengths
while(str1[len1++] != ” );
while(str2[len2++] != ” );
len1–;
len2–;
// populate the matrix
for ( i=0 ; i<=len2 ; i++ )
for ( j=0 ; j<=len1 ; j++ ){
if ( i==0 || j==0 ){
a[i][j] = 0;
continue;
}
if ( str1[j-1] == str2[i-1] )
a[i][j]=a[i-1][j-1] +1;
else
a[i][j]=max(a[i-1][j],a[i][j-1]);
}
// print the answer
for ( i=0 ; i<len1 ; i++ ){
if ( a[len2][i+1] == count ){
printf("%c",str1[i]);
count++;
}
}
printf("\n");
}
void main(){
int n,i;
char a[10][10];
char b[10][10];
// scan the number of elements
scanf("%d",&n);
// scan elements
for ( i=0 ; i<n ; i++ ){
scanf("%s",a[i]);
scanf("%s",b[i]);
}
for ( i=0 ; i<n ; i++ )
printResult(a[i],b[i]);
}