Longest Duplicated Substring
December 14, 2010
Function longest-duplicated-substring
implements the basic algorithm. Loop1
builds the suffix list, and loop2
runs through the sorted suffix list looking for the longest common prefix:
(define (longest-duplicated-substring str)
(let ((n (string-length str)))
(let loop1 ((i (- n 1)) (ss '()))
(if (<= 0 i)
(loop1 (- i 1) (cons (substring str i n) ss))
(let loop2 ((ss (sort string<? ss)) (dup "") (maxlen 0))
(cond ((null? (cdr ss)) (substring dup 0 maxlen))
((< maxlen (common-length (car ss) (cadr ss)))
(loop2 (cdr ss) (car ss)
(common-length (car ss) (cadr ss))))
(else (loop2 (cdr ss) dup maxlen))))))))
The length of the common prefix of two strings is computed by comparing character-by-character until there is a mismatch:
(define (common-length s1 s2)
(let* ((len1 (string-length s1))
(len2 (string-length s2))
(max-i (min len1 len2)))
(let loop ((i 0))
(cond ((= i max-i) i)
((char=? (string-ref s1 i) (string-ref s2 i))
(loop (+ i 1)))
(else i)))))
Here’s an example:
> (longest-common-subsequence "banana")
"ana"
We used sort from the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/dY3aGxru.
By the way, this is a real task with real utility. Biologists use this program to examine DNA sequences that are millions of characters long. In that case, the string is so long that the suffix list can’t fit into memory, so disk-backed storage must be used during the generation and sort phases of the program.
[…] today’s Programming Praxis exercise, our task is to implement the algorithm to find the longest duplicated […]
My Haskell solution (see http://bonsaicode.wordpress.com/2010/12/14/programming-praxis-longest-duplicated-substring/ for a version with comments):
Mine in F#
Correction,
need solution in Core Java
My Python version. It turns out to be very similar to Remco’s Haskell code. The combination of starmap and pairwise corresponds to “mapAdjacent” or and the ‘suffices’ generator expression corresponds to “tail”.
a little bit late:
my commented solution in OCaml (http://www.gleocadie.net/?p=543&lang=en) using OCaml Batteries