Levenshtein Distance
September 12, 2014
In a previous exercise we computed the edit distance between two strings where the allowable operations were insert or delete characters; we made the calculation by determining the longest common subsequence. A different definition of edit distance allows substitutions as well as insertions and deletions, and is called the Levenshtein distance, since it was studied by Vladimir Levenshtein in 1965. For example, the edit distance between “hill” and “hilt” is 2 (delete the “l” and insert the “t”) but the Levenshtein distance is 1 (replace “l” by “t”).
The basic algorithm is recursive: if either string is empty, the Levenshtein distance is the length of the other string, otherwise it is the minimum of the Levenshtein distance computed by deleting the first character from the first string, by deleting the first character from the second string, or by deleting the first characters of each of the two strings (adding 1 if the two characters differ), plus the Levenshtein distance of the remaining substrings. That computation takes exponential time as it recomputes the same substring Levenshtein distances many times. A better algorithm uses dynamic programming to build up the substring distances, so they are always available as they are needed.
Your task is to write two functions to compute the Levenshtein distance between two strings. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Very cool. Here’s my version in Swift — memoize caches the results by paired substrings for reuse:
And here’s the memoize function:
Two versions in Python. A recursive version with memoization and an iterative version with 2 matrix rows.