String-Replace
May 22, 2015
Let’s first examine the claim that the function from the previous exercise takes quadratic time. Here’s the function:
(define (string-replace str pat rep) ; quadratic
  (let ((x (string-find pat str)))
    (if (not x) str
        (string-append (substring str 0 x) rep
          (string-replace
            (substring str (+ x (string-length pat)) (string-length str))
            pat rep)))))
It’s quadratic because each recursive call copies the entire remaining string to build the substring it recurses on, and string-append copies the whole accumulated result again as the recursion unwinds; with a match at every position of a string of length n, the copying alone amounts to roughly n + (n-1) + ... + 1 ≈ n²/2 character operations. Here we demonstrate the quadratic time complexity:
> (time (begin (string-replace (make-string 2000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
2 collections
16 ms elapsed cpu time, including 0 ms collecting
22 ms elapsed real time, including 6 ms collecting
17764912 bytes allocated, including 15202400 bytes reclaimed
done
> (time (begin (string-replace (make-string 4000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
8 collections
78 ms elapsed cpu time, including 47 ms collecting
85 ms elapsed real time, including 31 ms collecting
67530192 bytes allocated, including 84461056 bytes reclaimed
done
> (time (begin (string-replace (make-string 8000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
27 collections
405 ms elapsed cpu time, including 188 ms collecting
421 ms elapsed real time, including 209 ms collecting
263062416 bytes allocated, including 133705904 bytes reclaimed
done
When the length of the string doubles from 2000 to 4000, execution time increases by a factor of 78 / 16 ≈ 4.9, and when it doubles again from 4000 to 8000, by a factor of 405 / 78 ≈ 5.2. A factor of about 4 per doubling is the signature of quadratic growth; the excess here is mostly garbage collection. And look at all of the garbage that is generated!
A better approach is to scan the string from left to right looking for matches, collecting the pieces of the output in a list and assembling the new string with a single string-append at the end:
(define (string-replace str pat rep) ; linear
  (let ((str-len (string-length str))
        (pat-len (string-length pat)))
    (let loop ((pos 0) (xs (list)))     ; xs collects output pieces in reverse
      (if (<= str-len pos)
          (apply string-append (reverse xs))
          (let ((x (string-find pat str pos)))
            (if x
                (loop (+ x pat-len)     ; continue just past the match
                      (cons rep (cons (substring str pos x) xs)))
                (loop str-len (cons (substring str pos str-len) xs))))))))
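As a quick sanity check on ordinary input (not just the all-"a" strings used for timing below), and assuming string-find returns the index of the first match at or after the given start position, or #f when there is none, we would expect interactions like these:

> (string-replace "banana" "an" "AN")
"bANANa"
> (string-replace "mississippi" "ss" "S")
"miSiSippi"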
And here are some timings:
> (time (begin (string-replace (make-string 25000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
3 collections
46 ms elapsed cpu time, including 0 ms collecting
42 ms elapsed real time, including 6 ms collecting
22201088 bytes allocated, including 24727008 bytes reclaimed
done
> (time (begin (string-replace (make-string 50000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
5 collections
78 ms elapsed cpu time, including 0 ms collecting
83 ms elapsed real time, including 8 ms collecting
45713280 bytes allocated, including 174417216 bytes reclaimed
done
> (time (begin (string-replace (make-string 100000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
11 collections
172 ms elapsed cpu time, including 46 ms collecting
174 ms elapsed real time, including 22 ms collecting
93394016 bytes allocated, including 90342368 bytes reclaimed
done
This is clearly linear. At the first doubling of the string length, execution time increased by 78 / 46 ≈ 1.7 times, and at the second doubling by 172 / 78 ≈ 2.2 times, both close to the factor of 2 expected of linear growth. And the execution times are very much faster; when I tried the timing test on the quadratic version with a 40,000-character string, the machine crunched for several minutes, then died.
We used string-find from the Standard Prelude. You can run the program at http://ideone.com/tRPxfW.
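If you don’t have the Standard Prelude handy, here is a minimal sketch of a string-find with the interface the code above assumes: it takes the pattern, the string, and an optional start position, and returns the index of the first match or #f if there is none. The Standard Prelude’s version is more sophisticated; this naive scan is only for illustration.

(define (string-find pat str . s) ; naive sketch of the assumed interface
  (let ((pat-len (string-length pat))
        (str-len (string-length str))
        (start (if (null? s) 0 (car s))))
    (let loop ((i start))
      (cond ((< (- str-len i) pat-len) #f)  ; not enough characters left
            ((string=? pat (substring str i (+ i pat-len))) i)
            (else (loop (+ i 1)))))))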
C++ implementation for Python.
[sourcecode lang="C++"]
/* len(self)>=1, len(from)==len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring_in_place(PyStringObject *self,
                           const char *from_s, Py_ssize_t from_len,
                           const char *to_s, Py_ssize_t to_len,
                           Py_ssize_t maxcount)
{
    char *result_s, *start, *end;
    char *self_s;
    Py_ssize_t self_len, offset;
    PyStringObject *result;

    /* The result string will be the same size */
    self_s = PyString_AS_STRING(self);
    self_len = PyString_GET_SIZE(self);

    offset = stringlib_find(self_s, self_len,
                            from_s, from_len,
                            0);
    if (offset == -1) {
        /* No matches; return the original string */
        return return_self(self);
    }

    /* Need to make a new string */
    result = (PyStringObject *) PyString_FromStringAndSize(NULL, self_len);
    if (result == NULL)
        return NULL;
    result_s = PyString_AS_STRING(result);
    Py_MEMCPY(result_s, self_s, self_len);

    /* change everything in-place, starting with this one */
    start = result_s + offset;
    Py_MEMCPY(start, to_s, from_len);
    start += from_len;
    end = result_s + self_len;

    while (--maxcount > 0) {
        offset = stringlib_find(start, end-start,
                                from_s, from_len,
                                0);
        if (offset == -1)
            break;
        Py_MEMCPY(start+offset, to_s, from_len);
        start += offset+from_len;
    }

    return result;
}
[/sourcecode]
This looked so suitable for C that I couldn’t resist trying it. The result is at http://codepad.org/glHPVdoL.
On a first run, the times were about ten times longer than those above, not what I expected. It turned out that my library’s strstr function was quadratic in the string length. Replacing strstr with local code produced times about 20 times faster than those above, using about 200 times less memory.
So even when coding close to the machine, you can’t take the performance of a simple library function for granted.