## String-Replace

### May 22, 2015

Let’s first examine the claim that the function from the previous exercise takes quadratic time. Here’s the function:

```(define (string-replace str pat rep) ; quadratic   (let ((x (string-find pat str)))     (if (not x) str       (string-append (substring str 0 x) rep         (string-replace (substring str (+ x (string-length pat))           (string-length str)) pat rep)))))```

It’s quadratic because at each recursive call when the string is extended the entire string has to be scanned and rewritten. Here we demonstrate the quadratic time complexity:

```> (time (begin (string-replace (make-string 2000 #\a) "a" "b") 'done)) (time (begin (string-replace (...) ...) ...))     2 collections     16 ms elapsed cpu time, including 0 ms collecting     22 ms elapsed real time, including 6 ms collecting     17764912 bytes allocated, including 15202400 bytes reclaimed done > (time (begin (string-replace (make-string 4000 #\a) "a" "b") 'done)) (time (begin (string-replace (...) ...) ...))     8 collections     78 ms elapsed cpu time, including 47 ms collecting     85 ms elapsed real time, including 31 ms collecting     67530192 bytes allocated, including 84461056 bytes reclaimed done > (time (begin (string-replace (make-string 8000 #\a) "a" "b") 'done)) (time (begin (string-replace (...) ...) ...))     27 collections     405 ms elapsed cpu time, including 188 ms collecting     421 ms elapsed real time, including 209 ms collecting     263062416 bytes allocated, including 133705904 bytes reclaimed done```

When the length of the string doubles from 2000 to 4000 execution time increases by 78 / 16 = 4.9 times and when the length of the string doubles from 4000 to 8000 execution time increases by 405 / 78 = 5.2 times, which is clearly quadratic. And look at all of the garbage that is generated!

A better approach is to scan the string from left to right, looking for matches, collect the outgoing text in a list, and write the new string at the end:

```(define (string-replace str pat rep) ; linear   (let ((str-len (string-length str))         (pat-len (string-length pat)))     (let loop ((pos 0) (xs (list)))       (if (<= str-len pos)           (apply string-append (reverse xs))           (let ((x (string-find pat str pos)))             (if x                 (loop (+ pos pat-len)                       (cons rep (cons (substring str pos x) xs)))                 (loop str-len (cons (substring str pos str-len) xs)))))))) ```
And here are some timings:

```> (time (begin (string-replace (make-string 25000 #\a) "a" "b") 'done)) (time (begin (string-replace (...) ...) ...))     3 collections     46 ms elapsed cpu time, including 0 ms collecting     42 ms elapsed real time, including 6 ms collecting     22201088 bytes allocated, including 24727008 bytes reclaimed done > (time (begin (string-replace (make-string 50000 #\a) "a" "b") 'done)) (time (begin (string-replace (...) ...) ...))     5 collections     78 ms elapsed cpu time, including 0 ms collecting     83 ms elapsed real time, including 8 ms collecting     45713280 bytes allocated, including 174417216 bytes reclaimed done > (time (begin (string-replace (make-string 100000 #\a) "a" "b") 'done)) (time (begin (string-replace (...) ...) ...))     11 collections     172 ms elapsed cpu time, including 46 ms collecting     174 ms elapsed real time, including 22 ms collecting     93394016 bytes allocated, including 90342368 bytes reclaimed done```

This is clearly linear. At the first doubling of the length of the string, execution time increased by 78 / 46 = 1.7 times, and at the second doubling execution time increased by 172 / 78 = 2.2 times. And the execution times are very much faster; when I tried the timing test on the quadratic version with a 40,000 character string, the machine crunched for several minutes, then died.

We used `string-find` from the Standard Prelude. You can run the program at http://ideone.com/tRPxfW.

Pages: 1 2

### 2 Responses to “String-Replace”

1. Rutger said

C++ implemenation for Python.

[sourecode lang=”C++”]

/* len(self)>=1, len(from)==len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring_in_place(PyStringObject *self,
const char *from_s, Py_ssize_t from_len,
const char *to_s, Py_ssize_t to_len,
Py_ssize_t maxcount)
{
char *result_s, *start, *end;
char *self_s;
Py_ssize_t self_len, offset;
PyStringObject *result;

/* The result string will be the same size */

self_s = PyString_AS_STRING(self);
self_len = PyString_GET_SIZE(self);

offset = stringlib_find(self_s, self_len,
from_s, from_len,
0);
if (offset == -1) {
/* No matches; return the original string */
return return_self(self);
}

/* Need to make a new string */
result = (PyStringObject *) PyString_FromStringAndSize(NULL, self_len);
if (result == NULL)
return NULL;
result_s = PyString_AS_STRING(result);
Py_MEMCPY(result_s, self_s, self_len);

/* change everything in-place, starting with this one */
start = result_s + offset;
Py_MEMCPY(start, to_s, from_len);
start += from_len;
end = result_s + self_len;

while ( –maxcount > 0) {
offset = stringlib_find(start, end-start,
from_s, from_len,
0);
if (offset==-1)
break;
Py_MEMCPY(start+offset, to_s, from_len);
start += offset+from_len;
}

return result;
}

[/sourecode]

2. This looked so suitable for C that I couldn’t resist trying it. The result is at http://codepad.org/glHPVdoL.
On a first run, the times were about ten times longer than those above, not what I expected. It turned out that my library strstr function execution was quadratic in string length. Replacing strstr with local code produced a times about 20 times faster than those above, and memory 200 times less.
So even coding close to the machine, you can’t be sure of a simple library function.