String-Replace

May 22, 2015

Let’s first examine the claim that the function from the previous exercise takes quadratic time. Here’s the function:

(define (string-replace str pat rep) ; quadratic
  (let ((x (string-find pat str)))
    (if (not x) str
      (string-append (substring str 0 x) rep
        (string-replace (substring str (+ x (string-length pat))
          (string-length str)) pat rep)))))

It’s quadratic because at each recursive call when the string is extended the entire string has to be scanned and rewritten. Here we demonstrate the quadratic time complexity:

> (time (begin (string-replace (make-string 2000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
    2 collections
    16 ms elapsed cpu time, including 0 ms collecting
    22 ms elapsed real time, including 6 ms collecting
    17764912 bytes allocated, including 15202400 bytes reclaimed
done
> (time (begin (string-replace (make-string 4000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
    8 collections
    78 ms elapsed cpu time, including 47 ms collecting
    85 ms elapsed real time, including 31 ms collecting
    67530192 bytes allocated, including 84461056 bytes reclaimed
done
> (time (begin (string-replace (make-string 8000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
    27 collections
    405 ms elapsed cpu time, including 188 ms collecting
    421 ms elapsed real time, including 209 ms collecting
    263062416 bytes allocated, including 133705904 bytes reclaimed
done

When the length of the string doubles from 2000 to 4000 execution time increases by 78 / 16 = 4.9 times and when the length of the string doubles from 4000 to 8000 execution time increases by 405 / 78 = 5.2 times, which is clearly quadratic. And look at all of the garbage that is generated!

A better approach is to scan the string from left to right, looking for matches, collect the outgoing text in a list, and write the new string at the end:

(define (string-replace str pat rep) ; linear
  (let ((str-len (string-length str))
        (pat-len (string-length pat)))
    (let loop ((pos 0) (xs (list)))
      (if (<= str-len pos)
          (apply string-append (reverse xs))
          (let ((x (string-find pat str pos)))
            (if x
                (loop (+ pos pat-len)
                      (cons rep (cons (substring str pos x) xs)))
                (loop str-len (cons (substring str pos str-len) xs))))))))

And here are some timings:

> (time (begin (string-replace (make-string 25000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
    3 collections
    46 ms elapsed cpu time, including 0 ms collecting
    42 ms elapsed real time, including 6 ms collecting
    22201088 bytes allocated, including 24727008 bytes reclaimed
done
> (time (begin (string-replace (make-string 50000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
    5 collections
    78 ms elapsed cpu time, including 0 ms collecting
    83 ms elapsed real time, including 8 ms collecting
    45713280 bytes allocated, including 174417216 bytes reclaimed
done
> (time (begin (string-replace (make-string 100000 #\a) "a" "b") 'done))
(time (begin (string-replace (...) ...) ...))
    11 collections
    172 ms elapsed cpu time, including 46 ms collecting
    174 ms elapsed real time, including 22 ms collecting
    93394016 bytes allocated, including 90342368 bytes reclaimed
done

This is clearly linear. At the first doubling of the length of the string, execution time increased by 78 / 46 = 1.7 times, and at the second doubling execution time increased by 172 / 78 = 2.2 times. And the execution times are very much faster; when I tried the timing test on the quadratic version with a 40,000 character string, the machine crunched for several minutes, then died.

We used string-find from the Standard Prelude. You can run the program at http://ideone.com/tRPxfW.

Advertisement

Pages: 1 2

2 Responses to “String-Replace”

  1. Rutger said

    C++ implemenation for Python.

    [sourecode lang=”C++”]

    /* len(self)>=1, len(from)==len(to)>=2, maxcount>=1 */
    Py_LOCAL(PyStringObject *)
    replace_substring_in_place(PyStringObject *self,
    const char *from_s, Py_ssize_t from_len,
    const char *to_s, Py_ssize_t to_len,
    Py_ssize_t maxcount)
    {
    char *result_s, *start, *end;
    char *self_s;
    Py_ssize_t self_len, offset;
    PyStringObject *result;

    /* The result string will be the same size */

    self_s = PyString_AS_STRING(self);
    self_len = PyString_GET_SIZE(self);

    offset = stringlib_find(self_s, self_len,
    from_s, from_len,
    0);
    if (offset == -1) {
    /* No matches; return the original string */
    return return_self(self);
    }

    /* Need to make a new string */
    result = (PyStringObject *) PyString_FromStringAndSize(NULL, self_len);
    if (result == NULL)
    return NULL;
    result_s = PyString_AS_STRING(result);
    Py_MEMCPY(result_s, self_s, self_len);

    /* change everything in-place, starting with this one */
    start = result_s + offset;
    Py_MEMCPY(start, to_s, from_len);
    start += from_len;
    end = result_s + self_len;

    while ( –maxcount > 0) {
    offset = stringlib_find(start, end-start,
    from_s, from_len,
    0);
    if (offset==-1)
    break;
    Py_MEMCPY(start+offset, to_s, from_len);
    start += offset+from_len;
    }

    return result;
    }

    [/sourecode]

  2. This looked so suitable for C that I couldn’t resist trying it. The result is at http://codepad.org/glHPVdoL.
    On a first run, the times were about ten times longer than those above, not what I expected. It turned out that my library strstr function execution was quadratic in string length. Replacing strstr with local code produced a times about 20 times faster than those above, and memory 200 times less.
    So even coding close to the machine, you can’t be sure of a simple library function.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: