String-Replace
May 22, 2015
In the previous exercise I needed a string-replace
function, and was surprised not to find one in my code library. I quickly wrote a very simple function, noting that it was “awful” because it had quadratic performance.
Your task is to write a string-replace
function that has linear time instead of quadratic. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
C++ implemenation for Python.
[sourecode lang=”C++”]
/* len(self)>=1, len(from)==len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring_in_place(PyStringObject *self,
const char *from_s, Py_ssize_t from_len,
const char *to_s, Py_ssize_t to_len,
Py_ssize_t maxcount)
{
char *result_s, *start, *end;
char *self_s;
Py_ssize_t self_len, offset;
PyStringObject *result;
/* The result string will be the same size */
self_s = PyString_AS_STRING(self);
self_len = PyString_GET_SIZE(self);
offset = stringlib_find(self_s, self_len,
from_s, from_len,
0);
if (offset == -1) {
/* No matches; return the original string */
return return_self(self);
}
/* Need to make a new string */
result = (PyStringObject *) PyString_FromStringAndSize(NULL, self_len);
if (result == NULL)
return NULL;
result_s = PyString_AS_STRING(result);
Py_MEMCPY(result_s, self_s, self_len);
/* change everything in-place, starting with this one */
start = result_s + offset;
Py_MEMCPY(start, to_s, from_len);
start += from_len;
end = result_s + self_len;
while ( –maxcount > 0) {
offset = stringlib_find(start, end-start,
from_s, from_len,
0);
if (offset==-1)
break;
Py_MEMCPY(start+offset, to_s, from_len);
start += offset+from_len;
}
return result;
}
[/sourecode]
This looked so suitable for C that I couldn’t resist trying it. The result is at http://codepad.org/glHPVdoL.
On a first run, the times were about ten times longer than those above, not what I expected. It turned out that my library strstr function execution was quadratic in string length. Replacing strstr with local code produced a times about 20 times faster than those above, and memory 200 times less.
So even coding close to the machine, you can’t be sure of a simple library function.