Three String Exercises

February 12, 2016

Scheme strings are normally stored internally as a length and a list of vectors, but just accessing the internally-stored length (via the `string-length` function) feels like cheating, so we write a function that converts the string to a list and `cdr`s through the list to the end:

```(define (len str)
(let loop ((cs (string->list str)) (len 0))
(if (null? cs) len (loop (cdr cs) (+ len 1)))))

> (len "hello")
5
> (len "")
0```

To find the index of a given character in a string, we loop through the string from either 0 or the optional starting point, returning `#f` if the character is not in the string:

```(define (instr c str . args)
(let ((start (if (pair? args) (car args) 0))
(len (string-length str)))
(let loop ((k start))
(cond ((<= len k) #f)
((char=? (string-ref str k) c) k)
(else (loop (+ k 1)))))))

> (instr #\e "hello")
1
> (instr #\e "hello" 2)
#f
> (instr #\e "")
#f
> (instr #\e "" 2)
#f```

Scheme provides a `substring` function, so we’ll build our own out of lists; unlike the Scheme built-in, our function doesn’t fail if either endpoint is out of range:

```(define (substr str start end)
(list->string
(take (- end start)
(drop start
(string->list str)))))

> (substr "hello" 1 3)
"el"
> (substr "hello" 1 10)
"ello"
> (substr "hello" 10 20)
""```

We used `take` and `drop` from the Standard Prelude. You can run the program at http://ideone.com/udaKR2.

Pages: 1 2

2 Responses to “Three String Exercises”

1. matthew said

If people want an extra challenge, assume UTF-8 encoding and try to handle combining characters etc (see http://www.unicode.org/reports/tr29/)

“Q: How are characters counted when measuring the length or position of a character in a string?

A: Computing the length or position of a “character” in a Unicode string can be a little complicated, as there are four different approaches to doing so, plus the potential confusion caused by combining characters. The correct choice of which counting method to use depends on what is being counted and what the count or position is used for.

Each of the four approaches is illustrated below with an example string [U+0061, U+0928, U+093F, U+4E9C, U+10083]. The example string consists of the Latin small letter a, followed by the Devanagari syllable “ni” (which is represented by the syllable “na” and the combining vowel character “i”), followed by a common Han ideograph, and finally a Linear B ideogram for an “equid” (horse):

aनि亜𐂃

…”

2. Jussi Piitulainen said

In Python’s itertools.compress, more or less. That was fun.

```from itertools import chain, compress, count, repeat
from operator import mul, eq, ge, lt

def length(s):
return next(compress(count(), chain(map(mul, map(len, s), repeat(0)), repeat(1))))

def index(s, c, k = 0):
return next(compress(count(), map(mul, map(eq, s, repeat(c)), map(ge, count(), repeat(k)))), None)

def substring(s, b, e):
return ''.join(compress(s, map(mul, map(ge, count(), repeat(b)), map(lt, count(), repeat(e)))))

print(length(""), length("x"), length("oo"), length("foobar"))
print(index("kaksi", "k"), index("kaksi", "k", 1), index("kaksi", "k", 2), index("kaksi", "k", 3))
print(substring("notting", -3, 3), substring("notting", 3, 3) or "|", substring("notting", 3, 10))
print(index("", "w") or ":", substring("", 0, 1) or "-)", sep = "")

# 0 1 2 6
# 0 2 2 None
# not | ting
# :-)
```