Three String Exercises
February 12, 2016
Scheme strings are normally stored internally as a length and a list of vectors, but just accessing the internally-stored length (via the string-length
function) feels like cheating, so we write a function that converts the string to a list and cdr
s through the list to the end:
(define (len str) (let loop ((cs (string->list str)) (len 0)) (if (null? cs) len (loop (cdr cs) (+ len 1))))) > (len "hello") 5 > (len "") 0
To find the index of a given character in a string, we loop through the string from either 0 or the optional starting point, returning #f
if the character is not in the string:
(define (instr c str . args) (let ((start (if (pair? args) (car args) 0)) (len (string-length str))) (let loop ((k start)) (cond ((<= len k) #f) ((char=? (string-ref str k) c) k) (else (loop (+ k 1))))))) > (instr #\e "hello") 1 > (instr #\e "hello" 2) #f > (instr #\e "") #f > (instr #\e "" 2) #f
Scheme provides a substring
function, so we’ll build our own out of lists; unlike the Scheme built-in, our function doesn’t fail if either endpoint is out of range:
(define (substr str start end) (list->string (take (- end start) (drop start (string->list str))))) > (substr "hello" 1 3) "el" > (substr "hello" 1 10) "ello" > (substr "hello" 10 20) ""
We used take
and drop
from the Standard Prelude. You can run the program at http://ideone.com/udaKR2.
If people want an extra challenge, assume UTF-8 encoding and try to handle combining characters etc (see http://www.unicode.org/reports/tr29/)
To quote http://unicode.org/faq/char_combmark.html:
“Q: How are characters counted when measuring the length or position of a character in a string?
A: Computing the length or position of a “character” in a Unicode string can be a little complicated, as there are four different approaches to doing so, plus the potential confusion caused by combining characters. The correct choice of which counting method to use depends on what is being counted and what the count or position is used for.
Each of the four approaches is illustrated below with an example string [U+0061, U+0928, U+093F, U+4E9C, U+10083]. The example string consists of the Latin small letter a, followed by the Devanagari syllable “ni” (which is represented by the syllable “na” and the combining vowel character “i”), followed by a common Han ideograph, and finally a Linear B ideogram for an “equid” (horse):
aनि亜𐂃
…”
In Python’s itertools.compress, more or less. That was fun.