Three String Exercises

February 12, 2016

Scheme strings are normally stored internally as a length and a list of vectors, but just accessing the internally-stored length (via the string-length function) feels like cheating, so we write a function that converts the string to a list and cdrs through the list to the end:

(define (len str)
  (let loop ((cs (string->list str)) (len 0))
    (if (null? cs) len (loop (cdr cs) (+ len 1)))))

> (len "hello")
5
> (len "")
0

To find the index of a given character in a string, we loop through the string from either 0 or the optional starting point, returning #f if the character is not in the string:

(define (instr c str . args)
  (let ((start (if (pair? args) (car args) 0))
        (len (string-length str)))
    (let loop ((k start))
      (cond ((<= len k) #f)
            ((char=? (string-ref str k) c) k)
            (else (loop (+ k 1)))))))

> (instr #\e "hello")
1
> (instr #\e "hello" 2)
#f
> (instr #\e "")
#f
> (instr #\e "" 2)
#f

Scheme provides a substring function, so we’ll build our own out of lists; unlike the Scheme built-in, our function doesn’t fail if either endpoint is out of range:

(define (substr str start end)
  (list->string
    (take (- end start)
      (drop start
        (string->list str)))))

> (substr "hello" 1 3)
"el"
> (substr "hello" 1 10)
"ello"
> (substr "hello" 10 20)
""

We used take and drop from the Standard Prelude. You can run the program at http://ideone.com/udaKR2.

Pages: 1 2

2 Responses to “Three String Exercises”

  1. matthew said

    If people want an extra challenge, assume UTF-8 encoding and try to handle combining characters etc (see http://www.unicode.org/reports/tr29/)

    To quote http://unicode.org/faq/char_combmark.html:

    “Q: How are characters counted when measuring the length or position of a character in a string?

    A: Computing the length or position of a “character” in a Unicode string can be a little complicated, as there are four different approaches to doing so, plus the potential confusion caused by combining characters. The correct choice of which counting method to use depends on what is being counted and what the count or position is used for.

    Each of the four approaches is illustrated below with an example string [U+0061, U+0928, U+093F, U+4E9C, U+10083]. The example string consists of the Latin small letter a, followed by the Devanagari syllable “ni” (which is represented by the syllable “na” and the combining vowel character “i”), followed by a common Han ideograph, and finally a Linear B ideogram for an “equid” (horse):

    aनि亜𐂃

    …”

  2. Jussi Piitulainen said

    In Python’s itertools.compress, more or less. That was fun.

    from itertools import chain, compress, count, repeat
    from operator import mul, eq, ge, lt
    
    def length(s):
        return next(compress(count(), chain(map(mul, map(len, s), repeat(0)), repeat(1))))
    
    def index(s, c, k = 0):
        return next(compress(count(), map(mul, map(eq, s, repeat(c)), map(ge, count(), repeat(k)))), None)
    
    def substring(s, b, e):
        return ''.join(compress(s, map(mul, map(ge, count(), repeat(b)), map(lt, count(), repeat(e)))))
    
    print(length(""), length("x"), length("oo"), length("foobar"))
    print(index("kaksi", "k"), index("kaksi", "k", 1), index("kaksi", "k", 2), index("kaksi", "k", 3))
    print(substring("notting", -3, 3), substring("notting", 3, 3) or "|", substring("notting", 3, 10))
    print(index("", "w") or ":", substring("", 0, 1) or "-)", sep = "")
    
    # 0 1 2 6
    # 0 2 2 None
    # not | ting
    # :-)
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: