Three String Exercises
February 12, 2016
We have three simple exercises on strings today:
- Write a function that determines the length of a string.
- Write a function that finds the index of the first occurrence of a character in a string, optionally starting at a given index.
- Write a function that creates a new string as a substring of an input string, from a given starting index to a given ending index.
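As a starting point, here is a minimal Python sketch of the three functions. It is one possible reading of the exercise, not the suggested solution. The names `string_length`, `index_of`, and `substring` are made up for illustration, and the sketch assumes non-negative indices, an exclusive ending index, and a return value of -1 when the character is not found.

```python
def string_length(s):
    """Count the characters in s without calling len()."""
    count = 0
    for _ in s:
        count += 1
    return count


def index_of(s, ch, start=0):
    """Return the index of the first ch in s at or after start, or -1."""
    n = string_length(s)
    i = start
    while i < n:
        if s[i] == ch:
            return i
        i += 1
    return -1  # assumption: -1 signals "not found"


def substring(s, start, end):
    """Return a new string holding s[start] through s[end - 1]."""
    # Assumption: end is exclusive and both indices are within bounds.
    chars = []
    for i in range(start, end):
        chars.append(s[i])
    return "".join(chars)


print(string_length("hello"))     # 5
print(index_of("hello", "l"))     # 2
print(index_of("hello", "l", 3))  # 3
print(substring("hello", 1, 3))   # el
```

An inclusive ending index would be an equally reasonable reading of the third exercise; only the upper bound of the loop would change.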
Your task is to write the three functions described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
If people want an extra challenge, assume UTF-8 encoding and try to handle combining characters, etc. (see http://www.unicode.org/reports/tr29/).
To quote http://unicode.org/faq/char_combmark.html:
“Q: How are characters counted when measuring the length or position of a character in a string?
A: Computing the length or position of a “character” in a Unicode string can be a little complicated, as there are four different approaches to doing so, plus the potential confusion caused by combining characters. The correct choice of which counting method to use depends on what is being counted and what the count or position is used for.
Each of the four approaches is illustrated below with an example string [U+0061, U+0928, U+093F, U+4E9C, U+10083]. The example string consists of the Latin small letter a, followed by the Devanagari syllable “ni” (which is represented by the syllable “na” and the combining vowel character “i”), followed by a common Han ideograph, and finally a Linear B ideogram for an “equid” (horse):
aनि亜𐂃
…”
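To make the quoted example concrete, here is a rough Python sketch that counts the example string four ways: UTF-8 bytes, UTF-16 code units, code points, and an approximation of grapheme clusters. The function names are invented for illustration. The grapheme count is only a simplification that skips code points whose general category is a mark (Mn, Mc, Me); it does not implement the full segmentation rules of tr29, so it will miscount Hangul jamo sequences, emoji joined with ZWJ, regional indicator pairs, and strings that begin with a mark.

```python
import unicodedata

# The example string from the FAQ: a, na, vowel sign i, a Han ideograph, Linear B equid.
example = "\u0061\u0928\u093F\u4E9C\U00010083"


def utf8_bytes(s):
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def utf16_code_units(s):
    """Number of 16-bit code units in the UTF-16 encoding of s."""
    # utf-16-le is used so that no byte order mark is added.
    return len(s.encode("utf-16-le")) // 2


def code_points(s):
    """Number of Unicode code points (what len() reports in Python 3)."""
    return len(s)


def approx_graphemes(s):
    """Approximate count of user-perceived characters.

    Assumption: every mark attaches to the preceding base character.
    """
    return sum(1 for ch in s if not unicodedata.category(ch).startswith("M"))


print(utf8_bytes(example))        # 14
print(utf16_code_units(example))  # 6
print(code_points(example))       # 5
print(approx_graphemes(example))  # 4
```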
In Python’s itertools.compress, more or less. That was fun.