Titlecase
April 12, 2016
A string is titlecased when the first letter of each word is capitalized and the remaining letters are lower case. For instance, the string “programming PRAXIS” becomes “Programming Praxis” when titlecased.
Your task is to write a function that takes a string and returns it in titlecase. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
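Here is a minimal sketch in C++, one of the languages used in the comments below. It assumes ASCII input and treats a word as any maximal run of non-whitespace characters. The function name `titlecase` is hypothetical. Whitespace is copied through rather than split and rejoined, so the original spacing is preserved.

```cpp
#include <cctype>
#include <string>

// Titlecase an ASCII string: the first character of each whitespace-delimited
// word is uppercased and every other character is lowercased. Whitespace is
// copied through unchanged, so leading, trailing and repeated spaces survive.
std::string titlecase(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    bool at_word_start = true;  // true at the start and after any whitespace
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);  // avoid UB in <cctype> calls
        if (std::isspace(c)) {
            out += ch;
            at_word_start = true;
        } else {
            out += static_cast<char>(at_word_start ? std::toupper(c) : std::tolower(c));
            at_word_start = false;
        }
    }
    return out;
}
```

Under these assumptions, `titlecase("programming PRAXIS")` returns "Programming Praxis". This sketch uppercases the first *character* of a word, so "3d" stays "3d". The letter-based rule discussed in the comments would give "3D" instead.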
On re-reading the problem, we actually want to split on spaces only…
Alternate Perl solution:
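#!perl
use strict;
use warnings;
# Lowercase each whitespace-separated word, then uppercase its first letter.
# Runs of whitespace are collapsed into a single space (trailing whitespace is dropped).
sub tc { return join ' ', map { ucfirst lc } split /\s+/, $_[0]; }
print tc("programming PRAXIS"), "\n";    # prints "Programming Praxis"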
A Haskell version. The resulting strings are printed in quotes to demonstrate that the original whitespace is maintained.
Here’s a simple FSA for the problem in C++, templated so we can do both traditional and wide strings:
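The sketch below shows what a templated two-state FSA along these lines might look like. It is an illustration of the idea, not the commenter's code. It assumes that the case mappings of a `std::locale`'s `ctype` facet are adequate, which limits it to one-to-one character mappings. `titlecase_fsa` is a hypothetical name.

```cpp
#include <locale>
#include <string>

// Two-state finite-state machine: Between (at the start or just after
// whitespace) and InWord. Templated on the character type so it works for
// both std::string and std::wstring. Case mapping and the whitespace test use
// the ctype facet of the supplied locale.
template <typename CharT>
std::basic_string<CharT> titlecase_fsa(const std::basic_string<CharT>& s,
                                       const std::locale& loc = std::locale::classic()) {
    enum class State { Between, InWord };
    State state = State::Between;
    std::basic_string<CharT> out;
    out.reserve(s.size());
    for (CharT c : s) {
        if (std::isspace(c, loc)) {
            out.push_back(c);                     // whitespace is copied as-is
            state = State::Between;
        } else if (state == State::Between) {
            out.push_back(std::toupper(c, loc));  // first character of a word
            state = State::InWord;
        } else {
            out.push_back(std::tolower(c, loc));  // remaining characters
        }
    }
    return out;
}
```

The character type is deduced from the argument, so one template serves both `std::string` and `std::wstring`.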
All of these solutions are buggy, because they don’t deal properly with the full complexities of Unicode. I believe the sample implementation of SRFI 129 will work correctly assuming only correct char-upcase and char-downcase from the underlying Scheme; the additional maps required are provided by the titlemaps.scm file in the same repo. The code isn’t exactly efficient, but it works. Note that it uppercases any letter preceded by a non-letter, and space is not special (so “foo-bar” titlecases as “Foo-Bar”). Here’s part of the rationale from the SRFI, which explains why R6RS gets this only partly right:
@John: good point. How does that code deal with Dutch IJ (when not represented as a single codepoint) or the notorious Turkish ‘I’?
It would have to be tailored for specific languages. You want “ijzeren” (meaning “iron”) titlecased as “IJzeren”, but you don’t want “ijtihad” (romanized Arabic for “diligence” or “independent judgment”, literally “struggle with oneself”) to become “IJtihad”. The R6RS/R7RS definitions of char-upcase and char-downcase specifically exclude the Turkish/Azeri and Lithuanian special cases of casing.
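To make the tailoring point concrete, here is a hypothetical ASCII-only sketch for a single word. A `dutch` flag stands in for real language detection, and with it set, a leading two-code-point “ij” is treated as one letter. The flag has to come from knowledge of the word's language: with it set, “ijtihad” would wrongly become “IJtihad”.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Titlecase a single ASCII word. When `dutch` is true, a leading "ij"
// digraph (written as two code points) is uppercased as a unit, as in
// Dutch "IJzeren". The flag is a stand-in for real language tailoring.
std::string titlecase_dutch_word(const std::string& word, bool dutch) {
    std::string out;
    out.reserve(word.size());
    for (std::size_t i = 0; i < word.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(word[i]);
        // Uppercase the first character; with Dutch tailoring, also uppercase
        // the "j" of a leading "ij" digraph.
        bool upper = (i == 0) ||
                     (dutch && i == 1 &&
                      std::tolower(static_cast<unsigned char>(word[0])) == 'i' &&
                      std::tolower(c) == 'j');
        out += static_cast<char>(upper ? std::toupper(c) : std::tolower(c));
    }
    return out;
}
```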
@John: tricky stuff indeed. And just as tricky is the question of word boundaries: won’t your function titlecase “isn’t” as “Isn’T”? I’m not sure what the best thing to do here is; maybe just follow http://unicode.org/reports/tr29/#Word_Boundaries (‘The correct interpretation of hyphens in the context of word boundaries is challenging’).
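The sketch below shows this effect directly. It is an ASCII-only approximation of the letter-based rule described for the SRFI 129 sample implementation: uppercase any letter preceded by a non-letter, and lowercase any letter preceded by a letter. It is not the SRFI's sample code, and `titlecase_letters` is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// A letter is uppercased if it is the first character or follows a
// non-letter, and lowercased if it follows a letter. Non-letters (spaces,
// hyphens, apostrophes, digits) are copied unchanged and are all treated
// alike, so space is not special.
std::string titlecase_letters(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    bool prev_is_letter = false;
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalpha(c)) {
            out += static_cast<char>(prev_is_letter ? std::tolower(c) : std::toupper(c));
            prev_is_letter = true;
        } else {
            out += ch;
            prev_is_letter = false;
        }
    }
    return out;
}
```

With this rule, “foo-bar” becomes “Foo-Bar” as intended, but an ASCII “isn't” becomes “Isn'T”.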
TR 29 is indeed the Right Thing, but SRFI 129 deliberately doesn’t specify it. It’s another of those language-sensitive issues: “doesn’t” is a single word and shouldn’t become “Doesn’T”, but “l’assommoir” (French slang meaning something like “the joint” or “the dive”, and the title of a novel by Zola) is underlyingly two words and should become “L’Assommoir”. Language-insensitive code can only do so much.
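One partial step in the direction of TR 29 is to treat an apostrophe that sits between two letters as part of the word. This roughly follows TR 29's default rule against breaking a letter, apostrophe, letter sequence. The sketch below assumes ASCII text and handles only the ASCII apostrophe (a typographic U+2019 would need UTF-8 decoding). `titlecase_apostrophe` is a hypothetical name. As noted above, language-insensitive code can only do so much: this keeps “doesn't” intact but leaves “l'assommoir” as “L'assommoir”.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Letter-based titlecasing, except that an ASCII apostrophe between two
// letters does not start a new word, so "isn't" stays one word.
std::string titlecase_apostrophe(const std::string& s) {
    auto is_letter = [](char ch) {
        return std::isalpha(static_cast<unsigned char>(ch)) != 0;
    };
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!is_letter(s[i])) {
            out += s[i];  // non-letters are copied unchanged
            continue;
        }
        unsigned char c = static_cast<unsigned char>(s[i]);
        // The word continues after a letter, or after "letter + apostrophe".
        bool continues_word =
            (i >= 1 && is_letter(s[i - 1])) ||
            (i >= 2 && s[i - 1] == '\'' && is_letter(s[i - 2]));
        out += static_cast<char>(continues_word ? std::tolower(c) : std::toupper(c));
    }
    return out;
}
```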
@John: Sure. I was about to say we could assume an English milieu for this problem, but even that doesn’t help. I can’t think of a case where more than one letter precedes the apostrophe and the part after it gets capitalized, but with just one letter we can have both forms (“Y’All”, “O’Clock”, “P’s and Q’s”, “o’er the lea”). Still, assuming capitalization for the second part (to cover O’Shaughnessy and L’Escargot), with an explicit list of exceptions, might be adequate.
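The heuristic proposed here can be sketched for a single ASCII word as follows. Text after an apostrophe is capitalized when exactly one letter precedes the apostrophe, unless the word is on an exception list. The exception list is hypothetical and deliberately tiny, drawn only from the non-capitalized examples in this comment; a usable list would be much longer. `titlecase_english_word` is likewise a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Titlecase one ASCII word. If the word looks like "X'..." (one letter, an
// apostrophe, then a letter), also capitalize the letter after the apostrophe,
// unless the lowercased word is in the (illustrative) exception list.
std::string titlecase_english_word(const std::string& word) {
    static const std::set<std::string> exceptions = {"o'er", "p's", "q's"};

    std::string lower = word;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::string out = lower;
    if (!out.empty()) {
        out[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(out[0])));
    }
    bool one_letter_prefix =
        out.size() > 2 && out[1] == '\'' &&
        std::isalpha(static_cast<unsigned char>(out[0])) &&
        std::isalpha(static_cast<unsigned char>(out[2]));
    if (one_letter_prefix && exceptions.count(lower) == 0) {
        out[2] = static_cast<char>(std::toupper(static_cast<unsigned char>(out[2])));
    }
    return out;
}
```

Words with more than one letter before the apostrophe, such as “isn't”, fall through to ordinary titlecasing. A name like “De’Ath” would therefore need its own exception.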
There is the name “De’Ath”, of course, though some spell it “De’ath” (e.g. https://en.wikipedia.org/wiki/Wilfred_De'ath, though Wikipedia seems a bit confused on the matter). There was a lecturer at university called De’Ath; I’m not sure how he pronounced it, but he was pretty universally known as Doctor Death.
In the novel Murder Must Advertise, Lord Peter Wimsey uses the pseudonym “Death Bredon”, which is actually his two middle names. When asked about his first name, he says: “It’s spelt Death. Pronounce it any way you like. Most of the people who are plagued with it make it rhyme with teeth, but personally I think it sounds more picturesque when rhymed with breath.”
@John: Nice quote, thanks. I wonder if The Nine Tailors has anything to help with the current problem.
Solution in C#, using LINQ and Extension Methods:
Solution in C#: