Titlecase
April 12, 2016
A string is titlecased when the first letter of each word is capitalized and the remaining letters are lowercase. For instance, the string “programming PRAXIS” becomes “Programming Praxis” when titlecased.
Your task is to write a function that takes a string and returns it in titlecase. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
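The definition leaves open what counts as a “word”, and common library routines disagree about it. As a quick illustration in Python (used here only for comparison), str.title() starts a new word after any non-letter, while string.capwords() splits on whitespace and rejoins with single spaces. The titlecase function below is a minimal sketch that assumes a word is any maximal run of non-whitespace characters and leaves the original whitespace untouched.

```python
import re
import string


def titlecase(s: str) -> str:
    """Uppercase the first character of each whitespace-delimited word and
    lowercase the rest, leaving the original whitespace untouched."""
    return re.sub(r"\S+", lambda m: m.group()[0].upper() + m.group()[1:].lower(), s)


if __name__ == "__main__":
    print(titlecase("programming PRAXIS"))    # Programming Praxis
    print("isn't it".title())                 # Isn'T It  (new word after the apostrophe)
    print(string.capwords("  isn't   it "))   # Isn't It  (whitespace collapsed)
    print(repr(titlecase("  isn't   it ")))   # "  Isn't   It "
```

Which behavior is “right” depends on how words are defined, a question the comments below return to.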
```fsharp
open System

let input = "programming PRAXIS"
let validOutput = "Programming Praxis"

let titlecased =
    // Walk the characters, building the result in reverse in acc:
    // uppercase at the start of the string or after a space, lowercase elsewhere.
    let rec loop chars start previousSpace acc =
        match chars with
        | [] -> acc
        | ' ' :: tail -> ' ' :: acc |> loop tail false true
        | a :: tail when previousSpace || start -> Char.ToUpper(a) :: acc |> loop tail false false
        | a :: tail -> Char.ToLower(a) :: acc |> loop tail false false
    List.ofSeq >> (fun x -> loop x true false []) >> List.rev >> String.Concat

validOutput = titlecased input // true
```

```perl
# s///r (return the modified copy) needs Perl 5.14 or later
sub tc { return (lc $_[0]) =~ s{\b([a-z])}{uc $1}smegr; }
```

On re-reading the problem, I actually want to split on spaces only:
```perl
sub tc { return (lc $_[0]) =~ s{(?<!\S)([a-z])}{uc $1}smegr; }
```

```python
def titlecase(somestr):
    lastchar = ' '
    newstr = ''
    for c in somestr:
        if lastchar == ' ':
            newstr += c.upper()
        else:
            newstr += c.lower()
        lastchar = c
    return newstr
```

Alternate Perl solution:
```perl
#!perl
use strict;
use warnings;

# Split on runs of whitespace, lowercase each word, then uppercase its first letter
sub tc { return join ' ', map { ucfirst lc } split /\s+/, $_[0]; }
```
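For comparison, here is a rough Python port of the Perl regex solution that uses the (?<!\S) look-behind. Like the Perl, it lowercases the whole string first and then uppercases any ASCII letter a–z that is not preceded by a non-whitespace character, so the original spacing survives. It assumes Python's Unicode-aware str.lower() is an acceptable stand-in for Perl's lc.

```python
import re

# Matches an ASCII lowercase letter at the start of the string or after whitespace,
# mirroring the Perl pattern (?<!\S)([a-z]).
_WORD_START = re.compile(r"(?<!\S)([a-z])")


def tc(s: str) -> str:
    """Lowercase everything, then uppercase each letter that starts a word."""
    return _WORD_START.sub(lambda m: m.group(1).upper(), s.lower())


if __name__ == "__main__":
    print(tc("programming PRAXIS"))          # Programming Praxis
    print(repr(tc("  docTOR   jEkyll ")))    # '  Doctor   Jekyll '
```

Because the look-behind only asks whether the previous character is whitespace, “foo-BAR” becomes “Foo-bar” rather than “Foo-Bar”.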
A Haskell version. The resulting strings are printed in quotes to demonstrate that the original whitespace is maintained.
```haskell
import Data.Char
import Data.List.Split  -- from the "split" package

-- Convert a string to title case, keeping any existing whitespace.
titleCase :: String -> String
titleCase = concatMap entitle . split (whenElt isSpace)
  where
    entitle (c:cs) = toUpper c : map toLower cs
    entitle ""     = ""

main :: IO ()
main = do
  print $ titleCase "programming PRAXIS"
  print $ titleCase " docTOR jEkyll And mr. hYdE "
```

Here’s a simple finite-state automaton (FSA) for the problem in C++, templated so it works with both traditional and wide strings:
```cpp
#include <locale.h>
#include <ctype.h>
#include <wctype.h>
#include <wchar.h>
#include <string.h>
#include <stdio.h>

// Two-state FSA: state 0 = looking for the start of a word,
// state 1 = inside a word (lowercase until the next space).
template<typename IT, typename PRED, typename TRANS>
void capitalize(IT start, IT end, PRED space, PRED alnum, TRANS upper, TRANS lower)
{
  for (int state = 0; start != end; start++) {
    switch (state) {
    case 0:
      if (alnum(*start)) { *start = upper(*start); state = 1; }
      break;
    case 1:
      if (space(*start)) { state = 0; }
      else { *start = lower(*start); }
      break;
    }
  }
}

int main()
{
  setlocale(LC_ALL, "");
  char s[] = "'THINGS FALL APART (THE CENTRE CANNOT HOLD)'";
  capitalize(s, s + strlen(s), isspace, isalnum, toupper, tolower);
  printf("%s\n", s);
  wchar_t t[] = L" Ὢ ΠῸΠΟΙ, ΟἾΟΝ ΔΉ ΝΥ ΘΕΟῪς ΒΡΟΤΟῚ ΑἸΤΙΌΩΝΤΑΙ";
  capitalize(t, t + wcslen(t), iswspace, iswalnum, towupper, towlower);
  printf("%ls\n", t);  // %ls prints a wide string
}
```

All of these solutions are buggy, because they don’t deal properly with the full complexities of Unicode. I believe the sample implementation of SRFI 129 will work correctly, assuming only correct char-upcase and char-downcase from the underlying Scheme; the additional maps required are provided by the titlemaps.scm file in the same repo. The code isn’t exactly efficient, but it works. Note that it uppercases any letter preceded by a non-letter, and space is not special (so “foo-bar” titlecases as “Foo-Bar”).

Here’s part of the rationale from the SRFI, which explains why R6RS gets this only partly right:
@John: good point. How does that code deal with Dutch IJ (when not represented as a single codepoint) or the notorious Turkish ‘I’?
It would have to be tailored for specific languages. You want “ijzeren” (meaning “iron”) titlecased as “IJzeren”, but you don’t want “ijtihad” (romanized Arabic for “diligence” or “independent judgment”, literally “struggle with oneself”) to become “IJtihad”. The R6RS/R7RS definitions of char-upcase and char-downcase specifically exclude the Turkish/Azeri and Lithuanian special cases of casing.

@John: tricky stuff indeed. And just as tricky is the question of word boundaries: won’t your function titlecase “isn’t” as “Isn’T”? I’m not sure what the best thing to do is here; maybe just follow http://unicode.org/reports/tr29/#Word_Boundaries (‘The correct interpretation of hyphens in the context of word boundaries is challenging’).
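The points above about Unicode titlecase mappings and language tailoring can be illustrated with a small Python sketch. Python’s str.title() applies the Unicode titlecase mapping (so the digraph “ǆ” becomes “ǅ”, not the uppercase “Ǆ”), but it is not language-sensitive, so any tailoring has to be added by hand. The lang parameter and the two rules below (Dutch initial “ij”, Turkish/Azeri dotted and dotless i) are hypothetical illustrations, not a complete tailoring; the sketch also assumes words are whitespace-delimited and start with a letter.

```python
import re
from typing import Optional

# Hypothetical language tags for the Turkic tailoring sketched here.
TURKIC = {"tr", "az"}


def _title_word(word: str, lang: Optional[str]) -> str:
    # Dutch: an initial "ij" is treated as one letter, so both parts are capitalized.
    if lang == "nl" and word[:2].lower() == "ij":
        return "IJ" + word[2:].lower()
    first, rest = word[0], word[1:]
    if lang in TURKIC:
        # Turkic casing pairs dotted i with U+0130 and dotless U+0131 with I.
        first = "\u0130" if first == "i" else first.title()
        rest = rest.replace("I", "\u0131").replace("\u0130", "i").lower()
        return first + rest
    # Default: Unicode titlecase mapping for the first character, lowercase for the rest.
    return first.title() + rest.lower()


def titlecase(s: str, lang: Optional[str] = None) -> str:
    """Titlecase each whitespace-delimited word, preserving the original whitespace."""
    return re.sub(r"\S+", lambda m: _title_word(m.group(), lang), s)
```

For example, titlecase("ijzeren hek", lang="nl") gives “IJzeren Hek” while titlecase("ijtihad") gives “Ijtihad”, and titlecase("istanbul", lang="tr") gives “İstanbul”. Choosing the language is left to the caller, which is exactly the information a language-insensitive routine lacks.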
TR 29 is indeed the Right Thing, but SRFI 129 deliberately doesn’t specify it. It’s another of those language-sensitive issues: “doesn’t” is a single word and shouldn’t become “Doesn’T”, but “l’assommoir” (French slang meaning something like “the joint” or “the dive”, and the title of a novel by Zola) is underlyingly two words and should become “L’Assommoir”. Language-insensitive code can only do so much.
@John – Sure. I was about to say we could assume an English milieu for this problem, but even that doesn’t help. I can’t think of a case with more than one letter before the apostrophe where the part after it gets capitalized, but with just one letter we can have both forms (“Y’All”, “O’Clock”, “P’s and Q’s”, “o’er the lea”). Still, capitalizing the second part by default (to cover O’Shaughnessy and L’Escargot), with an explicit list of exceptions, might be adequate.
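The English heuristic just described (capitalize the letter after an apostrophe when exactly one letter precedes it, unless the word is on an explicit exception list) might be sketched in Python as follows. The exception set is a hypothetical starting point built from the examples in this thread, words are assumed to be whitespace-delimited, and both the ASCII apostrophe and the typographic ’ are accepted.

```python
import re

# Hypothetical exceptions: one-letter prefixes whose second part stays lowercase.
LOWER_AFTER_APOSTROPHE = {"p's", "q's", "o'er"}

# One letter, an apostrophe (ASCII or U+2019), then at least one more character.
_ONE_LETTER_PREFIX = re.compile(r"([a-z])(['\u2019])(.+)")


def _title_word(word: str) -> str:
    lower = word.lower()
    m = _ONE_LETTER_PREFIX.fullmatch(lower)
    if m and lower.replace("\u2019", "'") not in LOWER_AFTER_APOSTROPHE:
        # O'Shaughnessy, L'Escargot, O'Clock: capitalize both parts.
        prefix, apostrophe, rest = m.groups()
        return prefix.upper() + apostrophe + rest[0].upper() + rest[1:]
    # Everything else (isn't, doesn't, o'er): capitalize only the first character.
    return lower[0].upper() + lower[1:]


def titlecase(s: str) -> str:
    """Titlecase whitespace-delimited words using the apostrophe heuristic."""
    return re.sub(r"\S+", lambda m: _title_word(m.group()), s)


if __name__ == "__main__":
    print(titlecase("o'shaughnessy isn't here at o'clock"))  # O'Shaughnessy Isn't Here At O'Clock
    print(titlecase("p's and q's"))                           # P's And Q's
    print(titlecase("o'er the lea"))                          # O'er The Lea
```

Under this rule a two-letter prefix such as “de'ath” comes out as “De'ath”, only one of the two spellings of that name.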
There is the name “De’Ath” of course, though some spell it “De’ath” (e.g. https://en.wikipedia.org/wiki/Wilfred_De'ath, though Wikipedia seems a bit confused on the matter). There was a lecturer at university called De’Ath; I’m not sure how he pronounced it, but he was pretty universally known as Doctor Death.
In the novel Murder Must Advertise, Lord Peter Wimsey uses the pseudonym “Death Bredon”, which is actually his two middle names. When asked about his first name, he says: “It’s spelt Death. Pronounce it any way you like. Most of the people who are plagued with it make it rhyme with teeth, but personally I think it sounds more picturesque when rhymed with breath.”
@John: Nice quote, thanks. I wonder if The Nine Tailors has anything to help with the current problem.
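Solution in C#, using LINQ and Extension Methods (note that this code computes the largest product of the lengths of two words that share no letters, rather than titlecasing):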
```csharp
using System.Collections.Generic;
using System.Linq;

namespace ProgrammingPraxis.Core
{
    public static class GoogleInterviewQuestionExtensions
    {
        public static int CalculateProductOfTwoLongestWordsThatDoNotShareAnyLetters(this IEnumerable<string> input)
        {
            return (from first in input
                    from second in input
                    where !ReferenceEquals(first, second) && HaveUniqueLetters(first, second)
                    select first.Length * second.Length).Max();
        }

        private static bool HaveUniqueLetters(string first, string second)
        {
            return first.All(charFirst => !second.Any(charSecond =>
                char.ToLowerInvariant(charFirst) == char.ToLowerInvariant(charSecond)));
        }
    }
}
```

Solution in C#:
```csharp
namespace ProgrammingPraxis.Core
{
    public static class TitleCaseExtensions
    {
        public static string ToTitleCase(this string input)
        {
            var output = input.ToCharArray();
            char? previous = null;
            for (var i = 0; i < output.Length; i++)
            {
                char current = output[i];
                if (!previous.HasValue || char.IsWhiteSpace(previous.Value))
                {
                    output[i] = char.ToUpper(current);
                }
                else if (char.IsUpper(current))
                {
                    output[i] = char.ToLower(current);
                }
                previous = current;
            }
            return new string(output);
        }
    }
}
```