## Titlecase

### April 12, 2016

A string is titlecased when the first letter of each word is capitalized and the remaining letters are lower case. For instance, the string “programming PRAXIS” becomes “Programming Praxis” when titlecased.

Your task is to write a function that takes a string and returns it in titlecase. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
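
As a rough illustration of the definition above, here is a minimal Python sketch. It assumes that a "word" is any maximal run of non-whitespace characters and that the original whitespace should be preserved; the function name `titlecase` is only illustrative and is not the suggested solution mentioned above.

```python
import re

def titlecase(text):
    """Capitalize the first character of each whitespace-delimited word
    and lowercase the rest, leaving the whitespace itself untouched."""
    def fix_word(match):
        word = match.group(0)
        return word[:1].upper() + word[1:].lower()
    # \S+ matches each maximal run of non-whitespace characters.
    return re.sub(r"\S+", fix_word, text)

print(titlecase("programming PRAXIS"))  # Programming Praxis
```

Other word-boundary rules are possible (for example, treating any non-letter as a separator), and they give different results for inputs such as hyphenated words.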


### 19 Responses to “Titlecase”

1. klettier said
```
open System

let input = "programming PRAXIS"
let validOutput = "Programming Praxis"

let titlecased =
    // Walk the characters, building the result in reverse in acc.
    let rec loop chars start previousSpace acc =
        match chars with
        | [] -> acc
        | ' ' :: tail -> ' ' :: acc |> loop tail false true
        | a :: tail when previousSpace || start -> Char.ToUpper(a) :: acc |> loop tail false false
        | a :: tail -> Char.ToLower(a) :: acc |> loop tail false false

    List.ofSeq
    >> (fun x -> loop x true false [])
    >> List.rev
    >> String.Concat

validOutput = titlecased input // TRUE
```
2. ```
sub tc { return (lc $_[0]) =~ s{\b([a-z])}{uc $1}smegr; }
```
3. On re-reading, we actually want to split on whitespace only…

```
sub tc { return (lc $_[0]) =~ s{(?<!\S)([a-z])}{uc $1}smegr; }
```
4. Gabriel P Getzie said
```
def titlecase(somestr):
    lastchar = ' '
    newstr = ''
    for c in somestr:
        if lastchar == ' ':
            newstr += c.upper()
        else:
            newstr += c.lower()
        lastchar = c
    return newstr
```
5. mcmillhj said

Alternate Perl solution:

```
#!perl

use strict;
use warnings;

sub tc { return join ' ', map { ucfirst lc } split /\s+/, $_[0]; }
```

6. mcmillhj said

Oops, typo in the above:

```
#!perl

use strict;
use warnings;

sub tc { return join ' ', map { ucfirst lc } split /\s+/, $_[0]; }
```

7. Globules said

A Haskell version. The resulting strings are printed in quotes to demonstrate that the original whitespace is maintained.

```
import Data.Char
import Data.List.Split

-- Convert a string to title case, keeping any existing whitespace.
titleCase :: String -> String
titleCase = concatMap entitle . split (whenElt isSpace)
  where entitle (c:cs) = toUpper c : map toLower cs
        entitle ""     = ""

main :: IO ()
main = do
  print $ titleCase "programming PRAXIS"
  print $ titleCase "  docTOR jEkyll  And    mr. hYdE  "
```
```
$ ./titlecase
"Programming Praxis"
"  Doctor Jekyll  And    Mr. Hyde  "
```
8. matthew said

Here’s a simple FSA for the problem in C++, templated so we can do both traditional and wide strings:

```
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

// Two-state FSA: state 0 = between words, state 1 = inside a word.
template<typename IT, typename PRED, typename TRANS>
void capitalize(IT start, IT end, PRED space, PRED alnum, TRANS upper, TRANS lower) {
    for (int state = 0; start != end; start++) {
        switch (state) {
        case 0:
            // First alphanumeric character of a word: uppercase it.
            if (alnum(*start)) {
                *start = upper(*start);
                state = 1;
            }
            break;
        case 1:
            // Inside a word: lowercase until the next whitespace.
            if (space(*start)) {
                state = 0;
            } else {
                *start = lower(*start);
            }
            break;
        }
    }
}

int main() {
    setlocale(LC_ALL, "");
    char s[] = "'THINGS FALL APART (THE CENTRE CANNOT HOLD)'";
    capitalize(s, s + strlen(s), isspace, isalnum, toupper, tolower);
    printf("%s\n", s);
    wchar_t t[] = L" Ὢ ΠῸΠΟΙ, ΟἾΟΝ ΔΉ ΝΥ ΘΕΟῪς ΒΡΟΤΟῚ ΑἸΤΙΌΩΝΤΑΙ";
    capitalize(t, t + wcslen(t), iswspace, iswalnum, towupper, towlower);
    printf("%ls\n", t);  // %ls is the standard spelling of the original %S
}
```
9. John Cowan said

All of these solutions are buggy, because they don’t deal properly with the full complexities of Unicode. I believe the sample implementation of SRFI 129 will work correctly assuming only correct `char-upcase` and `char-downcase` from the underlying Scheme; the additional maps required are provided by the `titlemaps.scm` file in the same repo. The code isn’t exactly efficient, but it works. Note that it uppercases any letter preceded by a non-letter, and space is not special (so “foo-bar” titlecases as “Foo-Bar”).
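
The rule described above (uppercase any letter preceded by a non-letter, with no special role for space) can be sketched in Python as follows. This is not the SRFI 129 sample implementation: it uses Python's `str.isalpha`, `str.upper` and `str.lower` as stand-ins for the Scheme procedures, ignores the extra title maps, and the function name is made up for illustration.

```python
def titlecase_nonletter_rule(text):
    """Uppercase each letter that starts the string or follows a
    non-letter; lowercase every other letter. Space is not special."""
    out = []
    prev_is_letter = False
    for ch in text:
        if ch.isalpha():
            out.append(ch.lower() if prev_is_letter else ch.upper())
            prev_is_letter = True
        else:
            # Non-letters pass through unchanged and reset the state.
            out.append(ch)
            prev_is_letter = False
    return "".join(out)

print(titlecase_nonletter_rule("foo-bar"))  # Foo-Bar
```

Because every non-letter resets the state, an apostrophe has the same effect as a hyphen under this rule.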

Here’s part of the rationale from the SRFI, which explains why R6RS gets this only partly right:

The Latin letters of the ASCII repertoire are divided into two groups, the uppercase letters A-Z and the lowercase letters a-z. In Unicode matters are more complicated. For historical reasons, some Unicode characters represent two consecutive letters, the first uppercase and the second lowercase. These are known as titlecase letters, because they can be used to capitalize words, as in book titles. They can also appear at the beginning of a sentence. In all cases, it is possible to avoid titlecase letters by using two Unicode characters to represent the sequence.

There are four Latin titlecase letters, each with an uppercase and a lowercase counterpart. For example, the titlecase letter ǲ has the uppercase counterpart Ǳ and the lowercase counterpart ǳ. These may be replaced by the usually identical-looking two-character sequences Dz, DZ, and dz respectively. Similarly, there are 27 Greek titlecase letters, each of which has Greek ι displayed either as a diacritic under the capital letter or immediately following it. For example, ᾈ is a titlecase letter with ᾀ as its lowercase counterpart. There is no single-character uppercase equivalent; one must use the two-character sequence ἈΙ instead.
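
The three-way distinction between lowercase, titlecase and uppercase can be observed from Python, whose string methods apply the Unicode case mappings. The sketch below only inspects the two examples named above; the variable names are illustrative.

```python
import unicodedata

dz_lower = "\u01F3"  # ǳ LATIN SMALL LETTER DZ
dz_title = "\u01F2"  # ǲ LATIN CAPITAL LETTER D WITH SMALL LETTER Z
dz_upper = "\u01F1"  # Ǳ LATIN CAPITAL LETTER DZ

# General category "Lt" marks a titlecase letter.
print(unicodedata.category(dz_title))  # Lt
print(dz_lower.title() == dz_title)    # True
print(dz_lower.upper() == dz_upper)    # True

alpha_lower = "\u1F80"  # ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
alpha_title = "\u1F88"  # ᾈ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI

print(alpha_lower.title() == alpha_title)  # True
# Uppercasing expands to a two-character sequence, since no
# single-character uppercase form exists.
print(len(alpha_title.upper()))
```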

[…]

As an example of why the R6RS definition of string-titlecase does not suffice, consider the string “ﬂoo powDER”, which begins with a ligature of the characters f and l. The Unicode way of titlecasing this string is to treat the ligature the same as the two-character sequence “fl”, in which case the result is “Floo Powder”. However, by the strict letter of R6RS, the “ﬂ” character must be passed to char-titlecase, which in this case will return its argument unchanged, and the result is “ﬂoo Powder”. What is more, if the ﬂ character is not even seen as a casing letter, then the result will be “ﬂOo Powder”. Schemes exist that exhibit all of these behaviors.
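
The difference between the two behaviors can be reproduced in Python. `str.title()` applies the full Unicode title mapping, which may expand one character into several; the helper `strict_char_titlecase` below is a hypothetical stand-in for a character-by-character procedure that must return exactly one character, and it is not taken from any Scheme implementation.

```python
def strict_char_titlecase(text):
    """Titlecase one character at a time, keeping the original character
    whenever the mapping would expand to more than one character."""
    out = []
    prev_is_letter = False
    for ch in text:
        mapped = ch.lower() if prev_is_letter else ch.title()
        out.append(mapped if len(mapped) == 1 else ch)
        prev_is_letter = ch.isalpha()
    return "".join(out)

s = "\uFB02oo powDER"  # begins with the ligature U+FB02 (ﬂ)

print(s.title())                 # Floo Powder
print(strict_char_titlecase(s))  # the ligature is left unchanged
```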

10. matthew said

@John: good point. How does that code deal with Dutch IJ (when not represented as a single codepoint) or the notorious Turkish ‘I’?

11. John Cowan said

It would have to be tailored for specific languages. You want “ijzeren” (meaning “iron”) titlecased as “IJzeren”, but you don’t want “ijtihad” (romanized Arabic for “diligence” or “independent judgment”, literally “struggle with oneself”) to become “IJtihad”. The R6RS/R7RS definitions of `char-upcase` and `char-downcase` specifically exclude the Turkish/Azeri and Lithuanian special cases of casing.
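
A language-tailored variant might look like the Python sketch below. The `dutch_ij` flag is hypothetical: the caller has to know the language of the text, because the code cannot tell a Dutch word from a romanized Arabic one.

```python
def titlecase_word(word, dutch_ij=False):
    """Titlecase a single word. With dutch_ij=True, a leading "ij"
    digraph is capitalized as a unit ("IJ")."""
    lowered = word.lower()
    if dutch_ij and lowered.startswith("ij"):
        return "IJ" + lowered[2:]
    return lowered[:1].upper() + lowered[1:]

print(titlecase_word("ijzeren", dutch_ij=True))  # IJzeren
print(titlecase_word("ijtihad"))                 # Ijtihad
```

Applying the Dutch rule to “ijtihad” would produce the unwanted “IJtihad”, which is why the tailoring has to be selected per language rather than applied globally.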

12. matthew said

@John: tricky stuff indeed. And just as tricky is the question of word boundaries: won’t your function titlecase “isn’t” to “Isn’T”, though? Not sure what the best thing to do here is; maybe just follow http://unicode.org/reports/tr29/#Word_Boundaries (‘The correct interpretation of hyphens in the context of word boundaries is challenging’).

13. John Cowan said

TR 29 is indeed the Right Thing, but SRFI 129 deliberately doesn’t specify it. It’s another of those language-sensitive issues: “doesn’t” is a single word and shouldn’t become “Does’Nt”, but “l’assommoir” (French slang meaning something like “the joint” or “the dive”, and the title of a novel by Zola) is underlyingly two words and should become “L’Assommoir”. Language-insensitive code can only do so much.

14. matthew said

@John: Sure. I was about to say we could assume an English milieu for this problem, but even that doesn’t help. I can’t think of a case where there is more than one letter before the apostrophe and the part after it gets capitalized, but with just one letter we can have both forms (“Y’All”, “O’Clock”, “P’s and Q’s”, “o’er the lea”), though assuming capitalization for the second part (to cover O’Shaughnessy and L’Escargot) with an explicit list of exceptions might be adequate.
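
The heuristic suggested here could be sketched in Python as follows. The exception set `LOWER_AFTER_APOSTROPHE` and both function names are assumptions made for illustration; the entries are examples, not a vetted list, and words are taken to be runs of non-whitespace characters.

```python
import re

APOSTROPHES = ("'", "\u2019")
# Hypothetical exception list: one-letter prefixes whose second part
# stays lowercase. Illustrative only.
LOWER_AFTER_APOSTROPHE = {"p's", "q's", "o'er"}

def titlecase_word(word):
    lowered = word.lower()
    # Position of the first apostrophe, or -1 if there is none.
    hits = [i for i in (lowered.find(a) for a in APOSTROPHES) if i >= 0]
    idx = min(hits, default=-1)
    result = lowered[:1].upper() + lowered[1:]
    key = lowered.replace("\u2019", "'")
    # Exactly one letter before the apostrophe: capitalize what follows,
    # unless the word is a listed exception.
    if idx == 1 and len(lowered) > 2 and key not in LOWER_AFTER_APOSTROPHE:
        result = result[:2] + result[2].upper() + result[3:]
    return result

def titlecase(text):
    return re.sub(r"\S+", lambda m: titlecase_word(m.group(0)), text)

print(titlecase("o'shaughnessy doesn't mind his p's and q's"))
```

Words with more than one letter before the apostrophe are only capitalized at the start, so names that capitalize the second part after a longer prefix would need their own entries.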

15. matthew said

There is the name “De’Ath” of course, though some spell it “De’ath” (e.g. https://en.wikipedia.org/wiki/Wilfred_De'ath, though Wikipedia seems a bit confused on the matter). There was a lecturer at university called De’Ath, not sure how he pronounced it, but he was pretty universally known as Doctor Death.

16. John Cowan said

In the novel Murder Must Advertise, Lord Peter Wimsey uses the pseudonym “Death Bredon”, which is actually his two middle names. When asked about his first name, he says: “It’s spelt Death. Pronounce it any way you like. Most of the people who are plagued with it make it rhyme with teeth, but personally I think it sounds more picturesque when rhymed with breath.”

17. matthew said

@John: Nice quote, thanks. I wonder if The Nine Tailors has anything to help with the current problem.

18. Paul Marfleet said

Solution in C#, using LINQ and Extension Methods:

```
using System.Collections.Generic;
using System.Linq;

namespace ProgrammingPraxis.Core
{
    // The class declaration is missing from the post as extracted;
    // "WordExtensions" is a placeholder name so that the code compiles.
    public static class WordExtensions
    {
        public static int CalculateProductOfTwoLongestWordsThatDoNotShareAnyLetters(this IEnumerable<string> input)
        {
            return (from first in input
                    from second in input
                    where !ReferenceEquals(first, second) && HaveUniqueLetters(first, second)
                    select first.Length * second.Length).Max();
        }

        private static bool HaveUniqueLetters(string first, string second)
        {
            return first.All(charFirst => !second.Any(charSecond =>
                char.ToLowerInvariant(charFirst) == char.ToLowerInvariant(charSecond)));
        }
    }
}
```
19. pmarfleet said

Solution in C#:

```
namespace ProgrammingPraxis.Core
{
    public static class TitleCaseExtensions
    {
        public static string ToTitleCase(this string input)
        {
            var output = input.ToCharArray();
            char? previous = null;

            for (var i = 0; i < output.Length; i++)
            {
                char current = output[i];

                if (!previous.HasValue || char.IsWhiteSpace(previous.Value))
                {
                    output[i] = char.ToUpper(current);
                }
                else if (char.IsUpper(current))
                {
                    output[i] = char.ToLower(current);
                }

                previous = current;
            }

            return new string(output);
        }
    }
}
```