Spelling Numbers

July 3, 2020

First we write a program that converts a number to words; we did this in a previous exercise, but it’s fun to do it again:

(define (spell n)
  (let ((ones '("" "one" "two" "three" "four" "five" "six"
                "seven" "eight" "nine" "ten" "eleven"
                "twelve" "thirteen" "fourteen" "fifteen"
                "sixteen" "seventeen" "eighteen" "nineteen"))
        (tens '("" "" "twenty" "thirty" "forty" "fifty"
                "sixty" "seventy" "eighty" "ninety")))
    (cond ((negative? n) (error 'spell "too small"))
          ((zero? n) "zero")
          ((< n 20) (list-ref ones n))
          ((< n 100) (string-append
                       (list-ref tens (quotient n 10))
                       (list-ref ones (modulo n 10))))
          ((= n 100) "onehundred")
          (else (error 'spell "too big")))))

We ignore spaces because they don’t affect the sort. Then it’s just a matter of calling the sort:

> (sort (lambda (a b) (string<? (spell a) (spell b))) (range 101))
(8 18 80 88 85 84 89 81 87 86 83 82 11 15 50 58 55 54 59 51
 57 56 53 52 5 40 48 45 44 49 41 47 46 43 42 4 14 9 19 90 98
 95 94 99 91 97 96 93 92 1 100 7 17 70 78 75 74 79 71 77 76
 73 72 6 16 60 68 65 64 69 61 67 66 63 62 10 13 30 38 35 34
 39 31 37 36 33 32 3 12 20 28 25 24 29 21 27 26 23 22 2 0)

You can run the program at https://ideone.com/AHCNhf, or look at A036747.

Posted by programmingpraxis

Filed in Exercises

10 Comments »

10 Responses to “Spelling Numbers”

James Curtis-Smith said
July 3, 2020 at 9:24 AM
As usual a Perl response… But this one has a Schwartzian transform – which makes the sort appreciably faster (although another approach can be memoization to cache the spellings). The transform adds an index to the sort {in this case the spelled out number}, sorts by the index and then removes it. It is a standard Perl technique to speed up these sorts of sorts.
```
use Number::Spell;
say join ' ',
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ spell_number($_), $_ ] }
    0 .. 100
```

matthew said

July 3, 2020 at 1:27 PM

Here’s a different way to solve the problem, by constructing a Finite State Automaton:

"""
Construct an FSA for recognizing number expressions. This can also be used
to generate such expressions with a depth-first search that handles tokens
for each state in lexical order.
"""

# Construct automaton

ones = ["zero","one","two","three","four","five","six","seven","eight","nine",
        "ten","eleven","twelve","thirteen","fourteen",
        "fifteen","sixteen","seventeen","eighteen","nineteen"]
tens = ["twenty","thirty","forty","fifty","sixty","seventy","eighty","ninety"]

# State 0: beginning of string
# State 1: eg. after "twenty", so can have "one", but not "eleven"
# State 2: terminal

m = {}
# Python 3.7 dicts are kept in insertion order
m[0] = dict(sorted([(s,(2,i)) for i,s in enumerate(ones)] +
                   [(s,(1,10*(i+2))) for i,s in enumerate(tens)]))
m[1] = dict(sorted([(s,(2,i+1)) for i,s in enumerate(ones[1:10])]))
terminals = [1,2] # Terminal states

# Recurse through FSA to generate possibly expressions
def scan(state):
    if state in terminals: yield []
    if state not in m: return
    for s in m[state]:
        for t in scan(m[state][s][0]):
            yield [s]+t

# Use FSA to parse an expression (as a list of tokens).
def parse(tokens):
    state,value = 0,0
    while tokens:
        s = tokens[0]
        if state not in m: return None
        if s not in m[state]: return None
        state,value1 = m[state][s]
        value += value1
        tokens = tokens[1:]
    return value if state in terminals else None

for s in scan(0): print("%s: %s"%(parse(s)," ".join(s)))
"""
8: eight
18: eighteen
80: eighty
88: eighty eight
...
22: twenty two
2: two
0: zero
"""

Jan Van lent said

July 3, 2020 at 2:44 PM

I can’t resist showing off that the Common Lisp format function can output integers in English.

(defun spell-number (n)
  (format nil "~r" n))

(let* ((L (loop for i from 0 to 100 collect i))
       (S (sort L #'string< :key #'spell-number)))
  (format t "~a~%" S))

matthew said
July 3, 2020 at 11:02 PM
@Jan: Other languages have libraries to do such things, eg. num2words in Python:
```
>>> [n for _,n in list(sorted([(num2words(n),n) for n in range(100)]))]
[8, 18, 80, 88, ..., 22, 2, 0]
```
but where is the fun in that?
matthew said
July 3, 2020 at 11:16 PM
I was needlessly verbose there:
```
>>> from num2words import num2words
>>> [n for _,n in sorted((num2words(n),n) for n in range(100))]
[8, 18, 80, 88, ..., 22, 2, 0]
```
I’m disappointed that num2words doesn’t seem to support Catalan.

Paul said

July 4, 2020 at 8:24 AM

In Python for numbers in dutch (the language was not specified in this exercise). The rules for dutch number spelling are a little more complicated than for english number spelling.

NUMS = list(range(100, 20, -10)) + list(range(20, -1, -1))
SPELL_NL = "honderd negentig tachtig zeventig zestig vijftig veertig dertig twintig negentien" \
        " achttien zeventien zestien vijftien veertien dertien twaalf elf tien" \
        " negen acht zeven zes vijf vier drie twee een nul".split()
DSPELL_NL = dict(zip(NUMS, SPELL_NL))

def spell_number_nl(n):
    """only numbers [0, 100]
       Duth spelling of numbers has a few pecularities:
       1 is één, but 21 is eenentwintig (not éénentwintig)
       numbers higher than 20 are written as eenentwintig (one and twenty) and tweeëntwintig (two and twenty). As
       the Dutch language does not like to see here a sequence of 2 e's, the second is written as ë
    """
    if n == 1:
        return "één"
    result = []
    for num in DSPELL_NL:
        if num <= n:
            result.append(DSPELL_NL[num])
            n -= num
            if n == 0:
                break
    if len(result) > 1 and result[1].endswith("e"):
        return "ën".join(reversed(result))
    return "en".join(reversed(result))

print(sorted(range(101), key=spell_number_nl))
#  8, 38, 98, 88, 28, 48, 58, 68, 78, 18, 13, 30, 3, 33, 93, 83, 23, 43, 53,
#  63, 73, 1, 31, 91, 81, 21, 41, 51, 61, 71, 11, 100, 9, 39, 99, 89, 29, 49
#  59, 69, 79, 19, 90, 0, 80, 10, 12, 2, 32, 92, 82, 22, 42, 52, 62, 72, 20, 14
#  40, 4, 34, 94, 84, 24, 44, 54, 64, 74, 5, 35, 95, 85, 25, 45, 55, 65, 75, 15
#  50, 6, 36, 96, 86, 26, 46, 56, 66, 76, 16, 60, 7, 37, 97, 87, 27, 47, 57, 67
#  77, 17, 70

matthew said
July 4, 2020 at 8:57 AM
@Paul: nice. How are you getting the right sorting order for the Dutch? Locale setting?

matthew said

July 4, 2020 at 9:33 AM

Or ICU works nicely:

>>> import icu
>>> from num2words import num2words
>>> def keyfun(locale):
...   collator = icu.Collator.createInstance(icu.Locale(locale))
...   return lambda a: collator.getSortKey(a[0])
>>> [n for s,n in sorted(((num2words(n,lang='nl'),n) for n in range(101)), key=keyfun('nl_NL'))]
[8, 38, 98, 88, 28, 48, 58, 68, 78, 18, 13, 30, 3, 33, 93, 83, 23, 43, 53, 
 63, 73, 1, 31, 91, 81, 21, 41, 51, 61, 71, 11, 100, 9, 39, 99, 89, 29, 49, 
 59, 69, 79, 19, 90, 0, 80, 10, 12, 2, 32, 92, 82, 22, 42, 52, 62, 72, 20, 14,
 40, 4, 34, 94, 84, 24, 44, 54, 64, 74, 5, 35, 95, 85, 25, 45, 55, 65, 75, 15,
 50, 6, 36, 96, 86, 26, 46, 56, 66, 76, 16, 60, 7, 37, 97, 87, 27, 47, 57, 67,
 77, 17, 70]

Paul said
July 4, 2020 at 12:13 PM
@matthew: I did not really think about the sorting order in the Dutch language. I should have done that, of course. The official Dutch alphabet ordering is like string.ascii_lower. The characters é and ë are supposed to be the same as e for ordering. That results in the order that you gave in your last post.

Daniel said

July 15, 2020 at 8:31 PM

Here’s a solution in Python.

ones = (
    'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
    'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
    'seventeen', 'eighteen', 'nineteen'
)
tens = ('twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety')

def english(n):
    if n == 0:
        output = 'zero'
    elif n <= 19:
        output = ones[n - 1]
    elif n <= 99:
        q, r = divmod(n, 10)
        output = tens[q - 2]
        if r > 0: output += f' {ones[r - 1]}'
    elif n == 100:
        output = 'one hundred'
    else:
        raise RuntimeError(f'Unsupported number: {n}')
    return output

print(sorted(range(101), key=english))

Output:

[
    8, 18, 80, 88, 85, 84, 89, 81, 87, 86,
    83, 82, 11, 15, 50, 58, 55, 54, 59, 51,
    57, 56, 53, 52, 5, 40, 48, 45, 44, 49,
    41, 47, 46, 43, 42, 4, 14, 9, 19, 90,
    98, 95, 94, 99, 91, 97, 96, 93, 92, 1,
    100, 7, 17, 70, 78, 75, 74, 79, 71, 77,
    76, 73, 72, 6, 16, 60, 68, 65, 64, 69,
    61, 67, 66, 63, 62, 10, 13, 30, 38, 35,
    34, 39, 31, 37, 36, 33, 32, 3, 12, 20,
    28, 25, 24, 29, 21, 27, 26, 23, 22, 2, 0
]

Programming Praxis