Spelling Numbers
July 3, 2020
First we write a program that converts a number to words; we did this in a previous exercise, but it’s fun to do it again:
(define (spell n)
(let ((ones '("" "one" "two" "three" "four" "five" "six"
"seven" "eight" "nine" "ten" "eleven"
"twelve" "thirteen" "fourteen" "fifteen"
"sixteen" "seventeen" "eighteen" "nineteen"))
(tens '("" "" "twenty" "thirty" "forty" "fifty"
"sixty" "seventy" "eighty" "ninety")))
(cond ((negative? n) (error 'spell "too small"))
((zero? n) "zero")
((< n 20) (list-ref ones n))
((< n 100) (string-append
(list-ref tens (quotient n 10))
(list-ref ones (modulo n 10))))
((= n 100) "onehundred")
(else (error 'spell "too big")))))
We ignore spaces because they don’t affect the sort. Then it’s just a matter of calling the sort:
> (sort (lambda (a b) (string<? (spell a) (spell b))) (range 101)) (8 18 80 88 85 84 89 81 87 86 83 82 11 15 50 58 55 54 59 51 57 56 53 52 5 40 48 45 44 49 41 47 46 43 42 4 14 9 19 90 98 95 94 99 91 97 96 93 92 1 100 7 17 70 78 75 74 79 71 77 76 73 72 6 16 60 68 65 64 69 61 67 66 63 62 10 13 30 38 35 34 39 31 37 36 33 32 3 12 20 28 25 24 29 21 27 26 23 22 2 0)
You can run the program at https://ideone.com/AHCNhf, or look at A036747.
As usual a Perl response… But this one has a Schwartzian transform – which makes the sort appreciably faster (although another approach can be memoization to cache the spellings). The transform adds an index to the sort {in this case the spelled out number}, sorts by the index and then removes it. It is a standard Perl technique to speed up these sorts of sorts.
use Number::Spell; say join ' ', map { $_->[1] } sort { $a->[0] cmp $b->[0] } map { [ spell_number($_), $_ ] } 0 .. 100Here’s a different way to solve the problem, by constructing a Finite State Automaton:
""" Construct an FSA for recognizing number expressions. This can also be used to generate such expressions with a depth-first search that handles tokens for each state in lexical order. """ # Construct automaton ones = ["zero","one","two","three","four","five","six","seven","eight","nine", "ten","eleven","twelve","thirteen","fourteen", "fifteen","sixteen","seventeen","eighteen","nineteen"] tens = ["twenty","thirty","forty","fifty","sixty","seventy","eighty","ninety"] # State 0: beginning of string # State 1: eg. after "twenty", so can have "one", but not "eleven" # State 2: terminal m = {} # Python 3.7 dicts are kept in insertion order m[0] = dict(sorted([(s,(2,i)) for i,s in enumerate(ones)] + [(s,(1,10*(i+2))) for i,s in enumerate(tens)])) m[1] = dict(sorted([(s,(2,i+1)) for i,s in enumerate(ones[1:10])])) terminals = [1,2] # Terminal states # Recurse through FSA to generate possibly expressions def scan(state): if state in terminals: yield [] if state not in m: return for s in m[state]: for t in scan(m[state][s][0]): yield [s]+t # Use FSA to parse an expression (as a list of tokens). def parse(tokens): state,value = 0,0 while tokens: s = tokens[0] if state not in m: return None if s not in m[state]: return None state,value1 = m[state][s] value += value1 tokens = tokens[1:] return value if state in terminals else None for s in scan(0): print("%s: %s"%(parse(s)," ".join(s))) """ 8: eight 18: eighteen 80: eighty 88: eighty eight ... 22: twenty two 2: two 0: zero """I can’t resist showing off that the Common Lisp format function can output integers in English.
(defun spell-number (n) (format nil "~r" n)) (let* ((L (loop for i from 0 to 100 collect i)) (S (sort L #'string< :key #'spell-number))) (format t "~a~%" S))@Jan: Other languages have libraries to do such things, eg. num2words in Python:
but where is the fun in that?
I was needlessly verbose there:
I’m disappointed that num2words doesn’t seem to support Catalan.
In Python for numbers in dutch (the language was not specified in this exercise). The rules for dutch number spelling are a little more complicated than for english number spelling.
NUMS = list(range(100, 20, -10)) + list(range(20, -1, -1)) SPELL_NL = "honderd negentig tachtig zeventig zestig vijftig veertig dertig twintig negentien" \ " achttien zeventien zestien vijftien veertien dertien twaalf elf tien" \ " negen acht zeven zes vijf vier drie twee een nul".split() DSPELL_NL = dict(zip(NUMS, SPELL_NL)) def spell_number_nl(n): """only numbers [0, 100] Duth spelling of numbers has a few pecularities: 1 is één, but 21 is eenentwintig (not éénentwintig) numbers higher than 20 are written as eenentwintig (one and twenty) and tweeëntwintig (two and twenty). As the Dutch language does not like to see here a sequence of 2 e's, the second is written as ë """ if n == 1: return "één" result = [] for num in DSPELL_NL: if num <= n: result.append(DSPELL_NL[num]) n -= num if n == 0: break if len(result) > 1 and result[1].endswith("e"): return "ën".join(reversed(result)) return "en".join(reversed(result)) print(sorted(range(101), key=spell_number_nl)) # 8, 38, 98, 88, 28, 48, 58, 68, 78, 18, 13, 30, 3, 33, 93, 83, 23, 43, 53, # 63, 73, 1, 31, 91, 81, 21, 41, 51, 61, 71, 11, 100, 9, 39, 99, 89, 29, 49 # 59, 69, 79, 19, 90, 0, 80, 10, 12, 2, 32, 92, 82, 22, 42, 52, 62, 72, 20, 14 # 40, 4, 34, 94, 84, 24, 44, 54, 64, 74, 5, 35, 95, 85, 25, 45, 55, 65, 75, 15 # 50, 6, 36, 96, 86, 26, 46, 56, 66, 76, 16, 60, 7, 37, 97, 87, 27, 47, 57, 67 # 77, 17, 70@Paul: nice. How are you getting the right sorting order for the Dutch? Locale setting?
Or ICU works nicely:
>>> import icu >>> from num2words import num2words >>> def keyfun(locale): ... collator = icu.Collator.createInstance(icu.Locale(locale)) ... return lambda a: collator.getSortKey(a[0]) >>> [n for s,n in sorted(((num2words(n,lang='nl'),n) for n in range(101)), key=keyfun('nl_NL'))] [8, 38, 98, 88, 28, 48, 58, 68, 78, 18, 13, 30, 3, 33, 93, 83, 23, 43, 53, 63, 73, 1, 31, 91, 81, 21, 41, 51, 61, 71, 11, 100, 9, 39, 99, 89, 29, 49, 59, 69, 79, 19, 90, 0, 80, 10, 12, 2, 32, 92, 82, 22, 42, 52, 62, 72, 20, 14, 40, 4, 34, 94, 84, 24, 44, 54, 64, 74, 5, 35, 95, 85, 25, 45, 55, 65, 75, 15, 50, 6, 36, 96, 86, 26, 46, 56, 66, 76, 16, 60, 7, 37, 97, 87, 27, 47, 57, 67, 77, 17, 70]@matthew: I did not really think about the sorting order in the Dutch language. I should have done that, of course. The official Dutch alphabet ordering is like string.ascii_lower. The characters é and ë are supposed to be the same as e for ordering. That results in the order that you gave in your last post.
Here’s a solution in Python.
ones = ( 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen' ) tens = ('twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety') def english(n): if n == 0: output = 'zero' elif n <= 19: output = ones[n - 1] elif n <= 99: q, r = divmod(n, 10) output = tens[q - 2] if r > 0: output += f' {ones[r - 1]}' elif n == 100: output = 'one hundred' else: raise RuntimeError(f'Unsupported number: {n}') return output print(sorted(range(101), key=english))Output:
[ 8, 18, 80, 88, 85, 84, 89, 81, 87, 86, 83, 82, 11, 15, 50, 58, 55, 54, 59, 51, 57, 56, 53, 52, 5, 40, 48, 45, 44, 49, 41, 47, 46, 43, 42, 4, 14, 9, 19, 90, 98, 95, 94, 99, 91, 97, 96, 93, 92, 1, 100, 7, 17, 70, 78, 75, 74, 79, 71, 77, 76, 73, 72, 6, 16, 60, 68, 65, 64, 69, 61, 67, 66, 63, 62, 10, 13, 30, 38, 35, 34, 39, 31, 37, 36, 33, 32, 3, 12, 20, 28, 25, 24, 29, 21, 27, 26, 23, 22, 2, 0 ]