Ordered Vowels
January 31, 2017
Today’s exercise is to write a program that finds all the words in a dictionary in which all the vowels in the word appear in ascending order. It is not necessary that all five vowels appear in the word, and vowels may be doubled. For instance, afoot
passes because the three vowels, a
, o
and o
appear in non-decreasing order.
An easy way to solve this problem uses regular expressions:
$ grep '^[^aeiou]*a*[^aeiou]*e*[^aeiou]*i*[^aeiou]*o*[^aeiou]*u*[^aeiou]*$' /etc/dict/words
Since that is so easy, you must write a program that does not use regular expressions.
Your task is to write a program that finds words with ordered vowels. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Line Breaks
January 27, 2017
All text processors require code to split the words of a paragraph into lines no greater than a given width, a process known as line breaking. There are a variety of algorithms to perform that process, ranging from simple to complex, and they produce a variety of output of various degrees of “estheticness.” Most algorithms try to arrange all the lines of a paragraph so they are approximately the same length, which reduces any visual disparities in the appearance of the text that might distract the reader.
One simple line-breaking algorithm is the greedy algorithm: pack on to each line as many words as can fit, then go to the next line. For instance, given the text “aaa bb cc ddddd” and a line width of 6, the output would be as shown below left:
------ ------ aaa bb aaa cc bb cc ddddd ddddd ------ ------
The greedy algorithm minimizes the number of lines used, but most line-breaking algorithms prefer to minimize the amount of “raggedness.” One common measure of estheticness minimizes the slack at the end of the line; specifically, it seeks a minimum sum of the square of the number of spaces at the end of each line. The format shown above left has no space at the end of the first line, 4 spaces at the end of the second line, and 1 space at the end of the third line, for a total slack of 0 + 42 + 12 = 17. The purpose of squaring is to more heavily penalize large amounts of slack.
A better format is shown above right. That has 3 spaces at the end of the first line, 1 space at the end of the second line, and 1 space at the end of the third line, for total slack of 32 + 12 + 12 = 11.
From an algorithmic point of view, this is a minimization problem that can be solved in quadratic time by dynamic programming: Walk down the list of words, computing after each word the minimum slack to that point, then add the next word and recompute. The primary data structure used in computing the minimization is an upper-triangular matrix, shown below left:
aaa bb cc ddddd aaa bb cc ddddd ----- ----- ----- ----- ----- ----- ----- ----- 0 3 0 -3 -9 0 3 0 1 4 1 -5 1 4 1 2 4 -2 2 4 3 1 3 1 ----- ----- ----- ----- ----- ----- ----- -----
The first row is computed as 3, which is the number of spaces remaining after placing aaa
on a line, 0, which is the number of spaces remaining after placing aaa bb
on a line, -3, which is the number of spaces remaining after placing aaa bb cc
on a line, and -9, which is the number of spaces remaining after placing aaa bb cc ddddd
on a line; obviously, the last two entries on the first row are infeasible, as the line width exceeds the available space. The second row is computed as 4, which is the number of spaces remaining after placing bb
on a line, 1, which is the number of spaces remaining after placing bb cc
on a line, and -5, which is the number of spaces remaining after placing bb cc ddddd
on a line; the last entry on the row is infeasible. Likewise the third and fourth rows. The feasible portion of the upper-triangular matrix is shown above right.
The next step is to take the minimum feasible value in each column: 3, 0, 1, and 1; if you square those and compute the sum, you get 32 + 02 + 12 + 12 = 11, which is the cost we computed above. More interesting is to take the index of the minimum feasible value in each column, which is 0, 0, 1, and 3 (the 3 in the aaa
column is at index 0, the 0 in the bb
column is at index 0, the 1 in the cc
column is at index 1, and the 1 in the ddddd
column is at index 3). Then we compute the line breaks using the index minimums pairwise as follows: the first pair 0, 0 is empty; the second pair 0, 1 defines the bounds of the first output line; the third pair 1, 3 defines the bounds of the second output line; and the implicit pair 3, 4 (4 is the end of the input) defines the bounds of the third output line.
And that’s the algorithm. Beware that reducing it to code can be tricky (I got it wrong more than once) because you have to be careful to keep the row and column indexes straight and you have to remember when to add and subtract 1 to point to the previous or next column or row. The algorithm obviously has quadratic time and space complexity to compute and manipulate the upper-triangular matrix.
Your task is to write a program to format paragraphs by the dynamic programming algorithm described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Distinct Words
January 24, 2017
I’m not sure of the original source of today’s exercise; it could be a homework problem or an interview question:
Given a huge file of words, print all the distinct words in it, in ascending order; the definition of “huge” is “too big to fit in memory.”
Your task is to write a program to print all the distinct words in a huge file. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Exercise 2-4
January 20, 2017
I’m enjoying these small exercises from The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie. We’ll do another one today.
In Section 2.8, Kernighan and Ritchie give a function squeeze
that deletes all characters c
from a string s
:
/* squeeze: delete all c from s */ void squeeze(char s[], int c) { int i, j; for (i = j = 0; s[i] != '\0'; i++) if (s[i] != c) s[j++] = s[i]; s[j] '\0'; }
Then, in Exercise 2-4, they ask the reader to “Write an alternate version of squeeze(s1,s2)
that deletes each character in s1
that matches any character in the string s2
.”
Your task is to write both versions of squeeze
in your favorite language; if you choose C, half the work is already done. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Exercise 1-9
January 17, 2017
Sometimes it’s good to go back to basics. Here is Exercise 1-9 from The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie:
Write a program to copy its input to its output, replacing each string of one or more blanks by a single blank.
Your task is to write a solution to Exercise 1-9. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Words On A Telephone Keypad
January 13, 2017
The digits on a telephone keypad are associated with letters; for instance, the digit 2 is associated with the letters A, B and C, and the digit 7 is associated with the letters P, Q, R and S. Thus, a word can be converted to its numeric equivalent; for instance, PRAXIS can be converted to the number 772947. The conversion is not necessarily unique, so ACT, BAT and CAT all convert to 228.
Your task is to write a program that takes a number, such as 228, and returns a list of all the words in a dictionary that are represented by that number. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Prime String
January 10, 2017
Today’s exercise is easy to describe, but has some tricky edge cases that make it hard to implement:
Given a string of the prime numbers appended sequentially — 2357111317192…. — and an index n, return the string of five digits beginning at index n. For instance, the five characters starting at index n = 50 are 03107. The first 61 characters of the prime string are shown below:
0 1 2 3 4 5 6 0123456789012345678901234567890123456789012345678901234567890 2357111317192329313741434753596167717379838997101103107109113
Your task is to write a program to find the substring of the string of all primes starting at the nth character. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Fibonacho Numbers
January 6, 2017
Mathematicians are strange people:
Two people are sharing a plate of nachos. They take turns dividing the nachos, each taking the nth Fibonacci number of nachos on the nth turn. When the number of nachos left is less than the next Fibonacci number, they start the sequence over. What number of nachos (less than 500) requires the most number of restarts?
For instance, if you start with n = 11 nachos, the first person takes 1 nacho (leaving 10), the second person takes 1 nacho (leaving 9), the first person takes 2 nachos (leaving 7), the second person takes 3 nachos (leaving 4), and the process restarts. Then the first person takes 1 nacho (leaving 3), the second person takes 1 nacho (leaving 2), and the first person takes 2 nachos (leaving none). There were two restarts, which we can notate as [1,1,2,3], [1,1,2].
The fibonacho numbers are those starting numbers n that require more restarts than any smaller number. Thus, the first fibonacho number is 1 from [1], the second fibonacho number is 3 from [1,1], [1], the third fibonacho number is 10 from [1,1,2,3], [1,1], [1], and so on.
Your task is to write programs that calculate the number of restarts for a given n, and the sequence of fibonacho numbers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.