Polite Numbers
December 17, 2010
A number is polite if it can be written as the sum of two or more consecutive numbers; for instance, 7 is polite because it can be written at 3 + 4. Some numbers can be written as the sum of two or more consecutive numbers in more than one way; for instance, 15 = 1 + 2 + 3 + 4 + 5 = 4 + 5 + 6 = 7 + 8. The number of ways that a number can be written as the sum of two or more consecutive numbers is its politeness. Any number that is a power of 2 cannot be written as the sum of two or more consecutive numbers; such a number has a politeness of 0, and is thus impolite.
There is a set of consecutive integers that sum to a given number for each odd divisor of the number greater than 1. For instance, the divisors of 28 are 1, 2, 4, 7, 14, 28. The only odd divisor of 28 greater than 1 is 7, so 28 has a politeness of 1. The set of consecutive integers has length 7 (the divisor) and is centered at 4 = 28 ÷ 7: 1 + 2 + 3 + 4 + 5 + 6 + 7; that works because there are 7 numbers with an average of 4 (since they are centered on 4). In some cases, such as the divisor 11 of the number 33, the set of numbers includes negative integers: -2 + -1 + 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8; in that case, the negative integers cancel out the corresponding positive integers, so the remaining set is 3 + 4 + 5 + 6 + 7 + 8 = 33.
Your task is to write a program that calculates all the ways that a number can be written as the sum of two or more consecutive numbers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Longest Duplicated Substring
December 14, 2010
In a previous exercise, we looked at the problem of finding the longest palindrome in a string. In today’s exercise, we look at a similar problem, finding the longest duplicated substring in a string.
The algorithm for this task is well known. First build a suffix list that has all the substring suffices of the input string. For instance, if the input is “banana”, the suffix list is “a”, “na”, “ana”, “nana”, “anana”, “banana”. Then sort the suffix list, bringing suffices with common beginnings together. Finally, scan the sorted suffix list, computing the common lengths of adjacent pairs, and report the longest.
Your task is to write a program to find the longest duplicated substring within a string. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Two Random Selections
December 10, 2010
In today’s exercise, we look at two programs that make random selections.
The first program picks an item from a list whose size is not known in advance, making a single pass through the list. The algorithm selects the first item in the list with probability 1/1, then replaces that item with the second item in the list with probability 1/2, then replaces whichever item is currently selected with the third item in the list with probability 1/3, and so on; when the end of the list is reached, each item will be selected with probability 1/n, where n is the number of items in the list. This algorithm is used in the unix fortune program, which randomly selects a line from a file of pithy sayings, and appears in the Standard Prelude under the name fortune.
The second program is used by auditors, pollsters, and others who need to randomly select m items from a population of n items. The program outputs a list of m numbers in the range 1 to n, in order; the user then selects the sampled items from a numbered list of the items in the population. The algorithm runs through the integers from 1 to n and selects each item with probability m/n, where at each step m is the number remaining to be selected and n is the number remaining in the population.
Your task is to write the two programs described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Ullman’s Puzzle
December 7, 2010
This puzzle is due to Jeffrey Ullman:
Given a list of n real numbers, a real number t, and an integer k, determine if there exists a k-element subset of the original list of n real numbers that is less than t.
For instance, given the list of 25 real numbers 18.1, 55.1, 91.2, 74.6, 73.0, 85.9, 73.9, 81.4, 87.1, 49.3, 88.8, 5.7, 26.3, 7.1, 58.2, 31.7, 5.8, 76.9, 16.5, 8.1, 48.3, 6.8, 92.4, 83.0, 19.6, t = 98.2 and k = 3, the 3-element subset 31.7, 16.5 and 19.6 sums to 67.8, which is less than 98.2, so the result is true.
This is a puzzle, so you’re not allowed to look at the suggested solution until you have your own solution.
Your task is to write a function that makes that determination. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Maximum Sum Subsequence
December 3, 2010
We’ll categorize today’s exercise as part of our on-going series of interview questions, but it’s really more than that:
Given a sequence of integers, both positive and negative, find the contiguous subsequence with the maximum sum. For instance, given the sequence 31, -41, 59, 26, -53, 58, 97, -93, -23, 84, the maximum sum subsequence is 59, 26, -53, 58, 97, which sums to 187.
Your task is to write a series of functions that takes a sequence (use either a list or an array, whichever is convenient for your programming language) and returns the sum of the maximum sum subsequence; you should find solutions with time complexities of O(n3), O(n2), O(n log n), and O(n). When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Form Letters
November 30, 2010
Welcome back, Jane!
We hope that you and all the members
of the Public family are constantly
reminding your neighbors there
on Maple Street to shop with us.
As usual, we will ship your order to
Ms. Jane Q. Public
600 Maple Street
Your Town, Iowa 12345
Everybody hates form letters. But they are part of the computing universe, and today’s exercise asks you to print them. Input to the form letter generator comes in two parts. First, there is a schema that defines the letter to be written. Here is the schema for the letter shown above:
Welcome back, $1!
We hope that you and all the members
of the $0 family are constantly
reminding your neighbors there
on $5 to shop with us.
As usual, we will ship your order to
$3 $1 $2. $0
$4 $5
$6, $7 $8
Variable text is identified as $n, where n is the field number from a database; although it’s not shown above, n can be larger than 9, extending right-ward until a non-digit is encountered. Also not shown above is the construct $$, which prints a literal dollar sign.
The data comes from a comma-separated values file, of the type we have previously encountered. In this case, records have nine fields: last name, first name, middle initial, title, street number, street name, city, state, and zip code. Here is a sample two-record data file:
Public,Jane,Q,Ms.,600,Maple Street,Your Town, Iowa,12345
Smith,John,Z,Dr.,1234,Main Street,Anytown,Missouri,63011
Your task is to write a program that takes a schema and a data file and writes a series of form letters. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Divisors And Totatives
November 26, 2010
We have examined functions to compute the factors of a number in several previous exercises. In today’s exercise we examine functions to compute divisors and totients, which are concepts of number theory closely related to factors.
The divisors of a number are those numbers that divide it evenly; for example, the divisors of 60 are 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, and 60. The sum of the divisors of 60 is 168, and the number of divisors of 60 is 12.
The totatives of a number are those numbers less than the given number and coprime to it; two numbers are coprime if they have no common factors other than 1. The number of totatives of a given number is called its totient. For example, the totatives of 30 are 1, 7, 11, 13, 17, 19, 23, and 29, and the totient of 30 is 8. The totient function was discovered by Leonhard Euler, and it pops up in all kinds of strange places in number theory; for instance, the totatives of 30 are the spokes on a 2,3,5 factorization wheel.
Your task is to write a small library of five functions that compute the divisors of a number, the sum and number of its divisors, the totatives of a number, and its totient. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
String Subsets
November 23, 2010
This exercise is part of our on-going series of interview questions:
Given two strings, determine if all the characters in the second string appear in the first string; thus, DA is a subset of ABCD. Counts matter, so DAD is not a subset of ABCD, since there are two D in the second string but only one D in the first string. You may assume that the second string is no longer than the first string.
Your task is to write a function to determine if one string is a subset of another string. You should work as you would in a programming interview; if you find one solution, search for a better solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Topological Sort
November 19, 2010
A graph G is a collection of its vertices V and the edges E between them: G(V, E); the interstate highway system is an example of a graph, with cities as vertices and highways as edges. A directed graph is a graph in which each edge is identified with a from vertex and a to vertex; the roads in some city centers can be considered a directed graph, because one-way roads only allow traffic in a single direction (Venice has one-way canals, which blew my mind the first time I saw a sensico unico sign). A directed acyclic graph, sometimes called a DAG, is a directed graph in which there are no cycles, that is, by following the successors of a vertex you can never return to that vertex; the tasks involved in building a house form a DAG, in which the framing must be done before drywall can be installed, and the modules of a program library form a DAG, in which some modules must be compiled before others that depend on them.
A topological sort is an ordering of the vertices of a DAG in which each vertex appears before any of the vertices that depend on it. Topological sorts are typically messy, with multiple right answers; a fireman can spray water on a burning building even while his colleagues are searching for anyone still inside. There are many possible topological sorts of the sample graph; one of them is 7 5 11 2 3 8 9 10.
There are many ways to perform a topological sort. Perhaps the simplest is to pick a vertex that has no incoming edges; put it at the head of the sorted output, delete it and all the edges that come from it, and recur until no vertices remain. If there is more than one vertex that has no incoming edges, any of them may be chosen.
Your task is to write functions to identify cyclic graphs and perform topological sort of a graph. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
RSA Cryptography
November 16, 2010
In 1973, Clifford Cocks invented a method of public-key cryptography in which the recipient of a message holds a private key, known only to himself, and publishes a public key, which may be used by anyone to send him a message that only he will be able to read. Cocks worked for the British security service GCHQ, so he was unable to publish his work. In 1978, three MIT professors, Ronald L. Rivest, Leonard M. Adelman and Adi Shamir invented the same algorithm, independently, which they patented in 1983. The algorithm is now known as RSA, after their initials.
The basic idea of RSA starts with two large prime numbers of equal bit-length, p and q; their product n becomes the modulus of the cryptosystem. The totient of n is computed as φ(pq) = (p−1) × (q−1). Then two keys are chosen, the encryption key e and the decryption key d, such that de ≡ 1 (mod φ(pq)) and gcd(e, φ(pq)) = 1. Then, given a message m, an integer on the range 0 < m <n, the message is encrypted by computing me (mod n) and the resulting cipher-text c is decrypted by computing cd (mod n).
In practice, the keys are generated by selecting a bit-length k and and arbitrary encryption key e. A longer bit-length provides more security than a shorter bit-length; a 768-bit key has recently been factored, albeit with extreme effort (about 2000 PC-years), most commercial users of RSA are probably using 1024- or 2048-bit keys these days, and high-security users (banks, military, government, criminals) are probably using 4096- or 8192-bit keys. E must be odd, is frequently chosen to be prime to force the condition that it is co-prime to the totient, and is generally fairly small; e = 216+1 = 65537 is common. Then the key generator chooses p and q at random, computes d as the modular inverse of e, and reports n and d. In that way, nobody, not even the person generating the keys, knows p and q.
Here is a simple example from Wikipedia: Choose p = 61 and q = 53. Then n = p×q = 3233 and the totient φ(pq) = 60×52 = 3120. Choose e=17 which is co-prime to 3120 since 17 is prime and 17∤3120; the corresponding d is the inverse of 17 with respect to the modulus 3120, so d = 2753. Then the message m = 65 is encrypted as c = me (mod n) = 6517 (mod 3233) = 2790, and the cipher-text 2790 is decrypted as m = cd (mod n) = 27902753 (mod 3233) = 65.
The standard definition of RSA cryptography is known as PKCS #1. It provides a method for converting a text message to a number m suitable for encryption, and converting it back to the original text message, but we won’t examine that algorithm today. It is also possible to use RSA to provide non-forgeable signatures; the basic idea is that the sender encrypts a message hash with his decryption key, so the receiver can decrypt the message hash with the sender’s public key, which works because only the sender knows his private decryption key.
Your task is to write an RSA key generator and procedures to encrypt and decrypt messages using the RSA algorithm as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.