String Subsets
November 23, 2010
This exercise is part of our on-going series of interview questions:
Given two strings, determine if all the characters in the second string appear in the first string; thus, DA is a subset of ABCD. Counts matter, so DAD is not a subset of ABCD, since there are two D in the second string but only one D in the first string. You may assume that the second string is no longer than the first string.
Your task is to write a function to determine if one string is a subset of another string. You should work as you would in a programming interview; if you find one solution, search for a better solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Topological Sort
November 19, 2010
A graph G is a collection of its vertices V and the edges E between them: G(V, E); the interstate highway system is an example of a graph, with cities as vertices and highways as edges. A directed graph is a graph in which each edge is identified with a from vertex and a to vertex; the roads in some city centers can be considered a directed graph, because one-way roads only allow traffic in a single direction (Venice has one-way canals, which blew my mind the first time I saw a sensico unico sign). A directed acyclic graph, sometimes called a DAG, is a directed graph in which there are no cycles, that is, by following the successors of a vertex you can never return to that vertex; the tasks involved in building a house form a DAG, in which the framing must be done before drywall can be installed, and the modules of a program library form a DAG, in which some modules must be compiled before others that depend on them.
A topological sort is an ordering of the vertices of a DAG in which each vertex appears before any of the vertices that depend on it. Topological sorts are typically messy, with multiple right answers; a fireman can spray water on a burning building even while his colleagues are searching for anyone still inside. There are many possible topological sorts of the sample graph; one of them is 7 5 11 2 3 8 9 10.
There are many ways to perform a topological sort. Perhaps the simplest is to pick a vertex that has no incoming edges; put it at the head of the sorted output, delete it and all the edges that come from it, and recur until no vertices remain. If there is more than one vertex that has no incoming edges, any of them may be chosen.
Your task is to write functions to identify cyclic graphs and perform topological sort of a graph. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
RSA Cryptography
November 16, 2010
In 1973, Clifford Cocks invented a method of public-key cryptography in which the recipient of a message holds a private key, known only to himself, and publishes a public key, which may be used by anyone to send him a message that only he will be able to read. Cocks worked for the British security service GCHQ, so he was unable to publish his work. In 1978, three MIT professors, Ronald L. Rivest, Leonard M. Adelman and Adi Shamir invented the same algorithm, independently, which they patented in 1983. The algorithm is now known as RSA, after their initials.
The basic idea of RSA starts with two large prime numbers of equal bit-length, p and q; their product n becomes the modulus of the cryptosystem. The totient of n is computed as φ(pq) = (p−1) × (q−1). Then two keys are chosen, the encryption key e and the decryption key d, such that de ≡ 1 (mod φ(pq)) and gcd(e, φ(pq)) = 1. Then, given a message m, an integer on the range 0 < m <n, the message is encrypted by computing me (mod n) and the resulting cipher-text c is decrypted by computing cd (mod n).
In practice, the keys are generated by selecting a bit-length k and and arbitrary encryption key e. A longer bit-length provides more security than a shorter bit-length; a 768-bit key has recently been factored, albeit with extreme effort (about 2000 PC-years), most commercial users of RSA are probably using 1024- or 2048-bit keys these days, and high-security users (banks, military, government, criminals) are probably using 4096- or 8192-bit keys. E must be odd, is frequently chosen to be prime to force the condition that it is co-prime to the totient, and is generally fairly small; e = 216+1 = 65537 is common. Then the key generator chooses p and q at random, computes d as the modular inverse of e, and reports n and d. In that way, nobody, not even the person generating the keys, knows p and q.
Here is a simple example from Wikipedia: Choose p = 61 and q = 53. Then n = p×q = 3233 and the totient φ(pq) = 60×52 = 3120. Choose e=17 which is co-prime to 3120 since 17 is prime and 17∤3120; the corresponding d is the inverse of 17 with respect to the modulus 3120, so d = 2753. Then the message m = 65 is encrypted as c = me (mod n) = 6517 (mod 3233) = 2790, and the cipher-text 2790 is decrypted as m = cd (mod n) = 27902753 (mod 3233) = 65.
The standard definition of RSA cryptography is known as PKCS #1. It provides a method for converting a text message to a number m suitable for encryption, and converting it back to the original text message, but we won’t examine that algorithm today. It is also possible to use RSA to provide non-forgeable signatures; the basic idea is that the sender encrypts a message hash with his decryption key, so the receiver can decrypt the message hash with the sender’s public key, which works because only the sender knows his private decryption key.
Your task is to write an RSA key generator and procedures to encrypt and decrypt messages using the RSA algorithm as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Rowland’s Prime-Generating Function
November 12, 2010
Regular readers of Programming Praxis know of my affinity for prime numbers. Today’s exercise is based on a prime-generating sequence described by Eric Rowland. Consider the sequence a1 = 7, an = an-1 + gcd(n, an-1): 7, 8, 9, 10, 15, 18, 19, 20, 21, 22, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 69, 72, 73. Taking the differences between successive elements of the sequence gives a second sequence 1, 1, 1, 5, 3, 1, 1, 1, 1, 11, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 23, 3, 1. All the elements of that sequence are either 1 or prime. Eliminating the 1s gives a third sequence of primes that begins 5, 3, 11, 3, 23, 3, 47, 3, 5, 3, 101, 3, 7, 11, 3, 13, 233, 3, 467, 3, 5, 3, 941, 3. Rowland has proved that all elements of the sequence are either 1 or prime; it is conjectured but not proven that all the odd primes appear in the sequence.
It is possible to shortcut the sequence by omitting the 1s, since the number of 1s at any point can be pre-computed. If we have a least-prime-divisor function lpd, then Vladimir Shevelev describes the sequence as a1 = lpd(6-1) = 5, a2 = lpd(6-2+5) = 3, a3 = lpd(6-3+5+3) = 11, a4 = lpd(6-4+5+3+11) = 3, a5 = lpd(6-5+5+3+11+3) = 23, …, and an = lpd(6 – n + sum(a1 … an-1).
Your task is to write functions that generate the three sequences, including the shortcut. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Subset Sums
November 9, 2010
We have previously solved Part 1 and Part 2 of the Greplin Programming Challenge. In today’s exercise we will solve the third and final part:
Find all the subsets of a set of non-negative integers where the largest number is the sum of the remaining numbers, and return a count of the number of them. For instance, for the set { 1, 2, 3, 4, 6 } the subsets are 1+2=3, 1+3=4, 2+4=6, and 1+2+3=6, so the result is 4 subsets. Apply your program to the set { 3, 4, 9, 14, 15, 19, 28, 37, 47, 50, 54, 56, 59, 61, 70, 73, 78, 81, 92, 95, 97, 99 }.
Your task is to write a program to solve the challenge. In addition to solving the challenge, you might like to apply your solution to the set of prime numbers less than 210. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Weather Forecast
November 5, 2010
The internet provides a rich source of data for those who are interested: stock prices, baseball statistics, the littoral area in acres of lakes in Minnesota, and much more. In the United States, the National Oceanic and Atmospheric Administration publishes daily weather forecasts at http://weather.noaa.gov/pub/data/forecasts/.
Your task is to write a program that retrieves the current weather forecast for a requested city. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Emirps
November 2, 2010
An emirp is a prime number that is also prime when its digits are reversed, and that is not also a palindrome. For instance, 13 is an emirp because its reversal, 31, is also prime; 23 is not an emirp, even though it is prime, because its reversal, 32, is not prime; and 101 is not an emirp, even though it is prime, because it is a palindrome.
Your task is to enumerate the emirps below a million; you should strive for maximum speed. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Fibonacci Primes
October 29, 2010
A recent programming challenge asked you to solve the following problem:
Find the smallest prime fibonacci number greater than a given input number. Add one to that fibonacci prime, find the factors of the result, and return the sum of the factors. For instance, if the input number is 10, the smallest fibonacci prime greater than 10 is 13, the factors of 13 + 1 = 14 are 2 and 7, and their sum is 9.
Your task is to write a function to solve that problem, then apply your function to the input number 227000. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Benford’s Law
October 26, 2010
Benford’s Law, which was discovered by Simon Newcomb in 1881 and rediscovered by Frank Benford in 1938, states that, for many sets of numbers that arise in a scale-invariant manner, the first digit is distributed logarithmically, with the first digit 1 about 30% of the time, decreasing digit by digit until the first digit is 9 about 5% of the time. Stated mathematically, the leading digit d ∈ {1 … b-1} for a number written in base b will occur with probability Pd = logb(1 + 1/d). Thus, in base 10, the probabilities of the first digit being the number 1, 2, … 9 are 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6%.
Benford’s Law is counter-intuitive but arises frequently in nature. It is also frequently used in auditing. Make a list of the amounts of the checks that a bookkeeper has written in the past year; if more that 5% begin with the digit 8 or 9, the bookkeeper is likely an embezzler. More important, precinct results of the disputed Iranian elections a year ago displayed anomalous first-digit counts, suggesting vote fraud.
Recently, Shriram Krishnamurthy issued a programming challenge on the Racket mailing list, asking for smallest/tightest/cleanest/best code to calculate the first-digit percentages of a list of numbers; he also challenged readers to apply the function to data in a comma-separated values file. He didn’t give a source, but did mention that he was interested in the littoral area (in acres) of the lakes of Missesota; sample data appears on the next page.
Your first task is to write a function that calculates the first-digit percentages of a list of numbers. Your second task is to calculate the first-digit percentages of the data on the next page. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Text File Databases: Part 2
October 22, 2010
In the previous exercise we developed four functions for reading records from various type of text-file databases. In today’s exercise we develop four functions for processing those records, including map, fold, filter, and for-each. We also wrote map-reduce in a previous exercise, which gives us a fifth processing function.
All four functions take as their first argument a reader function that fetches the next record from the input; we wrote some reader functions in the previous exercise, and it is common to write others for specific input formats. Map takes both a reader function and a transformer function and applies the transformer to each input record, returning a list of the transformed values in the same order as the input. Fold takes a reader function, a combining function and a base value and applies the combiner successively to each base value and the next record from the input, returning the final base value when the input is exhausted. Filter is a combinator; it takes a reader function and a predicate and returns a new reader function that passes only those input records for which the predicate is true. For-each takes a reader function and another procedure and applies the procedure to each input record in turn, only for its side-effects, until the input is exhausted; it returns nothing.
Your task is to write the four functions for processing text-file database records. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.