Sophie Germain Primes

August 6, 2013

A prime number p is a Sophie Germain prime if 2p+1 is also prime. The list of Sophie Germain primes begins 2, 3, 5, 11, 23, 29, 41, 53, 83, 89, 113, 131, (A005384); it is not known if the list is infinite. Sophie Germain primes are named after the French mathematician Marie-Sophie Germain, who in 1825 proved Fermat’s last theorem for such primes; thus, if p is a Sophie Germain prime, then there do not exist integers x, y and z not equal to zero and not a multiple of p such that xp + yp = zp.

One way to find Sophie Germain primes is to enumerate the primes p and check the primality of each 2p+1. That works fine for relatively small p. For larger p, amateur mathematician Harvey Dubner developed a sieve that speeds finding Sophie Germain primes:

Define the three polynomials fp(c) = c * 3003 * 10b – 1, fq(c) = 2 fp(c) + 1, and fr(c) = (fp(c) – 1) / 2. Choose b which is the approximate number of digits for the range of Sophie Germain primes you are searching.

Sieve each of the polynomials over the range 1 ≤ cm for some suitable m. Use as sieving primes the primes greater than 13 but less than some suitable f which is conveniently large but much less than the square root of fp(1). All three polynomials are sieved over the same sieving array. Sieving is done in a similar manner to sieving the polynomials in the multiple polynomial quadratic sieve.

After sieving is complete, apply a primality test to each fp(c) that survived all three sievings. If it is prime, check both fq(c) and fr(c) for primality. If fq(c) is prime, then fp(c) is a Sophie Germain prime. If fr(c), then fr(c) is a Sophie Germain prime.

Your task is to write a sieve that finds Sophie Germain primes as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Ordered Hash Tables

August 2, 2013

In a normal hash table, a key is hashed then linear search is performed in the bucket where the key will be found if it exists in the hash table. A successful search terminates as soon as the key is found, on average half-way through the bucket, but an unsuccessful search requires that every item in the bucket be examined.

Donald Knuth proposed that instead of keeping keys in a bucket in random order they be kept in increasing order, so that a search can stop as soon as the search passes the value of the key. That means inserts take longer; you have to find the right place in the bucket to put the key instead of just putting it at the beginning of the bucket. But lookups are quicker, as an unsuccessful search can terminate half-way through the bucket, on average. This change also means that the data type of the hash table changes, as now the comparison function is less-than rather than equal-to.

Your task is to write a small library of ordered hash tables; you should provide lookup, insert and delete functions, at least. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

There was a big buzz on Stack Overflow recently when a question about buying croissants was asked:

At my office someone buys croissants for everybody every Friday. Sometimes we have problems arranging a suitable schedule because of absences — illness, vacation, training, customer meetings — so we’re looking for an algorithm that randomly chooses the person who should bring croissants so that everyone buys and consumes about as many croissants as everyone else.

The question caused a big ruckus at Stack Overflow, and is currently locked while the community decides if the question is appropriate; I personally think that it is, and further, that the question is both fun and practical.

Your task is to devise a suitable algorithm for deciding who should buy croissants, and demonstrate by simulation that your algorithms is fair. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

We have today another of our unlimited supply of interview questions: this one supposedly comes from Amazon:

Given an array X[0..n-1] of integers sorted into ascending order with no duplicates, find an array item that is also its index, so that X[i] = i.

For example, X[3] = 3 in the array shown below:

i     0 1 2 3 4 5
x[i] -3 0 1 3 5 7

Your task is to write a program that finds i. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exericse in the comments below.

Pages: 1 2

Telephone Lookup

July 23, 2013

In the early days of Unix, Mike Lesk wrote a directory-assistance program that allowed users to input a name using the numeric keypad on a telephone and get a telephone number in return. The program worked by “signing” each name with its telephonic equivalent, then storing the signatures in a database. This scheme makes collisions possible, but in practice collisions are rare, and when they do happen the user is prompted for additional information.

Your task is to write a program that takes a name and returns a telephone number; a database is provided on the next page. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

J K Rowling

July 19, 2013

It has been widely reported recently, including articles in the New York Times and the London Sunday Times, that J K Rowling wrote a book under a pseudonym that was discovered by a forensic linguist. Time magazine explains how the discovery was made:

As one part of his work, Juola uses a program to pull out the hundred most frequent words across an author’s vocabulary. This step eliminates rare words, character names and plot points, leaving him with words like of and but, ranked by usage. Those words might seem inconsequential, but they leave an authorial fingerprint on any work.

“Propositions and articles and similar little function words are actually very individual,” Juola says. “It’s actually very, very hard to change them because they’re so subconscious.”

The Time article gives a link to the program Juola used, but that site gives very little information about how the program works.

Your task is to write a program that compares the similarity of two texts to determine authorship; this task is purposely vague so you can make your own decisions about how to proceed. When you are finished, you are welcome to read or run a suggested solution or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Vampire Numbers

July 16, 2013

We have today a problem from recreational mathematics.

A vampire number is a number v = x × y such that x and y are not both divisible by 10 and the digits of v consist of the digits of x and the digits of y. The number of digits n in the vampire number must be even, and the number of digits in each of the two fangs x and y must be half the number of digits in v. For instance 1260 = 21 × 60 and 1395 = 15 × 93 are both vampire numbers.

Your task is to write a program that calculates the vampire numbers of n digits. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Dynamic Hash Tables

July 12, 2013

Our Standard Prelude has included a simple implementation of hash tables since the very beginning. That implementation is probably twenty years old, or more; it’s one of the first pieces of code that I wrote when I was first learning Scheme, and even though it has had a few improvements since then, it’s time for a better implementation of hash tables. We’ll do that today.

Let’s review. A hash table provides storage for key/value pairs, where the only operation permitted is to compare two keys for equality; unlike a binary search tree, there is no ordering relationship between two keys. The hash table stores key/value pairs in an array of lists; a hash function converts a key to an address in the array, then a linear search through the list at that array location identifies the key/value pair. The size of the array is fixed in advance; if it is too small, then the chains at each array location grow too long, which slows down the operation of the hash table, but if it is too large, space is wasted. Assuming that the array isn’t too very small, so that the chains don’t grow too long, and assuming a fair hash function, hash tables provide O(1) access to any key/value pair.

A dynamic hash table adjusts automatically to the number of key/value pairs that it stores. That implies some kind of compromise in the O(1) behavior of the hash table. One possibility is to store the chains in some kind of tree structure, but that means each access will cost O(log n). Another possibility is to double the size of the array whenever the load factor becomes too high, but that means pauses whenever the array is resized, which can be annoying (or worse) for some applications, and still leaves the asymptotic behavior at something greater than O(1).

So we cheat. We will store the chains in a two-stage data structure that consists of arrays of width w stored in the growable arrays of a previous exercise, and systematically increase the size of the array at each insertion instead of stopping from time to time for a big resizing. The arrays of width w mean that the base of the logarithm is w instead of 2, so for instance to store a million key/value pairs with w = 256 and a load factor of 5 we need 200,000 chains which will be stored in 800 elements of the growable array at a maximum depth from the root of 9, and we’ll say that 9 is close enough to 1 that O(log n) ≈ O(1). It’s fun to cheat!

To control the doubling of the array we maintain three variables: u is the current number of chains, m is the current number of available chains, and p is the index number of the next chain to be split. Variables u and m are initialized to w; p runs from 0 to m, which doubles when p reaches it, resetting p to 0 for the next doubling. When the average load factor is exceeded, p increases, the keys at bucket p are rehashed and split into two buckets at p and p + m, and p and u are increased by one. The table shrinks in a similar way during deletions, except that m and u never fall below w.

Your task is to write a program that maintains dynamic hash tables. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

Many programming languages provide a library function that calculates a serial number for each day, making it easy to calculate the number of days between two dates; we provide such a function in the Standard Prelude. Sometimes, though, the need is to calculate weekdays rather than total days.

Your task is to write a program that calculates the number of weekdays between two dates. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Decoding Text-Speak

July 2, 2013

Sm ppl cmprs txt msgs by rtnng only ths vwls tht bgn a wrd and by rplcng dbld ltrs wth sngl ltrs.

With a proper dictionary, it is possible to expand all the possibilities for a word. For instance, the “Sm” that starts the sentence above is properly translated “Some” but these other words are possible: same, sam, sum, seem, seam, sumo, and others.

Your task is to write a program that, given a sentence in text-speak, returns a list of all possibilities for each word. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2