Programming Praxis

J K Rowling

July 19, 2013

It has been widely reported recently, including articles in the New York Times and the London Sunday Times, that J K Rowling wrote a book under a pseudonym that was discovered by a forensic linguist. Time magazine explains how the discovery was made:

As one part of his work, Juola uses a program to pull out the hundred most frequent words across an author’s vocabulary. This step eliminates rare words, character names and plot points, leaving him with words like of and but, ranked by usage. Those words might seem inconsequential, but they leave an authorial fingerprint on any work.

“Propositions and articles and similar little function words are actually very individual,” Juola says. “It’s actually very, very hard to change them because they’re so subconscious.”

The Time article gives a link to the program Juola used, but that site gives very little information about how the program works.

Your task is to write a program that compares the similarity of two texts to determine authorship; this task is purposely vague so you can make your own decisions about how to proceed. When you are finished, you are welcome to read or run a suggested solution or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

6 Comments »

Vampire Numbers

July 16, 2013

We have today a problem from recreational mathematics.

A vampire number is a number v = x × y such that x and y are not both divisible by 10 and the digits of v consist of the digits of x and the digits of y. The number of digits n in the vampire number must be even, and the number of digits in each of the two fangs x and y must be half the number of digits in v. For instance 1260 = 21 × 60 and 1395 = 15 × 93 are both vampire numbers.

Your task is to write a program that calculates the vampire numbers of n digits. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

11 Comments »

Dynamic Hash Tables

July 12, 2013

Our Standard Prelude has included a simple implementation of hash tables since the very beginning. That implementation is probably twenty years old, or more; it’s one of the first pieces of code that I wrote when I was first learning Scheme, and even though it has had a few improvements since then, it’s time for a better implementation of hash tables. We’ll do that today.

Let’s review. A hash table provides storage for key/value pairs, where the only operation permitted is to compare two keys for equality; unlike a binary search tree, there is no ordering relationship between two keys. The hash table stores key/value pairs in an array of lists; a hash function converts a key to an address in the array, then a linear search through the list at that array location identifies the key/value pair. The size of the array is fixed in advance; if it is too small, then the chains at each array location grow too long, which slows down the operation of the hash table, but if it is too large, space is wasted. Assuming that the array isn’t too very small, so that the chains don’t grow too long, and assuming a fair hash function, hash tables provide O(1) access to any key/value pair.

A dynamic hash table adjusts automatically to the number of key/value pairs that it stores. That implies some kind of compromise in the O(1) behavior of the hash table. One possibility is to store the chains in some kind of tree structure, but that means each access will cost O(log n). Another possibility is to double the size of the array whenever the load factor becomes too high, but that means pauses whenever the array is resized, which can be annoying (or worse) for some applications, and still leaves the asymptotic behavior at something greater than O(1).

So we cheat. We will store the chains in a two-stage data structure that consists of arrays of width w stored in the growable arrays of a previous exercise, and systematically increase the size of the array at each insertion instead of stopping from time to time for a big resizing. The arrays of width w mean that the base of the logarithm is w instead of 2, so for instance to store a million key/value pairs with w = 256 and a load factor of 5 we need 200,000 chains which will be stored in 800 elements of the growable array at a maximum depth from the root of 9, and we’ll say that 9 is close enough to 1 that O(log n) ≈ O(1). It’s fun to cheat!

To control the doubling of the array we maintain three variables: u is the current number of chains, m is the current number of available chains, and p is the index number of the next chain to be split. Variables u and m are initialized to w; p runs from 0 to m, which doubles when p reaches it, resetting p to 0 for the next doubling. When the average load factor is exceeded, p increases, the keys at bucket p are rehashed and split into two buckets at p and p + m, and p and u are increased by one. The table shrinks in a similar way during deletions, except that m and u never fall below w.

Your task is to write a program that maintains dynamic hash tables. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

No Exercise Today

July 9, 2013

There is no exercise today.

My mother, who is eighty-eight years old, was the bottom of three people who fell down some stairs a few days ago, and sustained a cut on her head that required five stitches, four broken ribs in her left back, and one bruise that runs from her left forearm, up to her left shoulder, and down the left side of her back, all the way to her left calf. Her injuries aren’t serious, but at her age neither are they simple, and it’s been difficult.

I’ve spent two recent overnights in hospital emergency rooms, and have been taking care of my mother during the day, so I haven’t had time to write an exercise for today. I do hope to have one ready for Friday.

In the meantime, none of my readers has done all the exercises. So pick an exercise you haven’t done and post your solution in the normal way.

Posted by programmingpraxis

Filed in Administrivia

7 Comments »

Weekdays Between Two Dates

July 5, 2013

Many programming languages provide a library function that calculates a serial number for each day, making it easy to calculate the number of days between two dates; we provide such a function in the Standard Prelude. Sometimes, though, the need is to calculate weekdays rather than total days.

Your task is to write a program that calculates the number of weekdays between two dates. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

1 Comment »

Decoding Text-Speak

July 2, 2013

Sm ppl cmprs txt msgs by rtnng only ths vwls tht bgn a wrd and by rplcng dbld ltrs wth sngl ltrs.

With a proper dictionary, it is possible to expand all the possibilities for a word. For instance, the “Sm” that starts the sentence above is properly translated “Some” but these other words are possible: same, sam, sum, seem, seam, sumo, and others.

Your task is to write a program that, given a sentence in text-speak, returns a list of all possibilities for each word. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

7 Comments »

A Programming Puzzle

June 28, 2013

Today’s exercise is a delightful little programming puzzle:

Write a function f so that f(f(n)) = -n for all integers n.

When I first saw this puzzle, it took me two days before I had the necessary flash of inspiration, then about two minutes to write. Do yourself a favor and don’t peek at the suggested solution until you figure it out yourself.

Your task is to write function f as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

45 Comments »

Swap list nodes

June 25, 2013

We have today another from our never-ending list of interview questions:

Given a linked list, swap the kth node from the head of the list with the kth node from the end of the list.

Your task is to write a function to perform the indicated task; be sure to test it thoroughly. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

4 Comments »

Multiple Polynomial Quadratic Sieve

June 21, 2013

We studied the basic quadratic sieve algorithm for factoring integers in two previous exercises. Today we examine the multiple polynomial variant of the quadratic sieve, due to Peter Montgomery, which adds considerable power to the basic algorithm; we will be following Figure 2.2 and the surrounding text from Scott Contini’s thesis.

In the basic quadratic sieve we used a polynomial of the form g(x) = (x + b)² – n where b = ⌈√n⌉. The problem with using a single polynomial is that the values of g(x) increase as x increases, making them less likely to be smooth. Eventually, the algorithm “runs out of gas” when the values of g(x) grow too large.

Montgomery’s suggestion was to use multiple polynomials of the form g_{a, b}(x) = (a x + b)² − n with a, b integers with 0 < b ≤ a. The graph of g_{a, b}(x) is a parabola, and its values will be smallest when a ≈ √(2n) / m. Thus, we choose b so that b² − n is divisible by a, say b² − n = a c for some integer c, and a = q² for some integer q. Then we calculate g_{a, b}(x) / a = a x² + 2 b x + c, and, after sieving over the range −m .. m, when g_{a, b}(x) is smooth over the factor base we record the relation ((a x + b) q⁻¹)² = a x² + 2 b x + c.

Here is our version of Contini’s algorithm:

Compute startup data: Choose f, the upper bound for factor base primes, and m, which is half the size of the sieve. Determine the factor base primes p < f such that the jacobi symbol (n/p) is 1 indicating that there is a solution to t² ≡ n (mod p); also include 2 in the factor base. For each factor base prime p, compute and store t, a modular square root of n (mod p). Also compute and store the base-2 logarithm of each prime l = ⌊log₂ p⌉ (the floor and ceiling brackets indicate rounding).

Initialize a new polynomial: Find a prime q ≈ √( √(2n) / m) such that the jacobi symbol (n/q) is 1, indicating that n is a quadratic residue mod q, and let a = q² (mod n). Compute b, a modular square root of n mod a; you will have to compute the square root of n mod q then “lift” the root mod q² using Hensel’s Lemma. For each odd prime p in the factor base, compute soln1_p = a⁻¹ (tmem_p − b) and soln2_p = a⁻¹ (−tmem_p − b).

Perform sieving: Initialize a sieve array of length 2 m + 1 to zeros, with indices from −m to m. For each odd prime p in the factor base, add l_p to the locations soln1_p + i p for all integers i that satisfy −m ≤ soln1_p + i p ≤ m, and likewise for soln2_p. For the prime p = 2, sieve only with soln1_p.

Trial division: Scan sieve array for locations x that have accumulated a value of a least log(m √n) minus a small error term. Trial divide g_{a, b}(x) by the primes in the factor base. If g_{a, b}(x) factors into primes less than f, then save smooth relation as indicated above. After scanning entire sieve array, if there are more smooth relations than primes in the factor base, then go to linear algebra step. Otherwise, go to initialization step to select a new polynomial.

Linear algebra: Solve the matrix as in Dixon’s method and the continued fraction method. For each null vector, construct relation of form x² ≡ y² (mod n) and attempt to factor n by computing gcd(x − y, n). If all null vectors fail to give a non-trivial factorization, go to initialization step to select a new polynomial.

Your task is to write a program to factor integers using the multiple polynomial quadratic sieve as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

3SUM

June 18, 2013

Today’s exercise is a classic problem of computer science: given an array of positive and negative integers, find three that sum to zero, or indicate that no such triple exists. This is similar to the subset sum problem, which we solved in a previous exercise, but simpler because of the limit to three items. A brute force solution that operates in O(n³) time uses three nested loops to select items from the array and test their sum, but an O(n²) solution exists. For instance, in the array [8, -25, 4, 10, -10, -7, 2, -3], the numbers -10, 2 and 8 sum to zero, as do the numbers -7, -3, and 10.

Your task is to write a program that solves the 3SUM problem in O(n²) time. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

7 Comments »

Programming Praxis

J K Rowling

July 19, 2013

Vampire Numbers

July 16, 2013

Dynamic Hash Tables

July 12, 2013

No Exercise Today

July 9, 2013

Weekdays Between Two Dates

July 5, 2013

Decoding Text-Speak

July 2, 2013

A Programming Puzzle

June 28, 2013

Swap list nodes

June 25, 2013

Multiple Polynomial Quadratic Sieve

June 21, 2013

3SUM

June 18, 2013

Categories

Archives

Archives