July 12, 2013
Our Standard Prelude has included a simple implementation of hash tables since the very beginning. That implementation is probably twenty years old, or more; it’s one of the first pieces of code that I wrote when I was first learning Scheme, and even though it has had a few improvements since then, it’s time for a better implementation of hash tables. We’ll do that today.
Let’s review. A hash table provides storage for key/value pairs, where the only operation permitted is to compare two keys for equality; unlike a binary search tree, there is no ordering relationship between two keys. The hash table stores key/value pairs in an array of lists. A hash function converts a key to an address in the array, then a linear search through the list at that array location identifies the key/value pair. The size of the array is fixed in advance. If it is too small, the chains at each array location grow too long, which slows down the operation of the hash table; if it is too large, space is wasted. Assuming that the array isn’t too small, so that the chains don’t grow too long, and assuming a fair hash function, hash tables provide O(1) access to any key/value pair.
A dynamic hash table adjusts automatically to the number of key/value pairs that it stores. That implies some kind of compromise in the O(1) behavior of the hash table. One possibility is to store the chains in some kind of tree structure, but that means each access will cost O(log n). Another possibility is to double the size of the array whenever the load factor becomes too high, but that means pauses whenever the array is resized, which can be annoying (or worse) for some applications, and still leaves the asymptotic behavior at something greater than O(1).
So we cheat. We will store the chains in a two-stage data structure: arrays of width w, stored in the growable arrays of a previous exercise. We will grow the table one chain at a time as keys are inserted, instead of stopping from time to time for a big resizing. The arrays of width w mean that the base of the logarithm is w instead of 2. For instance, storing a million key/value pairs with w = 256 and a load factor of 5 needs 200,000 chains. Those chains fit in about 800 elements of the growable array, at a maximum depth of 9 from the root, and we’ll say that 9 is close enough to 1 that O(log n) ≈ O(1). It’s fun to cheat!
To control the doubling of the array we maintain three variables: u is the current number of chains, m is the current number of available chains, and p is the index number of the next chain to be split. Variables u and m are initialized to w, and p is initialized to 0. The index p runs from 0 to m; when p reaches m, m doubles and p is reset to 0 for the next doubling. When the average load factor is exceeded, the keys at bucket p are rehashed and split into two buckets at p and p + m, and p and u are each increased by one. The table shrinks in a similar way during deletions, except that m and u never fall below w.
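The splitting scheme can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the suggested solution. A plain Python list of buckets stands in for the growable array of width-w arrays, and Python's built-in hash() is the hash function. The table shrinks when the load falls below half the growth threshold. That shrink rule is our own assumption, since the text only says the table shrinks "in a similar way." The key invariant is u = m + p. A key whose chain index h mod m is below p lives in a chain that has already been split this round, so it is addressed with h mod 2m instead.

```python
class LinearHashTable:
    """Linear hashing with the u/m/p bookkeeping described above.

    Assumptions: a plain Python list of buckets stands in for the growable
    array of width-w arrays, and Python's built-in hash() is the hash function.
    """

    def __init__(self, w=256, load=5):
        self.w, self.load = w, load
        self.m = w      # chains at the start of the current doubling round
        self.p = 0      # index of the next chain to split
        self.u = w      # chains in use; always u == m + p
        self.size = 0   # number of key/value pairs stored
        self.buckets = [[] for _ in range(w)]

    def _addr(self, key):
        h = hash(key)
        i = h % self.m
        if i < self.p:              # chain i was already split this round
            i = h % (2 * self.m)
        return i

    def get(self, key, default=None):
        for k, v in self.buckets[self._addr(key)]:
            if k == key:
                return v
        return default

    def put(self, key, value):
        chain = self.buckets[self._addr(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:            # replace an existing binding
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.size += 1
        if self.size > self.load * self.u:
            self._split()

    def delete(self, key):
        chain = self.buckets[self._addr(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.size -= 1
                # Assumed shrink rule: half the growth threshold, so a single
                # insert/delete pair at the boundary cannot thrash.
                if self.u > self.w and 2 * self.size < self.load * self.u:
                    self._merge()
                return True
        return False

    def _split(self):
        old, self.buckets[self.p] = self.buckets[self.p], []
        self.buckets.append([])     # the new chain lands at index p + m
        self.p += 1
        self.u += 1
        for k, v in old:            # rehash: each key goes to p or p + m
            self.buckets[self._addr(k)].append((k, v))
        if self.p == self.m:        # round complete: double m, restart p
            self.m *= 2
            self.p = 0

    def _merge(self):
        if self.p == 0:             # step back into the previous round
            self.m //= 2
            self.p = self.m
        self.p -= 1
        self.u -= 1
        last = self.buckets.pop()   # chain p + m folds back into chain p
        self.buckets[self.p].extend(last)
```

Each split moves only the keys of one chain, so no single insertion pays for a full rehash. Deleting everything walks u back down to w.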
Your task is to write a program that maintains dynamic hash tables. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
July 5, 2013
Many programming languages provide a library function that calculates a serial number for each day, making it easy to calculate the number of days between two dates; we provide such a function in the Standard Prelude. Sometimes, though, the need is to calculate weekdays rather than total days.
Your task is to write a program that calculates the number of weekdays between two dates. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
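One way to count weekdays is to take whole weeks first and then look at the leftover days. The sketch below is written in Python. It uses the standard datetime module in place of the Standard Prelude's day-number function. It assumes a half-open range (the start date counts, the end date does not), treats Monday through Friday as weekdays, and ignores holidays.

```python
from datetime import date

def weekdays_between(start: date, end: date) -> int:
    """Count Monday-Friday dates d with start <= d < end (negated if end < start)."""
    if end < start:
        return -weekdays_between(end, start)
    full_weeks, extra = divmod((end - start).days, 7)
    count = 5 * full_weeks              # every full week holds five weekdays
    first = start.weekday()             # Monday == 0 ... Sunday == 6
    for i in range(extra):              # the leftover 0..6 days
        if (first + i) % 7 < 5:
            count += 1
    return count
```

The loop runs at most six times, so the cost is constant no matter how far apart the dates are.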
July 2, 2013
Sm ppl cmprs txt msgs by rtnng only ths vwls tht bgn a wrd and by rplcng dbld ltrs wth sngl ltrs.
With a proper dictionary, it is possible to expand all the possibilities for a word. For instance, the “Sm” that starts the sentence above is properly translated “Some” but these other words are possible: same, sam, sum, seem, seam, sumo, and others.
Your task is to write a program that, given a sentence in text-speak, returns a list of all possibilities for each word. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
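A natural approach is to compress every dictionary word by the same rule and index the words by their compressed form. Expanding a text-speak word is then a single lookup. The Python sketch below assumes a few things. The vowels are a, e, i, o, u, and y counts as a consonant. Doubled letters are collapsed in the original word before the vowels are dropped, which is the order that reproduces the sample sentence ("retaining" gives "rtnng", "people" gives "ppl"). The caller supplies the dictionary as a list of words.

```python
from collections import defaultdict

VOWELS = set("aeiou")  # assumption: y counts as a consonant

def compress(word: str) -> str:
    """Text-speak form: collapse doubled letters, then drop vowels except a leading one."""
    single = []
    for ch in word.lower():             # "letters" -> "leters"
        if not single or single[-1] != ch:
            single.append(ch)
    single = "".join(single)
    return single[:1] + "".join(ch for ch in single[1:] if ch not in VOWELS)

def expand(sentence: str, dictionary):
    """For each text-speak word, list every dictionary word that compresses to it."""
    index = defaultdict(set)
    for word in dictionary:
        index[compress(word)].add(word.lower())
    words = (tok.strip(".,;:!?").lower() for tok in sentence.split())
    return [sorted(index.get(w, ())) for w in words]
```

Building the index costs one pass over the dictionary. After that, each word in the message is one dictionary lookup.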
June 28, 2013
Today’s exercise is a delightful little programming puzzle:
Write a function $f$ such that $f(f(n)) = -n$ for all integers $n$.
When I first saw this puzzle, it took me two days before I had the necessary flash of inspiration, then about two minutes to write. Do yourself a favor and don’t peek at the suggested solution until you figure it out yourself.
Your task is to write the function $f$ as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
June 25, 2013
We have today another from our never-ending list of interview questions:
Given a linked list, swap the kth node from the head of the list with the kth node from the end of the list.
Your task is to write a function to perform the indicated task; be sure to test it thoroughly. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
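Here is a simple Python sketch. It swaps the nodes themselves rather than copying values: it collects the nodes into a Python list, exchanges positions k − 1 and n − k, and relinks the list. This uses O(n) extra space. An interviewer may well want a pointer-only version, which this sketch deliberately trades away for clarity. The Node class and the helpers from_list and to_list are hypothetical scaffolding for testing. k is 1-based.

```python
class Node:
    """Hypothetical singly linked list node."""
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def from_list(xs):
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def to_list(head):
    out = []
    while head:
        out.append(head.value)
        head = head.next
    return out

def swap_kth(head, k):
    """Swap the kth node from the front with the kth from the end (1-based).

    The node objects are relinked; their values are never copied.
    """
    nodes = []
    while head:
        nodes.append(head)
        head = head.next
    n = len(nodes)
    if not 1 <= k <= n:
        raise ValueError("k out of range")
    i, j = k - 1, n - k
    nodes[i], nodes[j] = nodes[j], nodes[i]
    for a, b in zip(nodes, nodes[1:]):      # rebuild the next pointers
        a.next = b
    nodes[-1].next = None
    return nodes[0]
```

Cases worth testing are k = 1 (the head and tail swap), the middle node of an odd-length list (nothing changes), and adjacent nodes.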
June 21, 2013
We studied the basic quadratic sieve algorithm for factoring integers in two previous exercises. Today we examine the multiple polynomial variant of the quadratic sieve, due to Peter Montgomery, which adds considerable power to the basic algorithm; we will be following Figure 2.2 and the surrounding text from Scott Contini’s thesis.
In the basic quadratic sieve we used a polynomial of the form $g(x) = (x + b)^2 - n$ where $b = \lceil \sqrt{n} \rceil$. The problem with using a single polynomial is that the values of $g(x)$ increase as $x$ increases, making them less likely to be smooth. Eventually, the algorithm “runs out of gas” when the values of $g(x)$ grow too large.
Montgomery’s suggestion was to use multiple polynomials of the form $g_{a,b}(x) = (ax + b)^2 - n$, with $a$ and $b$ integers satisfying $0 < b \le a$. The graph of $g_{a,b}(x)$ is a parabola, and the largest value of $|g_{a,b}(x)|$ over the sieve interval is smallest when $a \approx \sqrt{2n}/m$. We choose $b$ so that $b^2 - n$ is divisible by $a$, say $b^2 - n = ac$ for some integer $c$, and we take $a = q^2$ for some integer $q$. Then $g_{a,b}(x)/a = ax^2 + 2bx + c$. After sieving over the range $-m \ldots m$, whenever $g_{a,b}(x)$ is smooth over the factor base we record the relation $((ax + b)\,q^{-1})^2 \equiv ax^2 + 2bx + c \pmod{n}$.
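The two identities above are easy to check numerically. The small Python sketch below does that for one choice of $q$. It finds $b$ by brute force, purely for illustration; the initialization step computes $b$ with Hensel lifting instead. It assumes $q$ does not divide $n$ and that $n$ is a quadratic residue mod $q$, so a suitable $b$ exists. The function name check_polynomial is hypothetical.

```python
def check_polynomial(n: int, q: int, xs) -> tuple:
    """Verify g(x)/a = a x^2 + 2 b x + c and the MPQS relation mod n for a = q^2.

    b is found by brute force here, purely for illustration.
    """
    a = q * q
    b = next(b for b in range(1, a + 1) if (b * b - n) % a == 0)
    c = (b * b - n) // a
    q_inv = pow(q, -1, n)               # needs Python 3.8+ and gcd(q, n) == 1
    for x in xs:
        g = (a * x + b) ** 2 - n
        h = a * x * x + 2 * b * x + c
        assert g == a * h               # exact integer identity
        assert pow((a * x + b) * q_inv, 2, n) == h % n   # the recorded relation
    return a, b, c
```

For example, check_polynomial(10403, 7, range(-20, 21)) runs silently. Since $10403 \equiv 1 \pmod 7$, a root mod 49 exists.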
Here is our version of Contini’s algorithm:
Compute startup data: Choose $f$, the upper bound for factor base primes, and $m$, which is half the size of the sieve. Determine the factor base primes $p < f$ such that the Jacobi symbol $(n/p)$ is 1, indicating that there is a solution to $t^2 \equiv n \pmod{p}$; also include 2 in the factor base. For each factor base prime $p$, compute and store $t$, a modular square root of $n \pmod{p}$. Also compute and store the base-2 logarithm of each prime, $l_p = \lfloor \log_2 p \rceil$ (the mixed floor and ceiling brackets indicate rounding).
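The startup step can be sketched in Python as follows. For a prime $p$ the Jacobi symbol equals the Legendre symbol, which we compute with Euler's criterion. Modular square roots come from Tonelli–Shanks. The names is_prime, legendre, sqrt_mod and startup are our own. The code assumes $n$ is odd. The trial-division primality test is only adequate for the small bounds a sketch uses.

```python
from math import isqrt, log2

def is_prime(k: int) -> bool:
    return k > 1 and all(k % d for d in range(2, isqrt(k) + 1))

def legendre(n: int, p: int) -> int:
    """Euler's criterion for an odd prime p: 1, -1, or 0."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def sqrt_mod(n: int, p: int) -> int:
    """Tonelli-Shanks: t with t*t = n (mod p), for odd prime p and (n/p) = 1."""
    n %= p
    if n == 0:
        return 0
    s, q = 0, p - 1
    while q % 2 == 0:                   # p - 1 = q * 2^s with q odd
        s, q = s + 1, q // 2
    z = 2
    while legendre(z, p) != -1:         # any quadratic non-residue
        z += 1
    m, c, t, r = s, pow(z, q, p), pow(n, q, p), pow(n, (q + 1) // 2, p)
    while t != 1:
        i, t2 = 0, t
        while t2 != 1:                  # least i with t^(2^i) = 1
            t2, i = t2 * t2 % p, i + 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c, t, r = i, b * b % p, t * b * b % p, r * b % p
    return r

def startup(n: int, f: int):
    """Factor base (2 plus odd primes p < f with (n/p) = 1), roots t_p, logs l_p."""
    base = [2] + [p for p in range(3, f, 2) if is_prime(p) and legendre(n, p) == 1]
    roots = {p: (n % 2 if p == 2 else sqrt_mod(n, p)) for p in base}
    logs = {p: round(log2(p)) for p in base}
    return base, roots, logs
```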
Initialize a new polynomial: Find a prime $q \approx \sqrt{\sqrt{2n}/m}$ such that the Jacobi symbol $(n/q)$ is 1, indicating that $n$ is a quadratic residue mod $q$, and let $a = q^2 \pmod{n}$. Compute $b$, a modular square root of $n$ mod $a$; you will have to compute the square root of $n$ mod $q$, then “lift” the root mod $q^2$ using Hensel’s Lemma. For each odd prime $p$ in the factor base, compute $\mathit{soln1}_p = a^{-1}(\mathit{tmem}_p - b) \bmod p$ and $\mathit{soln2}_p = a^{-1}(-\mathit{tmem}_p - b) \bmod p$, where $\mathit{tmem}_p$ is the square root $t$ of $n$ mod $p$ stored in the startup step.
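The initialization step can be sketched in Python as follows. As a simplifying assumption of our own, we only accept primes $q \equiv 3 \pmod 4$, so the square root mod $q$ is a single exponentiation, $n^{(q+1)/4} \bmod q$. One Hensel step then lifts that root $b_0$ to $b = b_0 + kq$, with $2 b_0 k \equiv (n - b_0^2)/q \pmod q$. The function name new_polynomial is hypothetical. Its roots argument maps each factor base prime to its stored square root.

```python
from math import isqrt

def is_prime(k: int) -> bool:
    return k > 1 and all(k % d for d in range(2, isqrt(k) + 1))

def new_polynomial(n: int, m: int, roots: dict):
    """Choose a = q^2 and b, c with b^2 - n = a c; return sieve offsets per odd prime.

    roots maps each factor base prime p to t_p with t_p^2 = n (mod p).
    Assumption: q is restricted to primes = 3 (mod 4) so sqrt mod q is one pow().
    """
    q = isqrt(isqrt(2 * n) // m)
    while True:
        q += 1
        if q % 4 == 3 and is_prime(q) and pow(n, (q - 1) // 2, q) == 1:
            break
    a = q * q
    b0 = pow(n, (q + 1) // 4, q)                        # b0^2 = n (mod q)
    k = (n - b0 * b0) // q * pow(2 * b0, -1, q) % q     # Hensel: lift to mod q^2
    b = (b0 + k * q) % a                                # b^2 = n (mod a)
    c = (b * b - n) // a
    solns = {}
    for p, t in roots.items():
        if p == 2 or a % p == 0:                        # skip 2 and q itself
            continue
        a_inv = pow(a, -1, p)
        solns[p] = (a_inv * (t - b) % p, a_inv * (-t - b) % p)
    return q, a, b, c, solns
```

Each offset $s$ satisfies $a s + b \equiv \pm t \pmod p$, so $p$ divides $g_{a,b}(s)$, and since $p \nmid a$ it also divides $g_{a,b}(s)/a$.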
Perform sieving: Initialize a sieve array of length $2m + 1$ to zeros, with indices from $-m$ to $m$. For each odd prime $p$ in the factor base, add $l_p$ to the locations $\mathit{soln1}_p + ip$ for all integers $i$ that satisfy $-m \le \mathit{soln1}_p + ip \le m$, and likewise for $\mathit{soln2}_p$. For the prime $p = 2$, sieve only with $\mathit{soln1}_p$.
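The sieve itself is only a few lines. In the Python sketch below, array index $i$ stands for $x = i - m$. For each offset $r$, sieving starts at the smallest index $i \ge 0$ with $i - m \equiv r \pmod p$. The function name sieve is hypothetical. solns maps each prime to its pair of offsets, and logs maps each prime to $l_p$.

```python
def sieve(m: int, solns: dict, logs: dict) -> list:
    """Sieve array S with S[x + m] = sum of l_p over primes p hitting x, for -m <= x <= m.

    solns maps p to (soln1_p, soln2_p); for p = 2 only soln1_p is used.
    """
    S = [0] * (2 * m + 1)
    for p, (s1, s2) in solns.items():
        for r in ((s1,) if p == 2 else (s1, s2)):
            # smallest index i >= 0 whose x = i - m satisfies x = r (mod p)
            for i in range((r + m) % p, 2 * m + 1, p):
                S[i] += logs[p]
    return S
```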
Trial division: Scan the sieve array for locations $x$ that have accumulated a value of at least $\log_2(m\sqrt{n})$ minus a small error term. Trial divide $g_{a,b}(x)$ by the primes in the factor base. If $g_{a,b}(x)$ factors into primes less than $f$, save the smooth relation as indicated above. After scanning the entire sieve array, if there are more smooth relations than primes in the factor base, go to the linear algebra step. Otherwise, go to the initialization step to select a new polynomial.
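Here is a Python sketch of the scan and trial division. It trial divides the reduced value $ax^2 + 2bx + c = g_{a,b}(x)/a$, since $a = q^2$ is a known square factor. Exponent vectors put the sign first, followed by one exponent per factor base prime. The parameter slack is a hypothetical stand-in for the "small error term," and the threshold uses base-2 logarithms to match $l_p$.

```python
from math import isqrt, log2

def trial_divide(value: int, base: list):
    """Exponent vector of value over the factor base, or None if not smooth.

    Entry 0 is the sign (1 for negative); entries 1.. follow the order of base.
    """
    if value == 0:
        return None
    exps, v = [1 if value < 0 else 0], abs(value)
    for p in base:
        e = 0
        while v % p == 0:
            v, e = v // p, e + 1
        exps.append(e)
    return exps if v == 1 else None

def scan(n, a, b, c, m, S, base, slack=2.0):
    """Trial divide a x^2 + 2 b x + c wherever the sieve value reaches the threshold.

    slack is a hypothetical tuning parameter standing in for the small error term.
    """
    threshold = log2(m * isqrt(n)) - slack
    relations = []
    for i, s in enumerate(S):
        if s >= threshold:
            x = i - m
            exps = trial_divide(a * x * x + 2 * b * x + c, base)
            if exps is not None:
                relations.append((x, exps))
    return relations
```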
Linear algebra: Solve the matrix as in Dixon’s method and the continued fraction method. For each null vector, construct a relation of the form $x^2 \equiv y^2 \pmod{n}$ and attempt to factor $n$ by computing $\gcd(x - y, n)$. If all null vectors fail to give a non-trivial factorization, go to the initialization step to select a new polynomial.
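The linear algebra step can be sketched in Python with Gaussian elimination over GF(2). Each exponent vector is packed into an integer bitmask of parities, and a second bitmask records which relations were combined. Each relation is assumed to be stored as a pair (u, exps), where $u = (ax + b)\,q^{-1} \bmod n$ for the polynomial that produced it. Then $u^2$ is congruent mod $n$ to the signed product that exps describes. The names null_vectors and try_factor are hypothetical.

```python
from math import gcd

def null_vectors(rows: list) -> list:
    """Subsets of rows whose exponent sums are all even (Gaussian elimination over GF(2)).

    Each row is packed into an int bitmask of exponent parities; combo tracks which
    original rows have been XORed together.
    """
    pivots = {}                     # leading bit -> (vector, combo)
    deps = []
    for idx, row in enumerate(rows):
        v = sum(1 << j for j, e in enumerate(row) if e % 2)
        combo = 1 << idx
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = (v, combo)
                break
            pv, pc = pivots[top]
            v, combo = v ^ pv, combo ^ pc
        else:                       # v reduced to zero: a dependency
            deps.append({i for i in range(len(rows)) if combo >> i & 1})
    return deps

def try_factor(n: int, relations: list, base: list):
    """relations: (u, exps) pairs with u^2 = (-1)^exps[0] * prod(p^e) (mod n)."""
    for dep in null_vectors([e for _, e in relations]):
        x, total = 1, [0] * len(relations[0][1])
        for i in dep:
            u, e = relations[i]
            x = x * u % n
            total = [s + k for s, k in zip(total, e)]
        y = 1
        for p, k in zip(base, total[1:]):   # every exponent in total is even
            y = y * pow(p, k // 2, n) % n
        d = gcd(x - y, n)
        if 1 < d < n:
            return d
    return None
```

For example, $10^2 = 100 \equiv 9 = 3^2 \pmod{91}$, so the single relation (10, [0, 0, 2]) over the base [2, 3] gives $\gcd(10 - 3, 91) = 7$.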
Your task is to write a program to factor integers using the multiple polynomial quadratic sieve as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
June 18, 2013
Today’s exercise is a classic problem of computer science: given an array of positive and negative integers, find three that sum to zero, or indicate that no such triple exists. This is similar to the subset sum problem, which we solved in a previous exercise, but simpler because of the limit to three items. A brute force solution that operates in $O(n^3)$ time uses three nested loops to select items from the array and test their sum, but an $O(n^2)$ solution exists. For instance, in the array [8, -25, 4, 10, -10, -7, 2, -3], the numbers -10, 2 and 8 sum to zero, as do the numbers -7, -3, and 10.
Your task is to write a program that solves the 3SUM problem in $O(n^2)$ time. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
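A well-known $O(n^2)$ approach sorts the array first. Then, for each element, it walks two pointers inward over the rest of the array. If the sum is too small, the left pointer moves right; if it is too large, the right pointer moves left. The Python sketch below returns the first triple it finds, in sorted order, or None. The name three_sum is our own.

```python
def three_sum(xs):
    """Return one triple (in sorted order) summing to zero, or None."""
    a = sorted(xs)                      # O(n log n), dominated by the O(n^2) loop
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:                  # two pointers over a[i+1:]
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return a[i], a[lo], a[hi]
            if s < 0:
                lo += 1                 # need a larger sum
            else:
                hi -= 1                 # need a smaller sum
    return None
```

On the sample array this finds the triple -10, 2, 8. The outer loop runs n times and the inner walk is linear, giving $O(n^2)$ overall.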
June 14, 2013
The ratio of the circumference to the diameter of a circle, known as π, is constant regardless of the size of the circle. That fact has been known to mathematicians for about five thousand years, and much of the history of mathematics is intertwined with the history of π, as approximations of the ratio have improved over time. Much of the history of this blog is also intertwined with the calculation of π. One of the original ten exercises used a bounded spigot algorithm of Jeremy Gibbons to calculate π, and later we used an unbounded spigot algorithm also due to Gibbons. We studied the ancient approximation of 355/113 calculated by Archimedes, we studied two different Monte Carlo simulations of π, and we even had the Brent-Salamin approximation contributed by a reader in a comment.
The development of computers allows us to compute the digits of π to an astonishing accuracy; the current record, unless somebody has bettered it recently, is ten trillion digits. That record was computed with an algorithm developed in 1987 by the Chudnovsky brothers, two Russian mathematicians living in New York. The algorithm is based on a series for 1/π of the kind discovered by Ramanujan, and is beautifully described by the two brothers.
Your task is to compute many digits of π using the Chudnovsky algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
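The Chudnovsky series is

$$\frac{1}{\pi} = 12 \sum_{k \ge 0} \frac{(-1)^k (6k)!\,(13591409 + 545140134k)}{(3k)!\,(k!)^3\,640320^{3k + 3/2}}.$$

The Python sketch below sums this series term by term in fixed-point integer arithmetic, with ten guard digits. Each term comes from the previous one through the ratio $-24(6k-5)(2k-1)(6k-1)/(k^3 \cdot 640320^3)$. The constant $640320^{3/2}/12$ is rewritten as $426880\sqrt{10005}$, and the square root is computed with math.isqrt. This is a plain summation, not the binary-splitting evaluation used for record computations, so it is only practical for modest digit counts. The function name is hypothetical.

```python
from math import isqrt

def chudnovsky_pi_digits(digits: int) -> str:
    """pi truncated to `digits` decimal places, using the Chudnovsky series.

    Fixed-point integer arithmetic with 10 guard digits; each term is derived
    from the previous one by the ratio -24 (6k-5)(2k-1)(6k-1) / (k^3 640320^3).
    """
    one = 10 ** (digits + 10)
    c3_over_24 = 640320 ** 3 // 24       # exact: 640320^3 is divisible by 24
    k, a_k, a_sum, b_sum = 1, one, one, 0
    while a_k != 0:
        a_k = a_k * -(6 * k - 5) * (2 * k - 1) * (6 * k - 1) // (k ** 3 * c3_over_24)
        a_sum += a_k                    # sum of M_k / X_k terms
        b_sum += k * a_k                # sum of k * M_k / X_k terms
        k += 1
    total = 13591409 * a_sum + 545140134 * b_sum
    pi = 426880 * isqrt(10005 * one * one) * one // total
    s = str(pi // 10 ** 10)             # drop the guard digits
    return s[0] + "." + s[1:]
```

Each term adds about fourteen correct digits, so 1000 digits take roughly 70 terms.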
June 11, 2013
Today’s exercise comes from our unending list of interview questions:
Find the longest run of consecutive characters in a string that contains only two unique characters; if there is a tie, return the rightmost. For instance, given the input string abcabcabcbcbc, the longest run of two characters is the 6-character run bcbcbc that ends the string.
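A sliding window solves this in one pass. The Python sketch below tracks the last index of each character in the window. When a third character arrives, the window restarts just after the last occurrence of whichever character appeared least recently. Comparing lengths with >= makes a later run win ties, which gives the rightmost answer. We assume "only two unique characters" means at most two, so a string like "aaaa" returns itself.

```python
def longest_two_char_run(s: str) -> str:
    """Longest substring with at most two distinct characters; ties go to the rightmost."""
    best_start, best_len = 0, 0
    start = 0
    last_seen = {}                      # char -> index of its last occurrence in window
    for i, ch in enumerate(s):
        last_seen[ch] = i
        if len(last_seen) > 2:
            # evict the character whose last occurrence is furthest left
            drop = min(last_seen, key=last_seen.get)
            start = last_seen.pop(drop) + 1
        if i - start + 1 >= best_len:   # >= so a later tie wins
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]
```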
June 7, 2013
Sets are ubiquitous in programming; we’ve used sets in several of our exercises. In most cases we made the sets from lists, which is good enough when the sets are small but quickly slows down when the sets get large. In today’s exercise we will write a generic library for sets.
The sets that we will consider are collections of items without duplicates. We will provide an adjoin function to add an item to a set if it is not already present and a delete function to remove an item from a set if it is present. A member function determines if a particular item is present in a set. The three set operations that are provided are intersection, union and difference; we will consider that the universe from which items are drawn is infinite, or at least too large to conveniently enumerate, so we will not provide a complement operation. For convenience, we will also provide functions to calculate the cardinality of a set and to create a list from the items present in the set.
Your task is to write a library for handling sets, based on the description given above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
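Here is one possible shape for such a library, sketched in Python. It assumes items are hashable and stores them as the keys of a dict, which gives expected constant-time membership instead of the linear scans of list-based sets. The adjoin and delete methods update the set in place and return it. The three set operations build new sets. Following the text, there is no complement. The class name Set and its method names are our own choices.

```python
class Set:
    """Sketch of a generic set; assumes items are hashable."""

    def __init__(self, items=()):
        self._d = dict.fromkeys(items)  # keys are the members

    def adjoin(self, x):
        self._d[x] = None               # no effect if x is already present
        return self

    def delete(self, x):
        self._d.pop(x, None)            # no effect if x is absent
        return self

    def member(self, x):
        return x in self._d

    def cardinality(self):
        return len(self._d)

    def to_list(self):
        return list(self._d)

    def intersection(self, other):
        # iterate over the smaller set, probe the larger one
        small, big = (self, other) if self.cardinality() <= other.cardinality() else (other, self)
        return Set(x for x in small._d if big.member(x))

    def union(self, other):
        return Set(list(self._d) + list(other._d))

    def difference(self, other):
        return Set(x for x in self._d if not other.member(x))
```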