August 31, 2012
Sometimes the phases of the moon fall on the calendar in such a way that one calendar month has two full moons; the second full moon of the month is called a blue moon, for reasons that have been lost in antiquity. Poets and songwriters sometimes use the phrase “once in a blue moon” to indicate an event that occurs infrequently, but in fact blue moons occur every two or three years, on average. Today’s full moon is the blue moon of August 2012.
We looked at the phases of the moon in a previous exercise. There we learned that new moons occur every 29.530588853 days on average, that a new moon occurred on January 6, 2000 (Julian date 2451550.1), and that full moons occur halfway between two new moons.
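The constants above are enough for a rough calculation. What follows is only a sketch in Python, not the suggested solution: it assumes that months are reckoned in UTC and that the twenty-first century runs from 2001 through 2100, and the names julian_to_datetime and blue_moons are made up for the illustration. Because it uses the mean length of a lunation, a computed full moon can differ from the true one by several hours, so a full moon close to a month boundary may land in the neighboring month.

```python
from datetime import datetime, timedelta

SYNODIC_MONTH = 29.530588853   # mean days between new moons (from the text)
NEW_MOON_JD = 2451550.1        # new moon of January 6, 2000 (from the text)
J2000_JD = 2451545.0           # Julian date of 2000-01-01 12:00 UTC


def julian_to_datetime(jd):
    """Convert a Julian date to a naive UTC datetime."""
    return datetime(2000, 1, 1, 12) + timedelta(days=jd - J2000_JD)


def blue_moons(first_year=2001, last_year=2100):
    """Return (year, month, day) of each second full moon in a calendar month."""
    first_full_moon = NEW_MOON_JD + SYNODIC_MONTH / 2
    result = []
    previous_month = None
    k = 0
    while True:
        when = julian_to_datetime(first_full_moon + k * SYNODIC_MONTH)
        if when.year > last_year:
            return result
        month = (when.year, when.month)
        # Two consecutive full moons in the same month: the second is blue.
        if month == previous_month and when.year >= first_year:
            result.append((when.year, when.month, when.day))
        previous_month = month
        k += 1
```

Calling blue_moons() returns the list for the whole century; passing the same year twice, as in blue_moons(2012, 2012), restricts it to a single year.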
Your task is to write a program that calculates all the blue moons of the twenty-first century. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
August 28, 2012
Lists are ubiquitous in programming, especially in languages like Scheme, but elsewhere as well. Their primary advantage over arrays is that they can easily grow as needed, but that brings a corresponding disadvantage: it takes O(n) time to access the nth item in a list, but only O(1) time to access the nth item in an array.
Chris Okasaki has invented a remarkably clever data structure that provides the normal O(1) time complexity for the cons, head and tail operators of lists but reduces random access to the nth item in a list to O(log n), which means lists can sometimes be used in place of arrays, especially when it is inconvenient to determine the size of the array in advance or when the items of the array are normally accessed in sequence.
I won’t try to explain Okasaki’s data structure here; you can look at http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.5156 for the details, or look at Figure 9.7 in Okasaki’s book Purely Functional Data Structures, as I did.
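For readers who want something concrete to compare against, here is a minimal sketch in Python of a skew binary random access list, the structure described in Okasaki's work. The representation is an assumption chosen for brevity: a list is None or a tuple (size, tree, rest), and a tree is a tuple (value, left, right) whose children are None at a leaf. It is an illustration, not the suggested solution.

```python
def cons(x, ral):
    """Add x at the front; merges the first two trees when they have equal size."""
    if ral is not None and ral[2] is not None and ral[0] == ral[2][0]:
        s1, t1, (s2, t2, rest) = ral
        return (1 + s1 + s2, (x, t1, t2), rest)
    return (1, (x, None, None), ral)


def head(ral):
    if ral is None:
        raise IndexError("head of empty list")
    return ral[1][0]


def tail(ral):
    if ral is None:
        raise IndexError("tail of empty list")
    size, (_, left, right), rest = ral
    if size == 1:
        return rest
    half = size // 2
    return (half, left, (half, right, rest))


def lookup(ral, i):
    """Return item i (0-based): skip whole trees, then descend into one."""
    if i < 0:
        raise IndexError("negative index")
    while ral is not None:
        size, tree, rest = ral
        if i < size:
            while i > 0:
                _, left, right = tree
                size //= 2
                if i <= size:
                    tree, i = left, i - 1
                else:
                    tree, i = right, i - 1 - size
            return tree[0]
        i -= size
        ral = rest
    raise IndexError("index out of range")


def _update_tree(size, tree, i, y):
    value, left, right = tree
    if i == 0:
        return (y, left, right)
    half = size // 2
    if i <= half:
        return (value, _update_tree(half, left, i - 1, y), right)
    return (value, left, _update_tree(half, right, i - 1 - half, y))


def update(ral, i, y):
    """Return a new list with item i replaced by y; the old list is unchanged."""
    if ral is None or i < 0:
        raise IndexError("index out of range")
    size, tree, rest = ral
    if i < size:
        return (size, _update_tree(size, tree, i, y), rest)
    return (size, tree, update(rest, i - size, y))
```

Because every operation builds new tuples instead of mutating old ones, earlier versions of a list remain valid after cons, tail or update.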
Your task is to implement a library for random access lists, including the functions cons, head, tail, lookup and update. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
August 24, 2012
One of the classic data structures of computer science is the hash table, which provides constant-time access to key/value data in the average case. The idea is to store a key/value pair in a location that is computed based on the value of the key, which works fine when all the hash values are unique; the problem arises when two keys hash to the same value. The hash tables of the Standard Prelude use a method called chaining to resolve collisions; today’s exercise uses a different method called open addressing.
All hash tables work by computing an address at which to store an item based on its key. Sometimes two keys hash to the same value, which causes a collision. Chaining, as used in the Standard Prelude, resolves collisions by storing multiple items at the same location, forming a list of items. Open addressing instead computes a second address, or if necessary a third, or fourth, or …, continuing until it finds an empty spot. It is necessary that the computation of secondary addresses eventually visits every possible storage location; a simple approach, which we will adopt, is called linear probing, in which storage locations are accessed in increasing order until an empty location is found. It is possible, of course, for the hash table to become completely filled, in which case an error is reported when a new item cannot be inserted.
The tricky part of hashing with open addressing is deletion. An item cannot simply be removed, because some other item may rely on the fact that its storage location was filled when that other item was inserted. The solution is to have three types of storage location: nil, which indicates that the storage location has never been used; deleted, which indicates that the storage location is currently empty but has been used in the past; and in use, for those storage locations that are currently occupied.
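The three kinds of storage location and the linear probe can be sketched as follows. This is a minimal Python illustration under some assumptions: the class name OpenHashTable, the fixed default size of 16, the use of Python's built-in hash, and OverflowError as the "table full" error are all choices made for the example, not part of the exercise.

```python
class OpenHashTable:
    NIL = object()       # slot has never been used
    DELETED = object()   # slot was used in the past but is empty now

    def __init__(self, size=16):
        self.slots = [self.NIL] * size

    def _probe(self, key):
        """Linear probing: visit every slot once, starting at the home slot."""
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def lookup(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self.NIL:          # never used, so the key is absent
                break
            if slot is not self.DELETED and slot[0] == key:
                return slot[1]
        return default

    def insert(self, key, value):
        free = None                       # first reusable slot seen so far
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self.NIL:
                if free is None:
                    free = i
                break
            if slot is self.DELETED:
                if free is None:
                    free = i
            elif slot[0] == key:          # key already present: overwrite
                self.slots[i] = (key, value)
                return
        if free is None:
            raise OverflowError("hash table is full")
        self.slots[free] = (key, value)

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self.NIL:
                return
            if slot is not self.DELETED and slot[0] == key:
                self.slots[i] = self.DELETED   # leave a marker, not NIL
                return
```

Note that lookup and delete step over deleted slots and stop only at a nil slot, which is exactly why a deleted item must leave a marker behind.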
Your task is to write functions that maintain a hash table with open addressing. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
August 21, 2012
We did two tasks related to random numbers in the most recent exercise, and we have looked at high-quality random number generators in several previous exercises. In today’s exercise we look at two very low-quality random number generators, which should not be used for any production application.
The first, invented by John von Neumann in 1949, occasioned his famous quip “Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.” The middle-square method takes a number with an even number of digits, squares it, and extracts the middle digits for the next iteration; for instance, if the seed is 675248, the square is 455959861504, and the middle digits are 959861.
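One step of the middle-square method might look like this in Python. It is a sketch only: the function name and the optional digits parameter, which lets a seed keep its leading zeros, are assumptions of the example.

```python
def middle_square(seed, digits=None):
    """Return the middle `digits` digits of seed squared."""
    if digits is None:
        digits = len(str(seed))
    if digits % 2 != 0:
        raise ValueError("the number of digits must be even")
    # Pad the square to twice the seed's width so the middle is well defined.
    squared = str(seed * seed).zfill(2 * digits)
    start = digits // 2
    return int(squared[start:start + digits])
```

With the seed from the text, middle_square(675248) returns 959861; feeding each result back in as the next seed, with the same digits argument, produces the sequence.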
The second, invented by IBM in the early 1960s, caused Donald Knuth to claim “its very name RANDU is enough to bring dismay into the eyes and stomachs of many computer scientists!”. RANDU is based on the recursion x_{n+1} = 65539 · x_n (mod 2^31), with x_0 odd.
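The RANDU recursion, with multiplier 65539 and modulus 2 to the 31st power, translates directly into a generator. This Python sketch uses names of its own choosing (randu, first_n) and rejects even seeds, as the definition requires.

```python
from itertools import islice


def randu(seed):
    """Yield the RANDU sequence x1, x2, ... starting from an odd seed x0."""
    if seed % 2 == 0:
        raise ValueError("the seed must be odd")
    x = seed
    while True:
        x = (65539 * x) % 2**31
        yield x


def first_n(seed, n):
    """Convenience helper: the first n values of the sequence as a list."""
    return list(islice(randu(seed), n))
```

Starting from a seed of 1, the first three values are 65539, 393225 and 1769499.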
Your task is to write functions that generate random numbers by the middle-square and RANDU methods. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
August 17, 2012
We have two exercises related to random numbers today. I’m not sure of the source, but they look to me like homework exercises.
First, given a function rand3 that returns a number from 1 to 3 inclusive chosen at random, write a function that returns a number from 1 to 9, inclusive.
Second, given a function rand5 that returns a number from 1 to 5 inclusive chosen at random, write a function that returns a number from 1 to 7, inclusive.
In both cases all possible output numbers should be generated with equal frequency. You should demonstrate that your functions behave properly.
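One possible approach, sketched here in Python, treats two calls as the two digits of a number: two calls to rand3 give nine equally likely pairs, and two calls to rand5 give twenty-five, of which the first twenty-one can be folded evenly onto 1 to 7 while the rest are rejected and redrawn. The stand-in definitions of rand3 and rand5 and the frequencies helper are assumptions of the example.

```python
import random
from collections import Counter


def rand3():
    return random.randint(1, 3)   # stand-in for the given rand3


def rand5():
    return random.randint(1, 5)   # stand-in for the given rand5


def rand9():
    # Two base-3 digits: 3 * {0, 1, 2} + {1, 2, 3} covers 1..9 exactly once each.
    return 3 * (rand3() - 1) + rand3()


def rand7():
    while True:
        n = 5 * (rand5() - 1) + rand5()   # uniform on 1..25
        if n <= 21:                       # 21 = 3 * 7, so folding is even
            return (n - 1) % 7 + 1
        # 22..25 would bias the result, so draw again


def frequencies(generator, trials):
    """Count how often each value appears in `trials` calls."""
    return Counter(generator() for _ in range(trials))
```

Running frequencies(rand9, 90000) or frequencies(rand7, 70000) should show each value about ten thousand times, which is a simple demonstration rather than a proof of uniformity.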
Your task is to write the rand9 and rand7 functions and demonstrate that they work properly. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
August 14, 2012
We have today another exercise from our inexhaustible stock of interview questions:
Given an array of integers, output a list of four integers that sum to zero (the same input integer can be used multiple times), or indicate that no such set of four integers exists. For example, given the array (2 3 1 0 -4 -1), the set of four integers (3 1 0 -4) sums to zero, as does the set (0 0 0 0).
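A quadratic-time approach, offered as a sketch rather than as the suggested solution, records one pair of inputs for every possible pair sum and then looks for two pair sums that cancel. The function name four_sum_zero is an assumption; since the question allows reuse, a number may be paired with itself.

```python
def four_sum_zero(numbers):
    """Return four inputs (reuse allowed) that sum to zero, or None."""
    pair_by_sum = {}
    for i, a in enumerate(numbers):
        for b in numbers[i:]:             # numbers[i:] includes a itself
            pair_by_sum.setdefault(a + b, (a, b))
    for total, (a, b) in pair_by_sum.items():
        if -total in pair_by_sum:         # two pairs whose sums cancel
            c, d = pair_by_sum[-total]
            return [a, b, c, d]
    return None
```

Any valid answer is acceptable, so the four integers returned for the sample array need not match either of the sets shown above.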
Your task is to write a program that solves the interview question. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
August 10, 2012
Today’s exercise comes to us from the practice round of Google Code Jam 2008.
You are given two vectors v1=(x1,x2,…,xn) and v2=(y1,y2,…,yn). The scalar product of these vectors is a single number, calculated as x1y1+x2y2+…+xnyn.
Suppose you are allowed to permute the coordinates of each vector as you wish. Choose two permutations such that the scalar product of your two new vectors is the smallest possible, and output that minimum scalar product.
Google gives two examples: the minimum scalar product of the two vectors (1 3 -5) and (-2 4 1) is -25, and the minimum scalar product of the two vectors (1 2 3 4 5) and (1 0 1 0 1) is 6.
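By the rearrangement inequality, the sum of products is smallest when the largest coordinates of one vector meet the smallest coordinates of the other, so sorting one vector ascending and the other descending is sufficient. A short Python sketch (the function name is an assumption):

```python
def minimum_scalar_product(v1, v2):
    """Pair the smallest of v1 with the largest of v2 and sum the products."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(sorted(v1), sorted(v2, reverse=True)))
```

Both of Google's examples check out: the first gives -25 and the second gives 6.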
Your task is to write a program that finds the minimum scalar product of two vectors. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
August 7, 2012
We continue today our occasional series based on classic Unix utilities. The make command maintains a set of file dependencies, given a definition of those files which depend on others and the commands used to update the predecessors.
Make is commonly used to automate program compilation, but it also finds use in other ways, such as maintaining the dependencies between the files that make up a complicated document or website. Here is a typical makefile for a small C program:
prog: a.o b.o c.o
	cc a.o b.o c.o -ly -o prog
a.o: prog.h a.c
	cc -c prog.h a.c
b.o: prog.h b.c
	cc -c prog.h b.c
c.o: c.c
	cc -c c.c
c.c: c.y
	yacc c.y
	mv y.tab.c c.c
print:
	pr prog.h a.c b.c c.y
The first rule says that the target prog depends on files a.o, b.o and c.o and is generated by calling the C compiler to link a.o, b.o, c.o and the y library into the executable file prog. The value of make is that it eliminates needless work; if everything is up to date and a change is made to the yacc grammar in c.y, the only commands that need to be run are the yacc and mv commands that rebuild c.c, the cc command that rebuilds c.o, and the cc command that rebuilds prog:
yacc c.y
mv y.tab.c c.c
cc -c c.c
cc a.o b.o c.o -ly -o prog
The make program begins by reading the makefile and storing the dependencies and the associated commands. Then it takes the target, calls itself recursively to update each of the target’s predecessors, checks the filesystem to compare the ages of the target and its predecessors, and finally runs the commands associated with the target if the target is missing or older than any predecessor.
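The procedure just described can be sketched in Python as follows. This is a simplified illustration, not a replacement for make: it understands only "target: predecessors" lines followed by indented command lines, and the function names (parse_makefile, file_mtime, shell_run, update) are invented for the example. The age lookup and the command runner are passed in as parameters, an assumption made so that the logic can be tried without touching real files.

```python
import os
import subprocess


def parse_makefile(text):
    """Return {target: (predecessors, commands)} from makefile text."""
    rules, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():                 # indented line: a command
            if current is None:
                raise ValueError("command before any target")
            rules[current][1].append(line.strip())
        else:                                 # "target: predecessors"
            target, _, predecessors = line.partition(":")
            current = target.strip()
            rules[current] = (predecessors.split(), [])
    return rules


def file_mtime(name):
    """Modification time of a file, or None if it does not exist."""
    return os.path.getmtime(name) if os.path.exists(name) else None


def shell_run(target, commands):
    for command in commands:
        print(command)
        subprocess.run(command, shell=True, check=True)


def update(target, rules, mtime=file_mtime, run=shell_run):
    """Bring target up to date, updating its predecessors first."""
    if target not in rules:                   # a plain source file
        if mtime(target) is None:
            raise FileNotFoundError(f"no rule to make {target}")
        return
    predecessors, commands = rules[target]
    for name in predecessors:
        update(name, rules, mtime, run)
    built = mtime(target)
    times = [mtime(name) for name in predecessors]
    # Rebuild if the target is missing or any predecessor is newer.
    if built is None or any(t is None or t > built for t in times):
        run(target, commands)
```

With the defaults, update("prog", parse_makefile(open("makefile").read())) would consult the real filesystem and run the commands through the shell.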
Your task is to write a program that takes a target and updates it according to the rules of the makefile. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
August 3, 2012
In the previous exercise we looked at two slow solutions to the SEND + MORE = MONEY cryptarithm. In today’s exercise we look at two more solutions.
Our third solution uses a hill-climbing algorithm. The basic idea is to start with a random solution, score it, then alter it, score the modified solution, keep it if it has a better score than the original, and repeat until the desired solution is found. For the cryptarithm problem, the alteration can be done by swapping the values assigned to two letters chosen randomly, and scoring can be done by computing the difference between SEND + MORE and MONEY; the solution is found when the difference is zero.
The problem with hill-climbing is that it can get stuck at a local optimum with no hope of achieving a global optimum. Consider the correct solution to the SEND + MORE = MONEY problem; we give the solution in a list, with O=0, M=1, and so on, and no letter assigned to 3 or 4: (o m y _ _ e n d r s). It is possible (it happened to me when I was writing the program) for hill-climbing to reach the solution (o m y _ e n _ d r s) with a score of 1. It takes two swaps to find the correct solution, but there is only one possible improvement in the score, from 1 to 0, so if a random hill-climb ever reaches the incorrect solution shown above, it will loop forever without reaching the correct solution.
Thus, our fourth solution is a variant of hill-climbing that adds additional randomization: a modified solution is always accepted if it has a better score than the original, and it is also accepted sometimes even if it has a worse score than the original, say about once in a hundred times. That way, if the hill-climbing reaches a local optimum, it has a way to “jump” to a different hill and continue to the global optimum.
The straight hill-climbing algorithm is fast when it works, taking half a second or less (depending on the randomization). The variant hill-climbing algorithm always works, and is equally fast.
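Both algorithms fit in one Python sketch, since the variant differs only in sometimes accepting a worse score. The state is a list of ten slots indexed by digit, as in the lists shown above. Several details are assumptions of this example rather than part of the description: the names score and solve, the jump_chance parameter (0.01 for the variant, 0 for straight hill-climbing), the max_steps guard that returns None instead of looping forever, and a large penalty that keeps S and M from being assigned zero.

```python
import random

LETTERS = "SENDMORY"


def score(state):
    """Distance of SEND + MORE from MONEY; state[d] is the letter with digit d."""
    digit = {ch: d for d, ch in enumerate(state) if ch != "_"}

    def number(word):
        n = 0
        for ch in word:
            n = 10 * n + digit[ch]
        return n

    # Assumption: forbid leading zeros by making such states very unattractive.
    penalty = 100000 if digit["S"] == 0 or digit["M"] == 0 else 0
    return abs(number("SEND") + number("MORE") - number("MONEY")) + penalty


def solve(jump_chance=0.01, rng=None, max_steps=5_000_000):
    """Hill-climb by random swaps; returns the solved state, or None on giving up."""
    rng = rng or random.Random()
    state = list(LETTERS + "__")          # eight letters and two unused digits
    rng.shuffle(state)
    current = score(state)
    for _ in range(max_steps):
        if current == 0:
            break
        i, j = rng.sample(range(10), 2)
        state[i], state[j] = state[j], state[i]
        candidate = score(state)
        if candidate < current or rng.random() < jump_chance:
            current = candidate           # keep the swap (better, or a jump)
        else:
            state[i], state[j] = state[j], state[i]   # undo the swap
    return state if current == 0 else None
```

A call such as solve() runs the variant; solve(jump_chance=0) runs straight hill-climbing, which may give up at a local optimum and return None.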
Your task is to write the two cryptarithm algorithms given above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.