Two Bad Sorts

May 17, 2011

We have some fun today implementing two really bad sorting algorithms.

Stooge sort is a recursively-defined algorithm that sorts n items in O(nlog 3 / log 1.5) comparisons: If the value at the end of the list is less than the value at the start of the list swap them. Then, if there are three are more items in the list, recursively sort the first two-thirds of the items in the list, then the last two-thirds of the items in the list, and finally (again) the first two-thirds of the items in the list. This is so bad that it takes a few seconds just to convince yourself that it actually works.

Bogosort is particularly bad, with factorial time complexity, on average, and no guarantee that it ever terminates: collect the items in the list in random order. If they are sorted, return them, otherwise try again with a new random ordering of the items in the list.

Your task is to implement stooge sort and bogosort. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

In 1981, John D. Dixon, a mathematician at Carleton University, developed the integer factorization algorithm that bears his name. Dixon’s algorithm is not used in practice, because it is quite slow, but it is important in the realm of number theory because it is the only sub-exponential factoring algorithm with a deterministic (not conjectured) run time, and it is the precursor to the quadratic sieve factorization algorithm, which is eminently practical.

Dixon’s algorithm is based on the well-known fact of number theory that, given x2y2 (mod n), with x ≠ ±y (mod n), it is likely that gcd(xy, n) will be a factor of n. The algorithm is similar to the CFRAC algorithm of Morrison and Brillhart: the first phase finds smooth relations, and the second phase uses linear algebra to combine them and form a factorization. The difference is the first phase, where CFRAC looks in the convergents of the continued fraction of the square root of the number being factored, but Dixon’s algorithm simply chooses numbers at random.

Thus, the first stage of Dixon’s method is this: choose a bound B and identify the factor base of all primes less than B that are quadratic residues of n (their jacobi symbol is 1). Then choose a positive integer r < n, calculate its square modulo n, and if the square is smooth over the factor base, add r and its square to the linear algebra; repeat this step until you have found a sufficient number of smooth squares. The second stage is identical to CFRAC.

Your task is to write a function that factors integers by Dixon’s algorithm. When you are finished you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Comm

May 10, 2011

We have today another exercise in our continuing series of Unix V7 commands:

NAME

comm — select or reject lines common to two sorted files

SYNOPSIS

comm [ -[123] ] file1 file2

DESCRIPTION

Comm reads file1 and file2, which should be ordered in ASCII collating sequence, and produces a three column output: lines only in file1; lines only in file2; and lines in both files. The filename ‘-‘ means the standard input. Flags 1, 2, or 3 suppress printing of the corresponding column. Thus comm -12 prints only the lines common to the two files; comm -23 prints only lines in the first file but not in the second; comm -123 is a no-op.

Your task is to implement the comm command. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Entab And Detab

May 6, 2011

In ancient times, say the 1970s, religious wars were fought about whether to indent blocks of code using tabs or spaces, and how wide the indents should be. Then editing environments got better and programmers stopped arguing about tabs and started arguing about other things, like where the braces go. Now, with the advent of the internet, the problem of tab/space indentation seems to be returning, as copy/paste operations between web browsers and text editors seem to get the indentation wrong (especially when the source is the evil PDF format).

In ancient times, the normal solution to the tab/space indentation problem was a pair of programs, entab and detab, that could easily convert between the two formats. Now, with the internet, it behooves us to resurrect those old programs.

Your task is to write programs that convert files using tabs for indentation to files using spaces for indentation, and vice versa; be sure to permit an argument specifying the width of the tab. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Squaring The Bishop

May 3, 2011

Among the many other works of Charles Babbage is the game of word squares. A word square consists of a set of words written in a square grid in such a way that the same words can be read both horizontally and vertically. Some sample word squares are shown below; the one on the left, in Latin, was found in the ruins of Pompeii, the one in the middle is due to Doug McIlroy, and the one on the right is due to Babbage:

S A T O R    W A S S A I L    D E A N
A R E P O    A N T E N N A    E A S E
T E N E T    S T R I N G Y    A S K S
O P E R A    S E I Z U R E    N E S T
R O T A S    A N N U L A R
             I N G R A T E
             L A Y E R E D

Babbage, in his memoirs, proposed to find a word square based on BISHOP, but was unable to do so.

Your task is to write a program that, given a single word, creates a word square using that word in its top row, then use that program to find a word square for the starting word BISHOP. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Rule 30 RNG

April 29, 2011

We have examined several different random number generators in past exercises, including Donald Knuth’s lagged-fibonacci generator that is used in the Standard Prelude. We also looked at cellular automata in a previous exercise. In today’s exercise we combine random number generators and cellular automata by looking at a random number generator developed by Stephen Wolfram, based on the Rule 30 cellular automaton of his book A New Kind of Science. Our random number generator will be similar to that in Mathematica; it is not cryptographically secure, but is suitable for simulation, as long as you avoid the occasional bad seed, like 0.

The cellular automata we are discussing have a state consisting of a row of cells; each cell can be in either of two states, 0 or 1. Unlike the cellular automata of the previous exercise, the row contains a finite number of cells and is considered to “wrap around” at the ends. A new state is generated based on the current state by assigning to each cell in the new state a value determined by the same-indexed cell in the previous state as well as the two cells immediately adjacent to it. The chart below shows the rule that determines the cell value in the new state:

  ■ ■ ■     ■ ■ □     ■ □ ■     ■ □ □     □ ■ ■     □ ■ □     □ □ ■     □ □ □  

The name Rule 30 comes from the binary-to-decimal conversion of the new values in each of the cells. Taking the same-indexed cell in each successive state gives a sequence of random bits; collect enough of them and you can convert them to a number.

In Wolfram’s book, the various cellular automata are studied based on an infinite row that starts with a single 1-cell, with all remaining cells having a value of 0; the successive states of that cellular automaton are shown in the image above-right. In a random-number generator, the initial state is seeded with 1s and 0s in some user-defined pattern.

Your task is to write a random-number generator based on the Rule 30 cellular automaton. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Miscellanea

April 26, 2011

We have three short exercises today.

FizzBuzz: Looking back over past exercises, I was surprised to find that we haven’t done this classic interview question. You are to write a function that displays the numbers from 1 to an input parameter n, one per line, except that if the current number is divisible by 3 the function should write “Fizz” instead of the number, if the current number is divisible by 5 the function should write “Buzz” instead of the number, and if the current number is divisible by both 3 and 5 the function should write “FizzBuzz” instead of the number. For instance, if n is 20, the program should write 1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, 16, 17, Fizz, 19, and Buzz on twenty successive lines.

Prime Words: Consider that a word consisting of digits and the letters A through Z can represent an integer in base 36, where the digits represent their base-10 counterparts, A is a decimal 10, B is a decimal 11, and so on, until Z is a decimal 35. For instance, PRAXIS36 = P36 × 365 + R36 × 364 + A36 × 363 + X36 × 362 + I36 × 361 + S36 × 360 = 25 × 365 + 27 × 364 + 10 × 363 + 33 × 362 + 18 × 361 + 28 × 360 = 25 × 60466176 + 27 × 1679616 + 10 × 46656 + 33 × 1296 + 18 × 36 + 28 × 1 = 1557514036. You are to write a function that takes a base-36 number as input and returns true if the number is prime and false if the number is composite.

Split A List: You are to write a function that takes an input list and returns two lists, the first half of the input list and the second half of the input list. If the input list has an odd number of elements, it is your choice in which half to place the center element. You are only permitted to scan the list once.

Your task is to write the three functions described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Xref

April 22, 2011

In the old days, when programs were written on punch cards or paper tape and editors didn’t provide regular-expression searches, most compilers provided a cross-reference utility that showed each use of each identifier in the program. You don’t often see cross-referencers any more, and that’s too bad, because a cross-referencer can provide good clues about the structure of a program; looking at a cross-reference listing can be a good way to start examining an unfamiliar program. Here’s some sample output from a Scheme cross-referencer, with the identifier as the first word on each line followed by a list of line numbers where the identifier appears:

0 28
1 18
< 27
= 30
?!.+-*/<=>:$%^&_~@ 5
a 24 25 26 27
add1 20
and 26 30
b 24 25 26 27
c 3 4 5 8 9 10 13
caar 30 32 36 37
car 25 26
cdar 30 33 34 36 37
cdr 27 31 34 37
char-alphabetic? 4
char-in-ident? 3 10
char-numeric? 4
char=? 13
cond 9 19 29
cons 11 21
cs 8 9 11 12 14
define 3 7 16 23 24
display 33 36
else 14 21 35
eof-object? 9 19
file 16 17
get-ident 7 18 20 21
identifiers 3
if 9
lambda 17
let 8 11 18 28
line 18 20 21
list->string 9 12
loop 8 11 14 18 20 21 28 31 34 37
lt? 24 28
member 5
newline 13 29 35
not 35
null? 9 29
or 4 25
pair? 12
peek-char 8 11 14
prev-line 28 30 31
prev-word 28 30 31 32 34 35
read-char 11 13 14
reverse 9 12
scheme 3
sort 28
string->list 5
string<? 25
string=? 20 26 30 32 35
w 18 19 20 21
when 35
with-input-from-file 17
ws 18 19 20 21 23 28 29 30 31 32 33 34 36 37
x 11
xref 1 16
xref-out 19 23

A quick look suggests that this program reads a text file (with-input-from-file, read-char, eof-object?) and does some kind of processing on its characters (char-alphabetic?, char-numeric?). There is some list-handling (car, cdr, string->list), loop is used conventionally, and ws seems to be some kind of key to understanding what the program does. And there is an unholy mess on line 5.

Your task is to write a cross-referencer for your chosen language. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Same Five Digits

April 19, 2011

[Today’s exercise was written by guest author Bob Miller. Bob has been writing system software for Unix since the VAX was new and shiny, and his current hobby is writing Scheme interpreters. Suggestions for exercises are always welcome, or you may wish to contribute your own exercise; feel free to contact me if you are interested.]

Enigma is a weekly column in New Scientist. Every week it has a new puzzle. Some of the Enigma puzzles could be solved using a computer.

A recent puzzle, Enigma Number 1638, is in that category:

I have written down three different 5-digit perfect squares, which between them use five different digits. Each of the five digits is used a different number of times, the five numbers of times being the same as the five digits of the perfect squares. No digit is used its own number of times. If you knew which digit I have used just once you could deduce my three squares with certainty.

What are my three perfect squares?

Your task is to write a program that finds and prints the three perfect squares. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Partition Numbers

April 15, 2011

The partition-number function P(n) gives the number of ways of writing n as the sum of positive integers, irrespective of the order of the addends. For instance P(4) = 5, since 4 = 4 = 3 + 1 = 2 + 2 = 2 + 1 + 1 = 1 + 1 + 1 + 1. Sloane’s A000041 gives the first ten partition numbers as 1, 2, 3, 5, 7, 11, 15, 22, 30, and 42; the numerous references to that sequence point to many fascinating facts about partition numbers, including their close association with pentagonal numbers. By convention, P(0) = 1 and P(n) = 0 for negative n. Partition numbers are normally calculated by the formula, which was discovered by Leonhard Euler:

P(n) = \sum_{k=1}^n (-1)^{k+1} \left[ P\left( n - \frac{k\left(3k-1\right)}{2} \right) + P\left( n - \frac{k\left(3k+1\right)}{2} \right) \right]

Your task is to write a function that computes P(n), and to calculate P(1000). When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2