Sliding Median
June 29, 2012
We saw the streaming median problem in a recent exercise. Today we look at a slight variant on that problem, the sliding median, which slides a window of width k over an input stream, reporting the median of each successive window.
The streaming median required us to keep the entire input stream in memory. But for the sliding median, we keep only the most recent k items. We keep two data structures. A cyclical queue keeps the most recent input items, in the order in which they are input. An ordered map keeps the input items in sorted order. After reading the first k items, each time we read an item we output the current median, delete the oldest item in the cyclical queue from the ordered map, insert the new item into both the ordered map and the cyclical queue, and go on to the next item.
Your task is to write a program that implements the sliding median calculation. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Billboard Challenge, Part 2
June 26, 2012
In the previous exercise, we looked at a programming challenge that involved the digits of e. Solving that exercise and clicking on the indicated web site took the solver to this page:
Congratulations. You've made it to level 2. Go to www.Linux.org and enter Bobsyouruncle as the login and the answer to this equation as the password.
f(1)= 7182818284
f(2)= 8182845904
f(3)= 8747135266
f(4)= 7427466391
f(5)= __________
Don’t try to login; the account no longer exists.
Your task is to write a program to find the next number in the sequence. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Billboard Challenge, Part 1
June 22, 2012
Several years ago, a billboard with this text appeared in the tech centers of Silicon Valley and Route 128:
{ first 10-digit prime found in consecutive digits of e }.com
Your task is to write a program that finds the first 10-digit prime found in consecutive digits of e. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Digits Of E
June 19, 2012
We gave an algorithm for computing the digits of π in a previous exercise. Today, we look at two algorithms for computing the digits of e.
We begin with an algorithm due to Stanley Rabinowitz and Stan Wagon:
Algorithm e-spigot: compute the first n decimal digits of e:
1. Initialize: Let the first digit be 2 and initialize an array A of length n + 1 to (1, 1, 1, . . . , 1).
2. Repeat n − 1 times:
Multiply by 10: Multiply each entry of A by 10.
Take the fractional part: Starting from the right, reduce the ith entry of A modulo i + 1, carrying the quotient one place left.
Output the next digit: The final quotient is the next digit of e.
We give an example of the calculation for the first ten digits of e on the next page, where we compute the first ten digits of e as 2 7 1 8 2 8 1 8 2 6. As you can see, this algorithm suffers from the fact that the last digit may be wrong (it should be 8, not 6); in fact, as the paper suggests, there are circumstances where several of the last digits may be wrong. The algorithm also suffers from the fact that it is bounded, meaning that the number of digits must be specified in advance, and it needs space proportional to n2.
A different algorithm comes from Jeremy Gibbons, and is both unbounded and requires only constant space; Gibbons gave the sequence for π, but Tom Moertel adapts it to e. When I was unable to work out the algorithm from Moertel’s description, I asked Remco Niemeijer, a regular contributor to Programming Praxis, for help, and he responded with this gorgeous hunk of Haskell code, which computes both π and e:
stream :: Integral a => (a, a) -> (a, a, a) -> [(a, a, a)] -> [a]
stream (lo, hi) z ~(x:xs) = if lbound == approx z hi
then lbound : stream (lo, hi) (mul (10, -10*lbound, 1) z) (x:xs)
else stream (lo, hi) (mul z x) xs
where lbound = approx z lo
approx (a,b,c) n = div (a*n + b) c
mul (a,b,c) (d,e,f) = (a*d, a*e + b*f, c*f)
streamDigits :: (Num a, Integral a, Enum a) => (a -> (a, a, a)) -> (a, a) -> [a]
streamDigits f range = stream range (1,0,1) [(n, a*d, d) | (n,d,a) <- map f [1..]]
stream_pi, stream_e :: [Integer]
stream_pi = streamDigits (\k -> (k, 2*k + 1, 2)) (3, 4)
stream_e = streamDigits (\k -> (1, k , 1)) (1, 2)
main :: IO ()
main = do print $ take 30 stream_pi
print $ take 30 stream_e
Niemeijer explained that he had adapted the algorithm in Gibbons’ paper as follows:
–
unit
is a constant used in only one place. Remove the definition and just use the value directly.–
comp
is basic 2×2 matrix multiplication. Rename tomul
to make purpose clearer.–
extr
as defined in the paper didn’t compile for me: it should befromInteger (q * x)
instead offromInteger q * x
. But there’s more to be gained: the only placeextr
is used is preceded byfloor
, so essentially we havefloor (a / b)
, which is equal todiv a b
. Since it’s used to approximate the real value I named itapprox
.– The code used 2×2 matrices expressed as a 4-tuple. However, the bottom-left element is always 0. Remove this element everywhere, leaving 3-tuples and making
mul
andapprox
simpler.– For both π and e, the
next
andsafe
functions passed tostream
are identical save for the bounds used, so pass in the bounds and integrate the functions instream
.– The y used in
stream
is the lower bound. Rename accordingly.– The
cons
passed to stream is alwayscomp
(or in my casemul
). Remove the argument and usemul
directly.–
prod
is also the same for π and e, so remove the argument and inline the function.– The definition of π from the paper calls
stream
with some starting arguments. Since e will be doing the same, make a functionstreamDigits
to abstract this out.– Since
stream
now takes far fewer parameters, there’s no longer a reason to name them all.– According to the hulver site, each term in the π function is 2 + k/(2k + 1)x. For e, each term is 1 + (1/k)x (or rather, it defines e − 1 starting from k = 2, but if you start from k = 1 you add the term 1 + 1/1x, which gives you 1 + 1/1*(e-1), which of course is e). So both are of the form a + (n/d)x. The matrix used in
lfts
is (n, a*d, 0, d). I didn’t like having to do the multiplication myself, so I madestreamDigits
take a function that produces n, d and a for a given term to stay closer to the mathemetical definition. This function is used to produce the actual matrix.– Add
stream_e
andstream_pi
functions.
Your task is to write the two spigot functions for e described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Counting Ones
June 15, 2012
Today’s exercise is a fun math problem:
Consider a function f that takes a positive integer n and returns the number of 1s in the decimal representation of all the integers from 0 to n, inclusive. For example, f(13) = 6, for the numbers 1, 10, 11 (twice, for two 1s), 12 and 13. Notice that f(1) = 1. After 1, what is the next largest n for which f(n) = n?
Your task is to write a program that finds the solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Ordered Maps
June 12, 2012
In computer science, a map is a data type that associates a value with a key; all keys are unique, and the only comparison allowed is equality (are these two keys equal?). An ordered map extends the rule from equality to order (is one of these keys less than the other?), and if the ordered map also admits ranking the elements of the map (this item is first, that item is second, and so on), it is said to provide order statistics. A basic map is usually implemented with a hash table, and the two ordered variants are usually implemented with some kind of balanced tree (treap, avl, red-black). Maps are sometimes called dictionaries, though which type of map is intended is frequently unspecified, and the word dictionary can refer to either ordered or unordered maps.
Today’s exercise describes an abstract data type that provides an ordered map with order statistics. We provide the basic operations lookup, insert, and delete, and also an update function that either updates an existing value or inserts a new key/value pair in the map. For order statistics we provide the size, nth and rank operators. We also provide the higher-order functions map, fold and for-each, and conversions to and from lists. And finally, we provide an iterator to generate the key/value pairs in the map, one by one, in ascending order.
We use avl trees to implement our map; the code was given in two previous exercises. We also steal the generator code from a previous exercise. The new data type has recently been added to the Standard Prelude.
Your task is to implement an abstract data type of ordered maps with order statistics as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Cat
June 8, 2012
One of the on-going themes here at Programming Praxis involves the re-implementation of the text utilities from Unix V7. Today we look at the cat
command:
NAME
cat – catenate and print
SYNOPSIS
cat [ –u ] file …
DESCRIPTION
Cat reads each file in sequence and writes it on the standard output. Thus
cat file
prints the file and
cat file1 file2 >file3
concatenates the first two files and places the result on the third.
If no file is given, or if the argument ‘-‘ is encountered, cat reads from the standard input. Output is buffered in 512-byte blocks unless the standard output is a terminal or the –u option is present.
SEE ALSO
pr(1), cp(1)
BUGS
Beware of ‘cat a b >a’ and ‘cat a b >b’, which destroy input files before reading them.
Your task is to write the Unix V7 cat
program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Binomial Heaps
June 5, 2012
A priority queue, or heap, is a data structure that accepts items in any order and emits them in sorted order. In previous exercises we have implemented priority queues using leftist heaps, pairing heaps, and maxiphobic heaps, and we have used them in sorting algorithms, a prime number generator, calculating the minimum spanning tree of a graph, apportioning representation in the U. S. House of Representatives, and the streaming median. In today’s exercise we look at yet another implementation of priority queues, using binomial heaps that we steal from Chris Okasaki’s book Purely Functional Data Structures; as an alternative to the book, you can read Okasaki’s paper at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.973.
Binomial heaps are lists of binomial trees, which consist of a rank, an element, and a list of child trees and are defined by a base rule and a recursive rule: A binomial tree of rank 0 is a singleton node. A binomial tree of rank r + 1 is made by linking two binomial trees of rank r by making one tree the left most child of the other. Each list of child nodes is maintained in decreasing order of rank; heap order is maintained by linking trees with larger elements under trees with smaller elements. No two sibling trees have the same rank. A consequence of the linking behavior is that all trees contain 2r elements.
Insertion is simple enough. First the item being inserted is made into a singleton heap of rank 0. Then the new heap is inserted by stepping through the existing trees in increasing order of rank, linking trees of equal rank along the way, until a missing rank is found. Merging two heaps is similar, stepping through both lists of trees, linking trees of equal rank. The worst case of insertion occurs with a heap of size 2k − 1, which requires O(k) = O(log n) time; a similar analysis applies to merging.
The find-min and remove-min operations both call an auxiliary function that finds the tree with the minimum element at its root and returns both that tree and the list of remaining trees. Find-min simply returns the minimum element. Delete-min discards the element at the root of the minimum tree, then merges its children with the list of remaining trees. Both the auxiliary function and the delete-min function operate in time O(k) = O(log n), and find-min operates in time O(1) once the auxiliary function runs.
Your task is to implement a function library for priority queues using binomial heaps. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Square Roots
June 1, 2012
In today’s exercise we look at four different algorithms for computing square roots of positive floating-point numbers.
The first method is bisection. For any x, the √x is between 1 and x. The bisection method repeatedly computes the square of the midpoint between the two and discards the half of the range in which the square root doesn’t appear; when the two endpoints are sufficiently close, their midpoint is returned as the result of the function.
In the first century, the Greek mathematician Hero of Alexandria (his name is often spelled Heron) devised an iterative method for calculating √x. If xk is a good approximation to √x, then the average of xk and x/xk is a better approximation; the accuracy doubles at each iterative step. The idea behind the algorithm is that √x is the geometric mean of 1 and x, but not knowing the geometric mean we use the arithmetic mean instead in a series of approximations.
The third method was invented by Sir Isaac Newton, and is again based on an iterative method. If xk is a good approximation to a function, then xk+1 = xk – f(xk) / f'(xk) is a better approximation. Knowing that 2x is the derivative of x2 makes it easy to compute the sequence of approximations.
The fourth method is an optimization that can be applied to either Heron’s or Newton’s methods. The input number x is reduced to the range [1,2) by successively multiplying or dividing by 2. Then, with the range so small, the loop is unrolled to a fixed number of iterations and the result divided or multiplied by the square root of 2 the same number of times as the original multiplication or division. This is fast because multiplying and dividing by 2 is easy for binary computers, and can be made even faster by table lookup of the starting point for the iteration, saving one or two steps.
Your task is to implement the four square root algorithms described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.