Sliding Median

June 29, 2012

We saw the streaming median problem in a recent exercise. Today we look at a slight variant on that problem, the sliding median, which slides a window of width k over an input stream, reporting the median of each successive window.

The streaming median required us to keep the entire input stream in memory, but for the sliding median we keep only the most recent k items, in two data structures. A cyclical queue keeps the most recent input items in the order in which they were read. An ordered map keeps the same items in sorted order. After reading the first k items, each time we read an item we output the median of the current window, delete the oldest item (found at the head of the cyclical queue) from the ordered map, insert the new item into the ordered map and into the cyclical queue in place of the oldest item, and go on to the next item.
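
Here is one way to sketch that scheme in Haskell, with a Data.Sequence standing in for the cyclical queue and a Data.Map of counts serving as the ordered map; the names (slidingMedian, Bag, bagNth) are ours, and for simplicity k is assumed to be odd and no larger than the input, so the median is the single middle element of each window.

import qualified Data.Map.Strict as M
import qualified Data.Sequence as Seq
import Data.Sequence (ViewL(..), viewl, (|>))

-- Multiset of the current window's values: value -> count, kept in key order.
type Bag a = M.Map a Int

bagInsert, bagDelete :: Ord a => a -> Bag a -> Bag a
bagInsert x = M.insertWith (+) x 1
bagDelete x = M.update (\c -> if c > 1 then Just (c - 1) else Nothing) x

-- 0-based rank selection from the multiset.
bagNth :: Int -> Bag a -> a
bagNth n = go n . M.toAscList
  where go i ((x, c) : rest) | i < c     = x
                             | otherwise = go (i - c) rest
        go _ []              = error "bagNth: rank out of range"

-- Median of every window of width k.
slidingMedian :: Ord a => Int -> [a] -> [a]
slidingMedian k xs = go (Seq.fromList front) (foldr bagInsert M.empty front) rest
  where
    (front, rest) = splitAt k xs
    go queue bag ys =
      bagNth (k `div` 2) bag :                -- median of the current window
      case (viewl queue, ys) of
        (old :< queue', y : ys') ->           -- slide: drop the oldest, add the newest
          go (queue' |> y) (bagInsert y (bagDelete old bag)) ys'
        _ -> []

For example, slidingMedian 3 [1, 5, 2, 8, 3] produces [2, 5, 3], one median per window.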

Your task is to write a program that implements the sliding median calculation. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

In the previous exercise, we looked at a programming challenge that involved the digits of e. Solving that exercise and clicking on the indicated web site took the solver to this page:

Congratulations. You've made it to level 2. Go to www.Linux.org and enter Bobsyouruncle as the login and the answer to this equation as the password.

f(1) = 7182818284
f(2) = 8182845904
f(3) = 8747135266
f(4) = 7427466391
f(5) = __________

Don’t try to log in; the account no longer exists.

Your task is to write a program to find the next number in the sequence. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Several years ago, a billboard with this text appeared in the tech centers of Silicon Valley and Route 128:

{ first 10-digit prime found in consecutive digits of e }.com

Your task is to write a program that finds the first 10-digit prime found in consecutive digits of e. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Digits Of E

June 19, 2012

We gave an algorithm for computing the digits of π in a previous exercise. Today, we look at two algorithms for computing the digits of e.

We begin with an algorithm due to Stanley Rabinowitz and Stan Wagon:

Algorithm e-spigot: compute the first n decimal digits of e:

1. Initialize: Let the first digit be 2 and initialize an array A of length n + 1 to (1, 1, 1, . . . , 1).

2. Repeat n − 1 times:

Multiply by 10: Multiply each entry of A by 10.

Take the fractional part: Starting from the right, reduce the ith entry of A modulo i + 1, carrying the quotient one place left.

Output the next digit: The final quotient (the carry out of the leftmost entry) is the next digit of e.

We give an example of the calculation of the first ten digits of e on the next page, where the algorithm produces 2 7 1 8 2 8 1 8 2 6. As you can see, the last digit may be wrong (it should be 8, not 6); in fact, as the paper suggests, there are circumstances where several of the last digits may be wrong. The algorithm is also bounded, meaning that the number of digits must be specified in advance, and it needs space proportional to n.
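
Here is a sketch of the spigot in Haskell that follows the steps above literally, carrying right to left with a fold; the name eSpigot and the representation are ours.

-- First n digits of e by the Rabinowitz-Wagon spigot (n >= 1); as noted
-- above, the last digit or two may be wrong.
eSpigot :: Int -> [Int]
eSpigot n = 2 : go (n - 1) (replicate (n + 1) 1)
  where
    go 0 _ = []
    go k a = let (digit, a') = sweep (map (* 10) a) in digit : go (k - 1) a'
    -- One right-to-left pass: entry i (1-based) is reduced modulo i + 1,
    -- the quotient is carried one place left, and the carry out of the
    -- leftmost entry is the next digit.
    sweep a = foldr step (0, []) (zip [2 ..] a)
      where step (m, x) (carry, rest) =
              let x' = x + carry in (x' `div` m, x' `mod` m : rest)

Running eSpigot 10 should reproduce the ten digits quoted above, wrong trailing digit and all.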

A different algorithm comes from Jeremy Gibbons; it is unbounded, meaning the number of digits need not be specified in advance, and its state is just a few arbitrary-precision integers. Gibbons gave the sequence for π, and Tom Moertel adapted it to e. When I was unable to work out the algorithm from Moertel’s description, I asked Remco Niemeijer, a regular contributor to Programming Praxis, for help, and he responded with this gorgeous hunk of Haskell code, which computes both π and e:

stream :: Integral a => (a, a) -> (a, a, a) -> [(a, a, a)] -> [a]
stream (lo, hi) z ~(x:xs) = if lbound == approx z hi
    then lbound : stream (lo, hi) (mul (10, -10*lbound, 1) z) (x:xs)
    else stream (lo, hi) (mul z x) xs
    where lbound = approx z lo
          approx (a,b,c) n = div (a*n + b) c
          mul (a,b,c) (d,e,f) = (a*d, a*e + b*f, c*f)

streamDigits :: (Num a, Integral a, Enum a) => (a -> (a, a, a)) -> (a, a) -> [a]
streamDigits f range = stream range (1,0,1) [(n, a*d, d) | (n,d,a) <- map f [1..]]

stream_pi, stream_e :: [Integer]
stream_pi = streamDigits (\k -> (k, 2*k + 1, 2)) (3, 4)
stream_e  = streamDigits (\k -> (1, k      , 1)) (1, 2)

main :: IO ()
main = do print $ take 30 stream_pi
          print $ take 30 stream_e

Niemeijer explained that he had adapted the algorithm in Gibbons’ paper as follows:

– unit is a constant used in only one place. Remove the definition and just use the value directly.

– comp is basic 2×2 matrix multiplication. Rename to mul to make purpose clearer.

– extr as defined in the paper didn’t compile for me: it should be fromInteger (q * x) instead of fromInteger q * x. But there’s more to be gained: the only place extr is used is preceded by floor, so essentially we have floor (a / b), which is equal to div a b. Since it’s used to approximate the real value I named it approx.

– The code used 2×2 matrices expressed as a 4-tuple. However, the bottom-left element is always 0. Remove this element everywhere, leaving 3-tuples and making mul and approx simpler.

– For both π and e, the next and safe functions passed to stream are identical save for the bounds used, so pass in the bounds and integrate the functions in stream.

– The y used in stream is the lower bound. Rename accordingly.

– The cons passed to stream is always comp (or in my case mul). Remove the argument and use mul directly.

– prod is also the same for π and e, so remove the argument and inline the function.

– The definition of π from the paper calls stream with some starting arguments. Since e will be doing the same, make a function streamDigits to abstract this out.

– Since stream now takes far fewer parameters, there’s no longer a reason to name them all.

– According to the hulver site, each term in the π function is 2 + (k/(2k + 1))x. For e, each term is 1 + (1/k)x (or rather, it defines e − 1 starting from k = 2, but if you start from k = 1 you add the term 1 + (1/1)x, which gives 1 + (1/1)(e − 1), which of course is e). So both are of the form a + (n/d)x. The matrix used in lfts is (n, a*d, 0, d). I didn’t like having to do the multiplication myself, so I made streamDigits take a function that produces n, d and a for a given term, to stay closer to the mathematical definition. This function is used to produce the actual matrix.

– Add stream_e and stream_pi functions.
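
Incidentally, with stream_e in hand the billboard exercise above falls out almost for free. Appending something like the following to Niemeijer's module (isPrime and firstPrimeWindow are our own helpers; isPrime is naive trial division, which is fine for a handful of ten-digit candidates) scans successive ten-digit windows of the digit stream for the first prime:

isPrime :: Integer -> Bool
isPrime n | n < 2     = False
          | otherwise = all (\d -> n `mod` d /= 0)
                            (takeWhile (\d -> d * d <= n) (2 : [3, 5 ..]))

-- First w-digit prime in a stream of decimal digits; windows with a
-- leading zero are rejected by the lower bound.
firstPrimeWindow :: Int -> [Integer] -> Integer
firstPrimeWindow w ds =
  head [ n | chunk <- windows ds
           , let n = foldl (\acc d -> 10 * acc + d) 0 chunk
           , n >= 10 ^ (w - 1), isPrime n ]
  where windows xs = take w xs : windows (tail xs)

billboard :: Integer
billboard = firstPrimeWindow 10 stream_e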

Your task is to write the two spigot functions for e described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Counting Ones

June 15, 2012

Today’s exercise is a fun math problem:

Consider a function f that takes a positive integer n and returns the number of 1s in the decimal representations of all the integers from 0 to n, inclusive. For example, f(13) = 6, counting the 1s in 1, 10, 11 (twice, for its two 1s), 12 and 13. Notice that f(1) = 1. After 1, what is the next n for which f(n) = n?
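
A brute-force definition of f makes the example easy to check; countOnes is our name for it, and the real challenge of the exercise is that scanning every integer up to the answer is slow, so a digit-by-digit formula for f is worth working out.

-- Count the 1 digits in the decimal representations of 0 through n.
-- countOnes 13 == 6 and countOnes 1 == 1, matching the examples above.
countOnes :: Int -> Int
countOnes n = length [ c | k <- [0 .. n], c <- show k, c == '1' ]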

Your task is to write a program that finds the solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Ordered Maps

June 12, 2012

In computer science, a map is a data type that associates a value with a key; all keys are unique, and the only comparison allowed is equality (are these two keys equal?). An ordered map extends the rule from equality to order (is one of these keys less than the other?), and if the ordered map also admits ranking the elements of the map (this item is first, that item is second, and so on), it is said to provide order statistics. A basic map is usually implemented with a hash table, and the two ordered variants are usually implemented with some kind of balanced tree (treap, AVL, red-black). Maps are sometimes called dictionaries, though which type of map is intended is frequently unspecified, and the word dictionary can refer to either ordered or unordered maps.

Today’s exercise describes an abstract data type that provides an ordered map with order statistics. We provide the basic operations lookup, insert, and delete, and also an update function that either updates an existing value or inserts a new key/value pair in the map. For order statistics we provide the size, nth and rank operators. We also provide the higher-order functions map, fold and for-each, and conversions to and from lists. And finally, we provide an iterator to generate the key/value pairs in the map, one by one, in ascending order.
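
For readers who want to see the shape of the interface before building it, here is a sketch in Haskell on top of Data.Map, which is itself a balanced tree that supports order statistics; the names are ours, the higher-order functions and the generator-based iterator are omitted, and the exercise proper asks you to build the same operations on the AVL-tree code from the earlier exercises.

import qualified Data.Map.Strict as M
import Prelude hiding (lookup)

type OMap k v = M.Map k v

lookup :: Ord k => k -> OMap k v -> Maybe v
lookup = M.lookup

insert :: Ord k => k -> v -> OMap k v -> OMap k v
insert = M.insert

delete :: Ord k => k -> OMap k v -> OMap k v
delete = M.delete

-- update: apply f to an existing value, or insert the default if the key is absent.
update :: Ord k => (v -> v) -> k -> v -> OMap k v -> OMap k v
update f k def = M.insertWith (\_ old -> f old) k def

size :: OMap k v -> Int
size = M.size

-- nth key/value pair (0-based) in ascending key order, and the rank of a key.
nth :: Int -> OMap k v -> (k, v)
nth = M.elemAt

rank :: Ord k => k -> OMap k v -> Maybe Int
rank = M.lookupIndex

toList :: OMap k v -> [(k, v)]    -- in ascending order; doubles as the iterator
toList = M.toAscList

fromList :: Ord k => [(k, v)] -> OMap k v
fromList = M.fromList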

We use AVL trees to implement our map; the code was given in two previous exercises. We also steal the generator code from a previous exercise. The new data type has recently been added to the Standard Prelude.

Your task is to implement an abstract data type of ordered maps with order statistics as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Cat

June 8, 2012

One of the on-going themes here at Programming Praxis involves the re-implementation of the text utilities from Unix V7. Today we look at the cat command:

NAME

cat – catenate and print

SYNOPSIS

cat [ -u ] file …

DESCRIPTION

Cat reads each file in sequence and writes it on the standard output. Thus

cat file

prints the file and

cat file1 file2 >file3

concatenates the first two files and places the result on the third.

If no file is given, or if the argument ‘-’ is encountered, cat reads from the standard input. Output is buffered in 512-byte blocks unless the standard output is a terminal or the -u option is present.

SEE ALSO

pr(1), cp(1)

BUGS

Beware of ‘cat a b >a’ and ‘cat a b >b’, which destroy input files before reading them.
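
Before turning to the task, here is a minimal sketch of the behavior described above in Haskell; it leans on lazy I/O, ignores the 512-byte block size, and treats -u simply as unbuffered output.

import System.Environment (getArgs)
import System.IO

main :: IO ()
main = do
  args <- getArgs
  let unbuffered = "-u" `elem` args
      files      = filter (/= "-u") args
  hSetBuffering stdout
    (if unbuffered then NoBuffering else BlockBuffering Nothing)
  let cat "-" = getContents >>= putStr      -- '-' means standard input
      cat f   = readFile f >>= putStr
  mapM_ cat (if null files then ["-"] else files)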

Your task is to write the Unix V7 cat program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Binomial Heaps

June 5, 2012

A priority queue, or heap, is a data structure that accepts items in any order and emits them in sorted order. In previous exercises we have implemented priority queues using leftist heaps, pairing heaps, and maxiphobic heaps, and we have used them in sorting algorithms, a prime number generator, calculating the minimum spanning tree of a graph, apportioning representation in the U. S. House of Representatives, and the streaming median. In today’s exercise we look at yet another implementation of priority queues, using binomial heaps that we steal from Chris Okasaki’s book Purely Functional Data Structures; as an alternative to the book, you can read Okasaki’s paper at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.973.

Binomial heaps are lists of binomial trees, which consist of a rank, an element, and a list of child trees, and are defined by a base rule and a recursive rule: a binomial tree of rank 0 is a singleton node, and a binomial tree of rank r + 1 is made by linking two binomial trees of rank r, making one tree the leftmost child of the other. Each list of child nodes is maintained in decreasing order of rank; heap order is maintained by linking trees with larger root elements under trees with smaller root elements. No two sibling trees have the same rank. A consequence of the linking rule is that a binomial tree of rank r contains exactly 2^r elements.

Insertion is simple enough. First the item being inserted is made into a singleton tree of rank 0. Then the new tree is inserted by stepping through the existing trees in increasing order of rank, linking trees of equal rank along the way, until a missing rank is found. Merging two heaps is similar, stepping through both lists of trees, linking trees of equal rank. The worst case of insertion occurs with a heap of size 2^k − 1, which requires O(k) = O(log n) time; a similar analysis applies to merging.

The find-min and delete-min operations both call an auxiliary function that finds the tree with the minimum element at its root and returns both that tree and the list of remaining trees. Find-min simply returns that minimum element. Delete-min discards the element at the root of the minimum tree, then merges its children with the list of remaining trees. Both the auxiliary function and delete-min operate in time O(k) = O(log n); find-min does only constant additional work once the auxiliary function has run.
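
The description above translates almost directly into Haskell. The following sketch follows Okasaki's code closely; the names are ours, and conveniences such as an empty-heap test are omitted.

-- A binomial tree carries its rank, its root element, and its children
-- in decreasing order of rank; a heap is a list of trees in increasing
-- order of rank.
data Tree a = Node Int a [Tree a]
type Heap a = [Tree a]

rank :: Tree a -> Int
rank (Node r _ _) = r

root :: Tree a -> a
root (Node _ x _) = x

-- Link two trees of equal rank, putting the larger root under the smaller.
link :: Ord a => Tree a -> Tree a -> Tree a
link t1@(Node r x1 c1) t2@(Node _ x2 c2)
  | x1 <= x2  = Node (r + 1) x1 (t2 : c1)
  | otherwise = Node (r + 1) x2 (t1 : c2)

-- Insert a tree, linking trees of equal rank until a missing rank is found.
insTree :: Ord a => Tree a -> Heap a -> Heap a
insTree t []           = [t]
insTree t ts@(t' : ts')
  | rank t < rank t'   = t : ts
  | otherwise          = insTree (link t t') ts'

insert :: Ord a => a -> Heap a -> Heap a
insert x = insTree (Node 0 x [])

-- Step through both heaps, linking trees of equal rank along the way.
merge :: Ord a => Heap a -> Heap a -> Heap a
merge ts [] = ts
merge [] ts = ts
merge h1@(t1 : ts1) h2@(t2 : ts2)
  | rank t1 < rank t2 = t1 : merge ts1 h2
  | rank t2 < rank t1 = t2 : merge h1 ts2
  | otherwise         = insTree (link t1 t2) (merge ts1 ts2)

-- Auxiliary: split off the tree with the minimum root.
removeMinTree :: Ord a => Heap a -> (Tree a, Heap a)
removeMinTree []  = error "removeMinTree: empty heap"
removeMinTree [t] = (t, [])
removeMinTree (t : ts)
  | root t <= root t' = (t, ts)
  | otherwise         = (t', t : ts')
  where (t', ts') = removeMinTree ts

findMin :: Ord a => Heap a -> a
findMin = root . fst . removeMinTree

-- Discard the minimum root and merge its children (reversed into
-- increasing rank order) with the remaining trees.
deleteMin :: Ord a => Heap a -> Heap a
deleteMin ts = merge (reverse children) rest
  where (Node _ _ children, rest) = removeMinTree ts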

Your task is to implement a function library for priority queues using binomial heaps. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Square Roots

June 1, 2012

In today’s exercise we look at four different algorithms for computing square roots of positive floating-point numbers.

The first method is bisection. For any x ≥ 1, √x lies between 1 and x (for 0 < x < 1 it lies between x and 1). The bisection method repeatedly squares the midpoint of the current range and discards the half of the range in which the square root cannot lie; when the two endpoints are sufficiently close, their midpoint is returned as the result of the function.

In the first century, the Greek mathematician Hero of Alexandria (his name is often spelled Heron) devised an iterative method for calculating √x. If x_k is a good approximation to √x, then the average of x_k and x/x_k is a better approximation; the number of correct digits roughly doubles at each iterative step. The idea behind the algorithm is that √x is the geometric mean of 1 and x, but not knowing the geometric mean we use the arithmetic mean instead in a series of successive approximations.

The third method was invented by Sir Isaac Newton, and is again iterative. If x_k is a good approximation to a root of a function f, then x_{k+1} = x_k − f(x_k) / f'(x_k) is a better approximation. Applying this to f(t) = t^2 − x, whose derivative is 2t, makes it easy to compute the sequence of approximations.

The fourth method is an optimization that can be applied to either Heron’s or Newton’s methods. The input number x is reduced to the range [1,2) by successively multiplying or dividing by 2. Then, with the range so small, the loop is unrolled to a fixed number of iterations and the result divided or multiplied by the square root of 2 the same number of times as the original multiplication or division. This is fast because multiplying and dividing by 2 is easy for binary computers, and can be made even faster by table lookup of the starting point for the iteration, saving one or two steps.
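
To make the first two methods concrete, here is a sketch in Haskell; the names, the starting brackets and guesses, and the 1e-12 relative tolerances are our choices. Newton's method applied to f(t) = t^2 − x reduces to exactly the same update as Heron's, so it is not repeated here.

-- Bisection: keep a bracket around the square root and halve it until it is tight.
bisectSqrt :: Double -> Double
bisectSqrt x = go (min 1 x) (max 1 x)
  where
    go lo hi
      | hi - lo < 1e-12 * hi = mid
      | mid * mid > x        = go lo mid
      | otherwise            = go mid hi
      where mid = (lo + hi) / 2

-- Heron: average the current guess with x divided by the guess.
heronSqrt :: Double -> Double
heronSqrt x = go x
  where
    go xk | abs (xk' - xk) < 1e-12 * xk' = xk'
          | otherwise                    = go xk'
      where xk' = (xk + x / xk) / 2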

Your task is to implement the four square root algorithms described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
