## Surfraw

### November 6, 2020

For the last few days I’ve been playing with `surfraw`

, a command-line program that allows easy access to a variety of search engines. About a hundred `elvi`

(search agents) are provided, and it is easy to add your own. You can get `surfraw`

from your favorite code repository (for me, it was `sudo apt install surfraw`

). I use lynx for the browser, so everything works on the command line.

Your task is to install `surfraw`

on your machine, write a custom `elvi`

, and share it with the rest of us. When you are finished, you are welcome to read a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## List Slices

### October 27, 2020

During some recent personal programming, I needed a function to slice a list into pieces: Given a list of sub-list lengths, and an input list to be sliced, return a list of sub-lists of the requested lengths. For instance:, slicing the list (1 2 2 3 3 3) into sub-lists of lengths (1 2 3) returns the list of sub-lists ((1) (2 2) (3 3 3)). Extra list elements at the end of the input list are ignored, missing list elements at the end of the input list are returned as null lists.

Your task is to write a program that slices lists into sub-lists according to a specification of sub-list lengths. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Doubled Pairs

### October 20, 2020

Today’s exercise is somebody’s homework problem:

You are given an array containing positive integers, all distinct. Write a program that determines if there exist in the array any numbers such that one is double the other. For instance, in the array [1,2,3,4], the pairs 1,2 and 2,4 are both doubles, so the program should return True; in the array [1,3,5,7,9] there is no such doubled pair, so the program should return False. For extra credit, return a pair that proves the result is True, if it exists. For extra-extra credit, return all such pairs. Your program must run in linear time.

The student who asked for help realized there was a simple nested-loop solution that runs in quadratic time, but that doesn’t meet the constraints of the homework problem.

Your task is to solve all three levels of the homework problem. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Pandigital Squares, Faster And Smaller

### October 6, 2020

In a previous exercise, we wrote a program to compute pandigital squares, defined as ten-digit numbers with integral square roots in which each digit zero through nine appears exactly once. We remarked at the time that our solution, though not very fast, was fast enough.

Your task is to write a program that finds pandigital squares, reducing the time and space requirements of the naive solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Non-Breaking Hyphen

### October 2, 2020

Today’s exercise tells the story of a problem I faced in my day job. A user apparently copy/pasted a field from Word, Excel or Outlook into our data processing system, which assumes plain ascii (the underlying database, Oracle, handles unicode properly, but the system on top of Oracle doesn’t). Unfortunately, although the field looked okay,

10001366650-1

the dash was actually a non-breaking hyphen, unicode 2011_{10}, which broke the system in a rather large way. The field in question was the vendor invoice number in an accounts payable system, and when the check paying that invoice was written, the check-writing program dropped the remittance advice from that check, so every subsequent check had the wrong remittance advice attached. The error wasn’t discovered immediately, so some of the checks were already mailed, making recovery difficult (we couldn’t just void the check run and start over because some of the checks were already mailed, and restarting the check run meant the check numbers wouldn’t match). So it was a grand old mess. In case you’re curious, I demonstrated the error with this SQL statement

select asciistr(fabinvh_vend_inv_code) from fabinvh where fabinvh_code = 'I2104519'

which returned

1000136650\20111

Unicode is nearly thirty years old. Users have the right to expect that their systems handle unicode characters properly.

Your task is to write a program that detects unicode/ascii error; you might also tell us about any unicode horror stories you have faced. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Pandigital Squares

### September 29, 2020

A pandigital number, like 4730825961, is a ten-digit number in which each digit from zero to nine appears exactly once. A square number, like 25² = 625, is a number with an integral square root.

Your task is to write a program that finds all of the pandigital square numbers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## 8424432925592889329288197322308900672459420460792433

### September 1, 2020

Regular readers know of my affinity for number theory. Today’s exercise is a reflection of that.

It is conjectured that the two numbers produced by the equations

n^{17}+ 9 and (n+ 1)^{17}+ 9 for a givennare always co-prime (that is, their greatest common divisor is 1). Is that conjecture correct?

Your task is to either prove that the greatest common divisor is always 1 or write a program that finds a case where the greatest common divisor is not 1. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Approximate Median

### August 21, 2020

[ I offer my apologies to my readers for my recent absence. My employer, a local community college, is struggling with this virus business, revising nearly all of its business practices, and my programmer colleagues and I have been very busy. The Fall semester starts next week (mostly on-line classes, some on-campus classes for science labs and the nursing students), so hopefully things on the virus front will get better soon. But we are also in the middle of changing our main computing system from running on HP-UX on fifteen-year old hardware to Linux on new hardware, and having all kinds of setup problems (all of the people who set up the current system twenty years ago are gone, and no one seems to know how to set up the new system), so maybe not too soon. I hope all is well with all of you. — Phil ]

We have previously studied algorithms for the streaming median and sliding median that calculate the median of a stream of numbers; the streaming median requires storage of all the numbers previously seen, and the sliding median requires storage of the last *k* numbers in the stream, for some *k*.

Today’s exercise estimates the median of a stream of numbers while storing only two numbers:

The idea is at each iteration the median inches toward the input signal at a constant rate. The rate depends on what magnitude you estimate the median to be. I use the average as an estimate of the magnitude of the median, to determines the size of each increment of the median. If you need your median accurate to about 1%, use a step-size of 0.01 * the average.

Your task is to write a program that estimates the streaming median according to the given algorithm. When you are finished, you are welcome to read or run the suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Loglog

### August 4, 2020

There are 19,055 distinct words in the Bible:

$ cat bible.txt | tr -cs A-Za-z ‘

‘ | sort -u | wc -w

19055

It’s easy enough to count the number of distinct items in a set (its “cardinality”) when the set is small, but when the set is large, the intermediate storage required for the distinct items can be overwhelming.

Phillipe Flajolet and various co-authors wrote a series of papers in which they developed methods of estimating the cardinality of a set with only a small amount of auxiliary storage, using randomization; Flajolet’s algorithms can be seen as an improvement on Robert Morris’ counting algorithm that we studied in a previous exercise. We will study Flajolet’s loglog algorithm in today’s exercise and perhaps have a look at his other algorithms in future exercises.

The basic idea is to apply a hash function to each element of the set. The first bit of the hash value will be zero about half the time, the first two bits of the hash value will be zero about a quarter of the time, the first three bits of the hash value will be zero about an eighth of the time, and so on; by looking at the maximum number of leading zero-bits, we can estimate the cardinality of the set. Flajolet extends this algorithm by splitting the counts among 2^{k} buckets and averaging the estimated cardinalities; the bucket is selected randomly by looking at the last *k* bits of the hash value.

Your task is to implement Flajolet’s loglog algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Lost Boarding Pass

### July 21, 2020

We have today a fun little problem from probability:

On a sold-out flight, 100 people line up to board the plane. The first passenger in the line has lost his boarding pass but was allowed in, regardless. He takes a random seat. Each subsequent passenger takes his or her assigned seat if available, or a random unoccupied seat, otherwise. What is the probability that the last passenger to board the plane finds his seat unoccupied?

Your task is to determine the requested probability, either by reasoning mathematically or by writing a program to demonstrate the probability. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.