## Non-Breaking Hyphen

### October 2, 2020

Today’s exercise tells the story of a problem I faced in my day job. A user apparently copy/pasted a field from Word, Excel or Outlook into our data processing system, which assumes plain ascii (the underlying database, Oracle, handles unicode properly, but the system on top of Oracle doesn’t). Unfortunately, although the field looked okay,

10001366650-1

the dash was actually a non-breaking hyphen, unicode 2011_{10}, which broke the system in a rather large way. The field in question was the vendor invoice number in an accounts payable system, and when the check paying that invoice was written, the check-writing program dropped the remittance advice from that check, so every subsequent check had the wrong remittance advice attached. The error wasn’t discovered immediately, so some of the checks were already mailed, making recovery difficult (we couldn’t just void the check run and start over because some of the checks were already mailed, and restarting the check run meant the check numbers wouldn’t match). So it was a grand old mess. In case you’re curious, I demonstrated the error with this SQL statement

select asciistr(fabinvh_vend_inv_code) from fabinvh where fabinvh_code = 'I2104519'

which returned

1000136650\20111

Unicode is nearly thirty years old. Users have the right to expect that their systems handle unicode characters properly.

Your task is to write a program that detects unicode/ascii error; you might also tell us about any unicode horror stories you have faced. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Pandigital Squares

### September 29, 2020

A pandigital number, like 4730825961, is a ten-digit number in which each digit from zero to nine appears exactly once. A square number, like 25² = 625, is a number with an integral square root.

Your task is to write a program that finds all of the pandigital square numbers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## 8424432925592889329288197322308900672459420460792433

### September 1, 2020

Regular readers know of my affinity for number theory. Today’s exercise is a reflection of that.

It is conjectured that the two numbers produced by the equations

n^{17}+ 9 and (n+ 1)^{17}+ 9 for a givennare always co-prime (that is, their greatest common divisor is 1). Is that conjecture correct?

Your task is to either prove that the greatest common divisor is always 1 or write a program that finds a case where the greatest common divisor is not 1. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Approximate Median

### August 21, 2020

[ I offer my apologies to my readers for my recent absence. My employer, a local community college, is struggling with this virus business, revising nearly all of its business practices, and my programmer colleagues and I have been very busy. The Fall semester starts next week (mostly on-line classes, some on-campus classes for science labs and the nursing students), so hopefully things on the virus front will get better soon. But we are also in the middle of changing our main computing system from running on HP-UX on fifteen-year old hardware to Linux on new hardware, and having all kinds of setup problems (all of the people who set up the current system twenty years ago are gone, and no one seems to know how to set up the new system), so maybe not too soon. I hope all is well with all of you. — Phil ]

We have previously studied algorithms for the streaming median and sliding median that calculate the median of a stream of numbers; the streaming median requires storage of all the numbers previously seen, and the sliding median requires storage of the last *k* numbers in the stream, for some *k*.

Today’s exercise estimates the median of a stream of numbers while storing only two numbers:

The idea is at each iteration the median inches toward the input signal at a constant rate. The rate depends on what magnitude you estimate the median to be. I use the average as an estimate of the magnitude of the median, to determines the size of each increment of the median. If you need your median accurate to about 1%, use a step-size of 0.01 * the average.

Your task is to write a program that estimates the streaming median according to the given algorithm. When you are finished, you are welcome to read or run the suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Loglog

### August 4, 2020

There are 19,055 distinct words in the Bible:

$ cat bible.txt | tr -cs A-Za-z ‘

‘ | sort -u | wc -w

19055

It’s easy enough to count the number of distinct items in a set (its “cardinality”) when the set is small, but when the set is large, the intermediate storage required for the distinct items can be overwhelming.

Phillipe Flajolet and various co-authors wrote a series of papers in which they developed methods of estimating the cardinality of a set with only a small amount of auxiliary storage, using randomization; Flajolet’s algorithms can be seen as an improvement on Robert Morris’ counting algorithm that we studied in a previous exercise. We will study Flajolet’s loglog algorithm in today’s exercise and perhaps have a look at his other algorithms in future exercises.

The basic idea is to apply a hash function to each element of the set. The first bit of the hash value will be zero about half the time, the first two bits of the hash value will be zero about a quarter of the time, the first three bits of the hash value will be zero about an eighth of the time, and so on; by looking at the maximum number of leading zero-bits, we can estimate the cardinality of the set. Flajolet extends this algorithm by splitting the counts among 2^{k} buckets and averaging the estimated cardinalities; the bucket is selected randomly by looking at the last *k* bits of the hash value.

Your task is to implement Flajolet’s loglog algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Lost Boarding Pass

### July 21, 2020

We have today a fun little problem from probability:

On a sold-out flight, 100 people line up to board the plane. The first passenger in the line has lost his boarding pass but was allowed in, regardless. He takes a random seat. Each subsequent passenger takes his or her assigned seat if available, or a random unoccupied seat, otherwise. What is the probability that the last passenger to board the plane finds his seat unoccupied?

Your task is to determine the requested probability, either by reasoning mathematically or by writing a program to demonstrate the probability. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Binary Concatenation

### July 14, 2020

We have an interview question today:

The concatenation of the first four integers, written in binary, is 11011100; that is, 1 followed by 10 followed by 11 followed by 100. That concatenated number resolves to 220. A similar process can convert the concatenation of the first

nbinary numbers to a normal decimal number.

Your task is to compute the *n*th binary concatenation in the manner described above; report the result modulo 10^{9}+7, because the result grows so quickly. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Trailing Zero-Bits

### July 7, 2020

Today’s exercise indulges in some bit-hackery:

Given a positive integer, count the number of trailing zero-bits in its binary representation. For instance, 18

_{10}= 10010_{2}, so it has 1 trailing zero-bit, and 48_{10}= 110000_{2}, so it has 4 trailing zero-bits.

Your task is to write a program that counts the number of trailing zero-bits in the binary representation of a positive integer. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Spelling Numbers

### July 3, 2020

## Shuffle An Array

### June 30, 2020

Today’s exercise comes to us from Leetcode via Reddit:

Given an array consisting of 2

nelements in the form

[x1,x2,…,xn,y1,y2,…,yn], return the array in the form [x1,y1,x2,y2,…,xn,yn].

The Reddit poster claims to be new to Scheme and functional programming, and was thinking of a solution using length and list-ref, but couldn’t solve the problem.

Your task is to show the student how to solve the problem. When you are

finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.