Doubled Pairs

October 20, 2020

Today’s exercise is somebody’s homework problem:

You are given an array containing positive integers, all distinct. Write a program that determines if there exist in the array any numbers such that one is double the other. For instance, in the array [1,2,3,4], the pairs 1,2 and 2,4 are both doubles, so the program should return True; in the array [1,3,5,7,9] there is no such doubled pair, so the program should return False. For extra credit, return a pair that proves the result is True, if it exists. For extra-extra credit, return all such pairs. Your program must run in linear time.

The student who asked for help realized there was a simple nested-loop solution that runs in quadratic time, but that doesn’t meet the constraints of the homework problem.

Your task is to solve all three levels of the homework problem. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

In a previous exercise, we wrote a program to compute pandigital squares, defined as ten-digit numbers with integral square roots in which each digit zero through nine appears exactly once. We remarked at the time that our solution, though not very fast, was fast enough.

Your task is to write a program that finds pandigital squares, reducing the time and space requirements of the naive solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Non-Breaking Hyphen

October 2, 2020

Today’s exercise tells the story of a problem I faced in my day job. A user apparently copy/pasted a field from Word, Excel or Outlook into our data processing system, which assumes plain ascii (the underlying database, Oracle, handles unicode properly, but the system on top of Oracle doesn’t). Unfortunately, although the field looked okay,

10001366650-1

the dash was actually a non-breaking hyphen, unicode 201110, which broke the system in a rather large way. The field in question was the vendor invoice number in an accounts payable system, and when the check paying that invoice was written, the check-writing program dropped the remittance advice from that check, so every subsequent check had the wrong remittance advice attached. The error wasn’t discovered immediately, so some of the checks were already mailed, making recovery difficult (we couldn’t just void the check run and start over because some of the checks were already mailed, and restarting the check run meant the check numbers wouldn’t match). So it was a grand old mess. In case you’re curious, I demonstrated the error with this SQL statement

select asciistr(fabinvh_vend_inv_code)
from   fabinvh
where  fabinvh_code = 'I2104519'

which returned

1000136650\20111

Unicode is nearly thirty years old. Users have the right to expect that their systems handle unicode characters properly.

Your task is to write a program that detects unicode/ascii error; you might also tell us about any unicode horror stories you have faced. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Pandigital Squares

September 29, 2020

A pandigital number, like 4730825961, is a ten-digit number in which each digit from zero to nine appears exactly once. A square number, like 25² = 625, is a number with an integral square root.

Your task is to write a program that finds all of the pandigital square numbers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Regular readers know of my affinity for number theory. Today’s exercise is a reflection of that.

It is conjectured that the two numbers produced by the equations n17 + 9 and (n + 1)17 + 9 for a given n are always co-prime (that is, their greatest common divisor is 1). Is that conjecture correct?

Your task is to either prove that the greatest common divisor is always 1 or write a program that finds a case where the greatest common divisor is not 1. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Approximate Median

August 21, 2020

[ I offer my apologies to my readers for my recent absence. My employer, a local community college, is struggling with this virus business, revising nearly all of its business practices, and my programmer colleagues and I have been very busy. The Fall semester starts next week (mostly on-line classes, some on-campus classes for science labs and the nursing students), so hopefully things on the virus front will get better soon. But we are also in the middle of changing our main computing system from running on HP-UX on fifteen-year old hardware to Linux on new hardware, and having all kinds of setup problems (all of the people who set up the current system twenty years ago are gone, and no one seems to know how to set up the new system), so maybe not too soon. I hope all is well with all of you. — Phil ]

We have previously studied algorithms for the streaming median and sliding median that calculate the median of a stream of numbers; the streaming median requires storage of all the numbers previously seen, and the sliding median requires storage of the last k numbers in the stream, for some k.

Today’s exercise estimates the median of a stream of numbers while storing only two numbers:

The idea is at each iteration the median inches toward the input signal at a constant rate. The rate depends on what magnitude you estimate the median to be. I use the average as an estimate of the magnitude of the median, to determines the size of each increment of the median. If you need your median accurate to about 1%, use a step-size of 0.01 * the average.

Your task is to write a program that estimates the streaming median according to the given algorithm. When you are finished, you are welcome to read or run the suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Loglog

August 4, 2020

There are 19,055 distinct words in the Bible:

$ cat bible.txt | tr -cs A-Za-z ‘
‘ | sort -u | wc -w
19055

It’s easy enough to count the number of distinct items in a set (its “cardinality”) when the set is small, but when the set is large, the intermediate storage required for the distinct items can be overwhelming.

Phillipe Flajolet and various co-authors wrote a series of papers in which they developed methods of estimating the cardinality of a set with only a small amount of auxiliary storage, using randomization; Flajolet’s algorithms can be seen as an improvement on Robert Morris’ counting algorithm that we studied in a previous exercise. We will study Flajolet’s loglog algorithm in today’s exercise and perhaps have a look at his other algorithms in future exercises.

The basic idea is to apply a hash function to each element of the set. The first bit of the hash value will be zero about half the time, the first two bits of the hash value will be zero about a quarter of the time, the first three bits of the hash value will be zero about an eighth of the time, and so on; by looking at the maximum number of leading zero-bits, we can estimate the cardinality of the set. Flajolet extends this algorithm by splitting the counts among 2k buckets and averaging the estimated cardinalities; the bucket is selected randomly by looking at the last k bits of the hash value.

Your task is to implement Flajolet’s loglog algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Lost Boarding Pass

July 21, 2020

We have today a fun little problem from probability:

On a sold-out flight, 100 people line up to board the plane. The first passenger in the line has lost his boarding pass but was allowed in, regardless. He takes a random seat. Each subsequent passenger takes his or her assigned seat if available, or a random unoccupied seat, otherwise. What is the probability that the last passenger to board the plane finds his seat unoccupied?

Your task is to determine the requested probability, either by reasoning mathematically or by writing a program to demonstrate the probability. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Binary Concatenation

July 14, 2020

We have an interview question today:

The concatenation of the first four integers, written in binary, is 11011100; that is, 1 followed by 10 followed by 11 followed by 100. That concatenated number resolves to 220. A similar process can convert the concatenation of the first n binary numbers to a normal decimal number.

Your task is to compute the nth binary concatenation in the manner described above; report the result modulo 109+7, because the result grows so quickly. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Trailing Zero-Bits

July 7, 2020

Today’s exercise indulges in some bit-hackery:

Given a positive integer, count the number of trailing zero-bits in its binary representation. For instance, 1810 = 100102, so it has 1 trailing zero-bit, and 4810 = 1100002, so it has 4 trailing zero-bits.

Your task is to write a program that counts the number of trailing zero-bits in the binary representation of a positive integer. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2