## Phone Numbers And Prime Factors

### June 24, 2016

John Cook is a mathematician and programmer who runs a fascinating blog that I frequent.

Cook recently had an article about the prime factors of telephone numbers. He explained that, for 10-digit telephone numbers as used in the United States, the average number of distinct prime factors is 3.232, and a number has between 1 and 5 distinct prime factors about 73% of the time.

Your task is to write a function that determines the number of distinct prime factors of a number, and use that function to explore the distribution of the number of distinct prime factors in a range of telephone numbers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
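
As a starting point, here is a minimal sketch of one way to count distinct prime factors, assuming plain trial division is fast enough for 10-digit numbers. The names `distinct_prime_factors` and `distribution` are illustrative, not taken from Cook's article.

```python
def distinct_prime_factors(n):
    """Count the distinct primes dividing n (n >= 1) by trial division."""
    count = 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            # Strip every copy of d so each prime is counted once.
            while n % d == 0:
                n //= d
        d += 1 if d == 2 else 2  # after 2, try odd divisors only
    if n > 1:
        count += 1  # whatever remains is a single prime factor
    return count


def distribution(lo, hi):
    """Tally how many n in [lo, hi) have each count of distinct prime factors."""
    tally = {}
    for n in range(lo, hi):
        k = distinct_prime_factors(n)
        tally[k] = tally.get(k, 0) + 1
    return tally
```

Calling `distribution` on a small window of 10-digit numbers gives a sample of the distribution; trial division is slow in Python when the number is prime, so a window of a few hundred numbers is a reasonable first experiment.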

## Two Interview Questions

### June 21, 2016

I like to read a web site called Career Cup, both to enjoy solving some of the programming exercises given there and to find exercises for Programming Praxis. As I write this exercise, here are the two most recent exercises on Career Cup:

- Given a function `rand2` that returns 0 or 1 with equal probability, write a function `rand3` that returns 0, 1 or 2 with equal probability, using only `rand2` as a source of random numbers.
- Given a set of characters and a dictionary of words, find the shortest word in the dictionary that contains all of the characters in the set. In case of a tie, return all the words of the same (shortest) length.

Your task is to write the two programs described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
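
One possible approach to both questions is sketched below. It assumes rejection sampling for `rand3` (draw two bits, retry on the unused fourth outcome) and a simple scan for the word search. The `bit` parameter and the name `shortest_words` are assumptions made for illustration.

```python
import random


def rand2():
    """Return 0 or 1 with equal probability."""
    return random.randrange(2)


def rand3(bit=rand2):
    """Return 0, 1 or 2 with equal probability, using only calls to bit()."""
    while True:
        n = 2 * bit() + bit()  # uniform on 0..3
        if n < 3:              # reject 3 and draw again
            return n


def shortest_words(chars, dictionary):
    """Return every shortest word that contains all of the characters in chars."""
    need = set(chars)
    candidates = [w for w in dictionary if need <= set(w)]
    if not candidates:
        return []
    shortest = min(len(w) for w in candidates)
    return [w for w in candidates if len(w) == shortest]
```

Each pass through the loop in `rand3` succeeds with probability 3/4, so the expected number of `rand2` calls is small even though there is no fixed upper bound.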

## Tomohiko Sakamoto’s Day-Of-Week Algorithm

### June 17, 2016

Here is Sakamoto’s algorithm for calculating the day of the week, taken from the comment that introduces the code:

Jan 1st 1 AD is a Monday in Gregorian calendar.

So Jan 0th 1 AD is a Sunday [It does not exist technically]. Every 4 years we have a leap year. But xy00 cannot be a leap unless xy divides 4 with reminder 0.

y/4 – y/100 + y/400 : this gives the number of leap years from 1AD to the given year. As each year has 365 days (divdes 7 with reminder 1), unless it is a leap year or the date is in Jan or Feb, the day of a given date changes by 1 each year. In other case it increases by 2.

y -= m < 3. So y + y/4 – y/100 + y/400 gives the day of Jan 0th (Dec 31st of prev year) of the year. (This gives the reminder with 7 of the number of days passed before the given year began.)

Array t: Number of days passed before the month ‘m+1’ begins.

So t[m-1]+d is the number of days passed in year ‘y’ upto the given date.

(y + y/4 – y/100 + y/400 + t[m-1] + d) % 7 is reminder of the number of days from Jan 0 1AD to the given date which will be the day (0=Sunday,6=Saturday).

    int dow(int y, int m, int d) {
        static int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
        y -= m < 3;
        return (y + y/4 - y/100 + y/400 + t[m-1] + d) % 7;
    }
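
For readers who want to experiment, the C function translates almost line for line into Python. This is a sketch, not Sakamoto's own code; the helper `agrees_with_datetime` is an assumption added here to cross-check results against the standard library.

```python
import datetime


def dow(y, m, d):
    """Day of week for a Gregorian date: 0 = Sunday, ..., 6 = Saturday."""
    t = (0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4)
    if m < 3:
        y -= 1  # same effect as the C idiom `y -= m < 3`
    return (y + y // 4 - y // 100 + y // 400 + t[m - 1] + d) % 7


def agrees_with_datetime(y, m, d):
    """Compare against datetime, whose isoweekday() uses 1 = Monday, ..., 7 = Sunday."""
    return dow(y, m, d) == datetime.date(y, m, d).isoweekday() % 7
```

For example, `dow(2016, 6, 17)` returns 5, a Friday.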

Another description is given here.

Your task is to write a program that implements the day-of-week algorithm shown above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Duplicate Items In An Array

### June 14, 2016

Today’s exercise is in two parts, first a commonly-seen programming exercise and then a variant on it; the origin of the exercise is certainly someone’s homework, but since school is out for the year it doesn’t matter that we do the exercise today.

First, write a program that, given an array of integers in unsorted order, finds the single duplicate number in the array. For instance, given the input [1,2,3,1,4], the correct output is 1.

Second, write a program that, given an array of integers in unsorted order, finds all of the multiple duplicate numbers in the array. For instance, given the input [1,2,3,1,2,4,1], the correct output is [1,2,1].

Your task is to write the two programs that find duplicates in an array. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
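
A minimal sketch of both parts follows, assuming a set of already-seen values is an acceptable use of extra space. The function names are illustrative.

```python
def single_duplicate(xs):
    """Return the first value that appears a second time, or None if there is none."""
    seen = set()
    for x in xs:
        if x in seen:
            return x
        seen.add(x)
    return None


def all_duplicates(xs):
    """Return each repeated occurrence, in the order the repeats are encountered."""
    seen = set()
    dups = []
    for x in xs:
        if x in seen:
            dups.append(x)
        else:
            seen.add(x)
    return dups
```

With the inputs from the text, `single_duplicate([1,2,3,1,4])` returns 1 and `all_duplicates([1,2,3,1,2,4,1])` returns `[1, 2, 1]`, because the value 1 repeats twice.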

## Linear Regression

### June 10, 2016

Linear regression is a widely-used statistical technique for relating two sets of variables, traditionally called *x* and *y*; the goal is to find the line-of-best-fit, *y* = *m* *x* + *b*, that most closely relates the two sets. The formulas for computing the line of best fit are:

$$m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$

$$b = \frac{\sum y - m \sum x}{n}$$

You can find those formulas in any statistics textbook. As an example, given the sets of variables

| x | y |
|----|-----|
| 60 | 3.1 |
| 61 | 3.6 |
| 62 | 3.8 |
| 63 | 4.0 |
| 65 | 4.1 |

the line of best fit is *y* = 0.1878 *x* − 7.9635, and the estimated value of the missing *x* = 64 is 4.06.
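
The two formulas can be applied directly to the sample data. The sketch below assumes plain Python lists of equal length; the name `linear_regression` is illustrative.

```python
def linear_regression(xs, ys):
    """Return the slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b


m, b = linear_regression([60, 61, 62, 63, 65], [3.1, 3.6, 3.8, 4.0, 4.1])
estimate = m * 64 + b  # predicted y for the missing x = 64
```

Here the sums are Σx = 311, Σy = 18.6, Σxy = 1159.7 and Σx² = 19359, so m = 13.9 / 74, which rounds to the slope quoted above.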

Your task is to write a program that calculates the slope m and intercept b for two sets of variables x and y. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Goldbach’s Other Conjecture

### June 7, 2016

Christian Goldbach (1690-1764) was a Prussian mathematician and contemporary of Euler. One of the most famous unproven conjectures in number theory is known as Goldbach’s Conjecture, which states that every even number greater than two is the sum of two prime numbers; for example, 28 = 5 + 23. We studied Goldbach’s Conjecture in a previous exercise.

Although it is not as well known, Goldbach made another conjecture as follows: Every odd number greater than two is the sum of a prime number and twice a square; for instance, 27 = 19 + 2 * (2 ** 2). (The conjecture is sometimes stated as every odd composite number is the sum of a prime number and twice a square, since it is trivially true with 0 as the square root for all prime numbers; alternately, it is sometimes limited so that the number being squared must be positive, in which case there are some odd primes that can be so expressed.) Sadly, it is easy to find a counter-example that disproves Goldbach’s other conjecture.

Your task is to write a program that finds the smallest number that disproves Goldbach’s other conjecture. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
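
One way to organize the search is sketched here, assuming trial-division primality testing and allowing 0 as the number being squared, as in the first statement of the conjecture. All function names are illustrative.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_prime_plus_twice_square(n):
    """True if n = q + 2*k*k for some prime q and some integer k >= 0."""
    k = 0
    while 2 * k * k <= n:
        if is_prime(n - 2 * k * k):
            return True
        k += 1
    return False


def smallest_counterexample():
    """Walk the odd numbers from 3 until one fails the conjecture."""
    n = 3
    while is_prime_plus_twice_square(n):
        n += 2
    return n
```

The search finishes quickly, since the first failure is a four-digit number.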

## A Dozen Lines Of Code

### June 3, 2016

Today’s exercise demonstrates that it is sometimes possible to do a lot with a little.

Your task is to write some interesting and useful program in no more than a dozen lines of code. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Learn A New Language

### May 31, 2016

It’s fun to learn new programming languages. It’s also useful, even if you never use the new language, because it forces you to think differently about how you do things.

Your task is to write a familiar program in an unfamiliar language. When you are finished, you are welcome to read or run ([1], [2]) a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Pollard’s Rho Algorithm For Discrete Logarithms

### May 27, 2016

We studied discrete logarithms in two previous exercises. Today we look at a third algorithm for computing discrete logarithms, invented by John Pollard in the mid 1970s. Our presentation follows that in the book *Prime Numbers: A Computational Perspective* by Richard Crandall and Carl Pomerance, which differs somewhat from other sources.

Our goal is to compute $l$ (that is a lower-case ell, for “logarithm”) in the expression $g^l \equiv t \pmod{p}$; here $p$ is a prime greater than 3, $g$ is an integer generator on the range $1 \le g < p$, and $t$ is an integer target on the range $1 \le t < p$. Pollard takes a sequence of integer pairs $(a_i, b_i)$ modulo $p - 1$ and a sequence of integers $x_i$ modulo $p$ such that $x_i = t^{a_i} g^{b_i} \pmod{p}$, beginning with $a_0 = b_0 = 0$ and $x_0 = 1$. Then the rule for deriving the terms of the various sequences is:

- If $0 < x_i < p/3$, then $a_{i+1} = (a_i + 1) \bmod (p - 1)$, $b_{i+1} = b_i$, and $x_{i+1} = t\,x_i \bmod p$.
- If $p/3 < x_i < 2p/3$, then $a_{i+1} = 2a_i \bmod (p - 1)$, $b_{i+1} = 2b_i \bmod (p - 1)$, and $x_{i+1} = x_i^2 \bmod p$.
- If $2p/3 < x_i < p$, then $a_{i+1} = a_i$, $b_{i+1} = (b_i + 1) \bmod (p - 1)$, and $x_{i+1} = g\,x_i \bmod p$.
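
The three-way rule fits in a single function. The sketch below assumes integer comparisons (`3 * x < p` in place of `x < p / 3`) to avoid floating point; the name `step` and its argument order are illustrative.

```python
def step(x, a, b, g, t, p):
    """Advance (x, a, b) one term, preserving x == t**a * g**b (mod p)."""
    if 3 * x < p:        # 0 < x < p/3: multiply by the target t
        return (t * x) % p, (a + 1) % (p - 1), b
    if 3 * x < 2 * p:    # p/3 < x < 2p/3: square
        return (x * x) % p, (2 * a) % (p - 1), (2 * b) % (p - 1)
    # 2p/3 < x < p: multiply by the generator g
    return (g * x) % p, a, (b + 1) % (p - 1)
```

Because $p$ is a prime greater than 3, neither $p/3$ nor $2p/3$ is an integer, so every $x_i$ falls strictly inside exactly one of the three intervals.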

Splitting the computation into three pieces “randomizes” the calculation, since the interval in which $x_i$ is found has nothing to do with the logarithm. The sequences are computed until some $x_j = x_k$, at which point we have $t^{a_j} g^{b_j} \equiv t^{a_k} g^{b_k} \pmod{p}$. Then, if $a_j - a_k$ is coprime to $p - 1$, we compute the discrete logarithm $l$ as $(a_j - a_k)\,l \equiv b_k - b_j \pmod{p - 1}$. However, if the greatest common divisor of $a_j - a_k$ with $p - 1$ is $d > 1$, then we compute $(a_j - a_k)\,l_0 \equiv b_k - b_j \pmod{(p - 1)/d}$, and $l = l_0 + m(p - 1)/d$ for some $m = 0, 1, \ldots, d - 1$, which must all be checked until the discrete logarithm is found.

Thus, Pollard’s rho algorithm consists of iterating the sequences until a match is found, for which we use Floyd’s cycle-finding algorithm, just as in Pollard’s rho algorithm for factoring integers. Here are outlines of the two algorithms, shown side-by-side to highlight the similarities:

    # find d such that d | n            # find l such that g**l = t (mod p)
    function factor(n)                  function dlog(g, t, p)
        func f(x) := (x*x+c) % n            func f(x,a,b) := ... as above ...
        t, h, d := 1, 1, 1                  j := (1,0,0); k := f(1,0,0)
        while d == 1                        while j.x <> k.x
            t = f(t)                            j(x,a,b) := f(j.x, j.a, j.b)
            h = f(f(h))                         k(x,a,b) := f(f(k.x, k.a, k.b))
            d = gcd(t-h, n)                 d := gcd(j.a-k.a, p-1)
        return d                            return l ... as above ...

Please pardon some abuse of notation; I hope the intent is clear. In the factoring algorithm, it is possible that *d* is the trivial factor *n*, in which case you must try again with a different constant in the *f* function; the logarithm function has no such possibility. Most of the time consumed in the computation is the modular multiplications in the calculations of the *x* sequence; the algorithm itself is O(sqrt *p*), the same as the baby-steps, giant-steps algorithm of a previous exercise, but the space requirement is only a small constant, rather than the O(sqrt *p*) space required of the previous algorithm. In practice, the random split is made into more than 3 pieces, which complicates the code but speeds the computation, as much as a 25% improvement on average.
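
Putting the pieces together, the logarithm outline might be fleshed out as below. This is a sketch under stated assumptions rather than a definitive implementation: it assumes $g$ is a primitive root modulo $p$, it divides the final congruence through by $d$ before inverting, and it relies on the three-argument `pow` with exponent -1 for the modular inverse (Python 3.8 or later).

```python
from math import gcd


def dlog(g, t, p):
    """Return l with pow(g, l, p) == t, for prime p > 3 and primitive root g."""
    n = p - 1

    def f(x, a, b):
        # Three-way split; the invariant is x == t**a * g**b (mod p).
        if 3 * x < p:
            return (t * x) % p, (a + 1) % n, b
        if 3 * x < 2 * p:
            return (x * x) % p, (2 * a) % n, (2 * b) % n
        return (g * x) % p, a, (b + 1) % n

    # Floyd's cycle finding: j takes one step for every two steps of k.
    j = (1, 0, 0)
    k = f(1, 0, 0)
    while j[0] != k[0]:
        j = f(*j)
        k = f(*f(*k))

    # Solve (a_j - a_k) * l == b_k - b_j (mod p - 1).
    da = (j[1] - k[1]) % n
    db = (k[2] - j[2]) % n
    d = gcd(da, n)
    if db % d != 0:
        raise ValueError("no solution; is g a primitive root modulo p?")
    m_step = n // d
    if m_step == 1:
        l0 = 0
    else:
        l0 = (db // d) * pow(da // d, -1, m_step) % m_step

    # Check each of the d candidates l0 + m * (p - 1) / d.
    for m in range(d):
        l = l0 + m * m_step
        if pow(g, l, p) == t:
            return l
    raise ValueError("no logarithm found")
```

As a small check, 2 is a primitive root modulo the prime 1019 and 2^10 = 1024 ≡ 5 (mod 1019), so `dlog(2, 5, 1019)` returns 10.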

Your task is to write a program that computes discrete logarithms using Pollard’s rho algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Test Scores

### May 24, 2016

The high school two blocks from me just had their annual picnic, my youngest daughter just graduated from college, and my primarily academic readership suddenly dropped in half (history suggests it will stay low until mid-August), so it seems to be the right season to have a simple data-processing task involving student test scores.

Given a list of student names and test scores, compute the average of the top five scores for each student. You may assume each student has at least five scores.

Your task is to compute student scores as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
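
A minimal sketch follows, assuming the input arrives as `(name, score)` pairs in no particular order; the exercise does not fix an input format, so that shape and the name `top_five_averages` are assumptions.

```python
import heapq
from collections import defaultdict


def top_five_averages(records):
    """Map each student name to the average of that student's five highest scores."""
    scores = defaultdict(list)
    for name, score in records:
        scores[name].append(score)
    # Each student is assumed to have at least five scores.
    return {name: sum(heapq.nlargest(5, s)) / 5 for name, s in scores.items()}


records = [("amy", s) for s in (90, 80, 70, 60, 50, 100, 10)]
records += [("bob", s) for s in (1, 2, 3, 4, 5)]
averages = top_five_averages(records)
```

Using `heapq.nlargest` avoids sorting each student's full list, though for a handful of scores per student a plain `sorted(s)[-5:]` would do just as well.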