## Pandigital Squares, Faster And Smaller

### October 6, 2020

Let’s begin with the program from the previous exercise; it takes 76 milliseconds and generates 82 megabytes of garbage:

(define (pandigital? n) (equal? (sort < (digits n)) (range 10)))

(define (ps1) (list-of (list n n2) (n range 10000 100000) (n2 is (* n n)) (pandigital? n2)))

The result of calling `ps1`

and all the others to come is a list of 87 pandigital squares. We note first that the lower bound on the iteration is too loose; some of the numbers it produces have less than ten digits. We fix that by starting at the square root of 1000000000:

(define (ps2) (list-of (list n n2) (n range 31623 100000) (n2 is (* n n)) (pandigital? n2)))

That reduces the number of pandigital tests from 90000 to 68377, a reduction of 24%. Our statistics reflect that: time goes to 60 milliseconds, a reduction of 21%, and garbage goes to 64 megabytes, a reduction of 22%.

Our next improvement is the big one, and it’s due to algorithm tuning, not code tuning. A pandigital number must be divisible by nine, so we can theoretically reduce the number of pandigital tests by eight-ninths:

(define (ps3) (list-of (list n n2) (n range 31623 100000) (n2 is (* n n)) (zero? (modulo n2 9)) (pandigital? n2)))

There are 22793 numbers tested for pandigitalness, so we don’t get the theoretical reduction; that’s because any *n* with a factor of 3 causes an *n*² divisible by 9, which happens about a third of the time. Our statistics show that, with a reduction of 65% to 21 milliseconds, and garbage reduced by 67% to 21 megabytes.

Our next step moves the test for pandigitalness inside the main function, which saves the cost of the function call; it has no effect on run time or garbage, but makes it easier to keep track of our improvements:

(define (ps4) (list-of (list n n2) (n range 31623 100000) (n2 is (* n n)) (zero? (modulo n2 9)) (equal? (sort < (digits n2)) (range 10))))

Our next improvement is to move the `(range 10)`

computation out of the loop. There is no need to recompute that every time through the loop, and we also save a lot of garbage creation:

(define ps5 (let ((ds (range 10))) (lambda () (list-of (list n n2) (n range 31623 100000) (n2 is (* n n)) (zero? (modulo n2 9)) (equal? (sort < (digits n2)) ds)))))

That reduces the elapsed time to 18 milliseconds, a 14% reduction, and reduces garbage 34% to 14 megabytes.

We make one last change. Instead of comparing lists of digits, we use bit-arithmetic to accumulate and check the digits; the `do`

in the final clause of the list comprehension sets a bit for each digit seen and returns a boolean from the `pop-count`

expression:

(define (ps6) (list-of (list n n2) (n range 31623 100000) (n2 is (* n n)) (zero? (modulo n2 9)) (do ((ds (digits n2) (cdr ds)) (bits 0 (bitwise-ior bits (ash 1 (car ds))))) ((null? ds) (= (pop-count bits) 10)))))

The elapsed time here is 10 milliseconds, a 44% reduction, as bit arithmetic is faster than consing a list, and garbage is reduced 71% to 4 megabytes.

I thought that was the last change. But when I woke up the next morning after writing that, I thought of one additional change: calculate the digits of the number on-the-fly, which saves the time spent consing up the intermediate lists of digits and eliminates the associated garbage:

(define (ps7) (list-of (list n n2) (n range 31623 100000) (n2 is (* n n)) (zero? (modulo n2 9)) (let loop ((n2 n2) (bits 0)) (if (zero? n2) (= (pop-count bits) 10) (loop (quotient n2 10) (bitwise-ior bits (ash 1 (remainder n2 10))))))))

The elapsed time here is 7 milliseconds, a 30% reduction, and the garbage generated is a mere 370 kilobytes, a 91% reduction. Here’s a summary (sorry for the ugliness, but WordPress and I couldn’t agree on how to format a table):

LapsedTime ---Garbage--- ms -% bytes -% ----- ---- -------- ---- 1 Naive algorithm 75.67 82189712 2 Increase lower bound 59.57 21.3 63859232 22.3 3 Divisible by nine 21.10 64.6 21288128 66.7 4 Inline function call 21.41 -1.5 21288768 0.0 5 Constant out of loop 18.07 15.6 13994368 34.3 6 Bit-arithmetic 10.44 42.2 4017152 71.3 7 Inline call to digits 6.74 35.4 370272 90.8 ---- ---- Overall total 91.1 99.5

Overall, we have a 91% reduction in elapsed time and a 100% reduction in garbage generated (okay, a 99.55% reduction in garbage generated, but what’s less than half a percent between friends?). Is all the effort worth it? My answer is no; the naive solution is good enough, because this is surely a one-off programming task, and the extra work of tuning the code is needless busywork. But it does make a good programming exercise.

You can run the program at https://ideone.com/tyoNwc.

Make the range count by 3 in stead of checking the square mod 9.

Here’s a solution in C.

Relative to my earlier solution, this reduces the space requirements by using a bit array for determining if a number is pandigital, instead of using an array of integers. This alone slightly increases the time requirements, but is more than offset by another modification that conducts the search in parallel. In aggregate, using 6 threads on my 8-core CPU, runtime decreases by about 41% versus my earlier solution.

Example Usage: