## Pandigital Squares, Faster And Smaller

### October 6, 2020

Let’s begin with the program from the previous exercise; it takes 76 milliseconds and generates 82 megabytes of garbage:

```(define (pandigital? n)
(equal? (sort < (digits n)) (range 10)))```
```(define (ps1)
(list-of (list n n2)
(n range 10000 100000)
(n2 is (* n n))
(pandigital? n2)))```

The result of calling `ps1` and all the others to come is a list of 87 pandigital squares. We note first that the lower bound on the iteration is too loose; some of the numbers it produces have less than ten digits. We fix that by starting at the square root of 1000000000:

```(define (ps2)
(list-of (list n n2)
(n range 31623 100000)
(n2 is (* n n))
(pandigital? n2)))```

That reduces the number of pandigital tests from 90000 to 68377, a reduction of 24%. Our statistics reflect that: time goes to 60 milliseconds, a reduction of 21%, and garbage goes to 64 megabytes, a reduction of 22%.

Our next improvement is the big one, and it’s due to algorithm tuning, not code tuning. A pandigital number must be divisible by nine, so we can theoretically reduce the number of pandigital tests by eight-ninths:

```(define (ps3)
(list-of (list n n2)
(n range 31623 100000)
(n2 is (* n n))
(zero? (modulo n2 9))
(pandigital? n2)))```

There are 22793 numbers tested for pandigitalness, so we don’t get the theoretical reduction; that’s because any n with a factor of 3 causes an n² divisible by 9, which happens about a third of the time. Our statistics show that, with a reduction of 65% to 21 milliseconds, and garbage reduced by 67% to 21 megabytes.

Our next step moves the test for pandigitalness inside the main function, which saves the cost of the function call; it has no effect on run time or garbage, but makes it easier to keep track of our improvements:

```(define (ps4)
(list-of (list n n2)
(n range 31623 100000)
(n2 is (* n n))
(zero? (modulo n2 9))
(equal? (sort < (digits n2)) (range 10))))```

Our next improvement is to move the `(range 10)` computation out of the loop. There is no need to recompute that every time through the loop, and we also save a lot of garbage creation:

```(define ps5
(let ((ds (range 10)))
(lambda ()
(list-of (list n n2)
(n range 31623 100000)
(n2 is (* n n))
(zero? (modulo n2 9))
(equal? (sort < (digits n2)) ds)))))```

That reduces the elapsed time to 18 milliseconds, a 14% reduction, and reduces garbage 34% to 14 megabytes.

We make one last change. Instead of comparing lists of digits, we use bit-arithmetic to accumulate and check the digits; the `do` in the final clause of the list comprehension sets a bit for each digit seen and returns a boolean from the `pop-count` expression:

```(define (ps6)
(list-of (list n n2)
(n range 31623 100000)
(n2 is (* n n))
(zero? (modulo n2 9))
(do ((ds (digits n2) (cdr ds))
(bits 0 (bitwise-ior bits (ash 1 (car ds)))))
((null? ds) (= (pop-count bits) 10)))))```

The elapsed time here is 10 milliseconds, a 44% reduction, as bit arithmetic is faster than consing a list, and garbage is reduced 71% to 4 megabytes.

I thought that was the last change. But when I woke up the next morning after writing that, I thought of one additional change: calculate the digits of the number on-the-fly, which saves the time spent consing up the intermediate lists of digits and eliminates the associated garbage:

```(define (ps7)
(list-of (list n n2)
(n range 31623 100000)
(n2 is (* n n))
(zero? (modulo n2 9))
(let loop ((n2 n2) (bits 0))
(if (zero? n2)
(= (pop-count bits) 10)
(loop (quotient n2 10)
(bitwise-ior bits (ash 1 (remainder n2 10))))))))```

The elapsed time here is 7 milliseconds, a 30% reduction, and the garbage generated is a mere 370 kilobytes, a 91% reduction. Here’s a summary (sorry for the ugliness, but WordPress and I couldn’t agree on how to format a table):

```                          LapsedTime   ---Garbage---
ms   -%      bytes   -%
----- ----   -------- ----
1 Naive algorithm         75.67        82189712
2 Increase lower bound    59.57 21.3   63859232 22.3
3 Divisible by nine       21.10 64.6   21288128 66.7
4 Inline function call    21.41 -1.5   21288768  0.0
5 Constant out of loop    18.07 15.6   13994368 34.3
6 Bit-arithmetic          10.44 42.2    4017152 71.3
7 Inline call to digits    6.74 35.4     370272 90.8
----            ----
Overall total                 91.1            99.5```

Overall, we have a 91% reduction in elapsed time and a 100% reduction in garbage generated (okay, a 99.55% reduction in garbage generated, but what’s less than half a percent between friends?). Is all the effort worth it? My answer is no; the naive solution is good enough, because this is surely a one-off programming task, and the extra work of tuning the code is needless busywork. But it does make a good programming exercise.

You can run the program at https://ideone.com/tyoNwc.

Pages: 1 2

### 3 Responses to “Pandigital Squares, Faster And Smaller”

1. Make the range count by 3 in stead of checking the square mod 9.

2. Daniel said

Here’s a solution in C.

Relative to my earlier solution, this reduces the space requirements by using a bit array for determining if a number is pandigital, instead of using an array of integers. This alone slightly increases the time requirements, but is more than offset by another modification that conducts the search in parallel. In aggregate, using 6 threads on my 8-core CPU, runtime decreases by about 41% versus my earlier solution.

```/*
* search.c
*
* Build
*   \$ gcc -O2 -o search -fopenmp search.c
*
* Usage
*/

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include <omp.h>

#define MIN_PANDIGITAL 1023456789
#define MAX_PANDIGITAL 9876543210

static bool is_pandigital(int64_t x) {
if (x < MIN_PANDIGITAL || x > MAX_PANDIGITAL)
return false;
int array = 0;
for (int i = 0; i < 10; ++i) {
int r = x % 10;
if ((array >> r) & 1)
return false;
array |= 1 << r;
x /= 10;
}
return true;
}

int main(void) {
int start = 31991;  // int(sqrt(MIN_PANDIGITAL))
int end = 99380;  // int(sqrt(MAX_PANDIGITAL))
#pragma omp parallel
{
for (int x = start + thread_num; x <= end; x += num_threads) {
int64_t square = (int64_t)x * (int64_t)x;
if (is_pandigital(square)) {
#pragma omp critical
printf("%ld\n", square);
}
}
}
return EXIT_SUCCESS;
}
```

Example Usage:

```\$ export OMP_NUM_THREADS=6; time for x in {1..1000}; do ./search > /dev/null; done
real    0m1.673s
user    0m3.697s
sys     0m0.860s

\$ ./search
1026753849
1042385796
1248703569
1098524736
...
9054283716
9351276804
9761835204
9814072356
```
3. matthew said

Here’s some Haskell, first some variants of the pandigital function itself:

```-- pandigital.hs
import Data.List
import Data.Bits

pandigital0 :: Int -> Bool
pandigital0 n = sort (show n) == "0123456789"

-- Use a bitset to track digits
digits1 :: Int -> Int
digits1 0 = 0
digits1 n = setBit (digits1 (div n 10)) (mod n 10)

pandigital1 :: Int -> Bool
pandigital1 n = digits1 n == 1023

-- Bitset with accumulating parameter
digits2 :: Int -> Int -> Int
digits2 s 0 = s
digits2 s n = digits2 (setBit s (mod n 10)) (div n 10)

pandigital2 :: Int -> Bool
pandigital2 n = digits2 0 n == 1023

-- Predeclare the list of candidates as Int list:
s :: [Int]
s = map (^2) [3,6..100000]
```

Now try it out. It’s important for speed to a) ensure the interpreter is actually compiling and optimizing input files, and b) make sure we use Int type declarations appropriately to avoid conversions to and from Integer bignums:

```\$ ghci -fobject-code -O pandigital.hs
GHCi, version 8.6.5: http://www.haskell.org/ghc/  :? for help
Prelude Main> :set +s
Prelude Main> length s
33333
(0.11 secs, 2,214,536 bytes)
Prelude Main> length \$ filter pandigital0 s
87
(0.03 secs, 57,069,912 bytes)
Prelude Main> length \$ filter pandigital1 s
87
(0.02 secs, 83,248 bytes)
Prelude Main> length \$ filter pandigital2 s
87
(0.02 secs, 83,248 bytes)
```