Pandigital Squares, Faster And Smaller

October 6, 2020

Let’s begin with the program from the previous exercise; it takes 76 milliseconds and generates 82 megabytes of garbage:

(define (pandigital? n)
  (equal? (sort < (digits n)) (range 10)))
(define (ps1)
  (list-of (list n n2)
    (n range 10000 100000)
    (n2 is (* n n))
    (pandigital? n2)))

The result of calling ps1 and all the others to come is a list of 87 pandigital squares. We note first that the lower bound on the iteration is too loose; some of the numbers it produces have less than ten digits. We fix that by starting at the square root of 1000000000:

(define (ps2)
  (list-of (list n n2)
    (n range 31623 100000)
    (n2 is (* n n))
    (pandigital? n2)))

That reduces the number of pandigital tests from 90000 to 68377, a reduction of 24%. Our statistics reflect that: time goes to 60 milliseconds, a reduction of 21%, and garbage goes to 64 megabytes, a reduction of 22%.

Our next improvement is the big one, and it’s due to algorithm tuning, not code tuning. A pandigital number must be divisible by nine, so we can theoretically reduce the number of pandigital tests by eight-ninths:

(define (ps3)
  (list-of (list n n2)
    (n range 31623 100000)
    (n2 is (* n n))
    (zero? (modulo n2 9))
    (pandigital? n2)))

There are 22793 numbers tested for pandigitalness, so we don’t get the theoretical reduction; that’s because any n with a factor of 3 causes an n² divisible by 9, which happens about a third of the time. Our statistics show that, with a reduction of 65% to 21 milliseconds, and garbage reduced by 67% to 21 megabytes.

Our next step moves the test for pandigitalness inside the main function, which saves the cost of the function call; it has no effect on run time or garbage, but makes it easier to keep track of our improvements:

(define (ps4)
  (list-of (list n n2)
    (n range 31623 100000)
    (n2 is (* n n))
    (zero? (modulo n2 9))
    (equal? (sort < (digits n2)) (range 10))))

Our next improvement is to move the (range 10) computation out of the loop. There is no need to recompute that every time through the loop, and we also save a lot of garbage creation:

(define ps5
  (let ((ds (range 10)))
    (lambda ()
      (list-of (list n n2)
        (n range 31623 100000)
        (n2 is (* n n))
        (zero? (modulo n2 9))
        (equal? (sort < (digits n2)) ds)))))

That reduces the elapsed time to 18 milliseconds, a 14% reduction, and reduces garbage 34% to 14 megabytes.

We make one last change. Instead of comparing lists of digits, we use bit-arithmetic to accumulate and check the digits; the do in the final clause of the list comprehension sets a bit for each digit seen and returns a boolean from the pop-count expression:

(define (ps6)
  (list-of (list n n2)
    (n range 31623 100000)
    (n2 is (* n n))
    (zero? (modulo n2 9))
    (do ((ds (digits n2) (cdr ds))
         (bits 0 (bitwise-ior bits (ash 1 (car ds)))))
        ((null? ds) (= (pop-count bits) 10)))))

The elapsed time here is 10 milliseconds, a 44% reduction, as bit arithmetic is faster than consing a list, and garbage is reduced 71% to 4 megabytes.

I thought that was the last change. But when I woke up the next morning after writing that, I thought of one additional change: calculate the digits of the number on-the-fly, which saves the time spent consing up the intermediate lists of digits and eliminates the associated garbage:

(define (ps7)
  (list-of (list n n2)
    (n range 31623 100000)
    (n2 is (* n n))
    (zero? (modulo n2 9))
    (let loop ((n2 n2) (bits 0))
      (if (zero? n2)
          (= (pop-count bits) 10)
          (loop (quotient n2 10)
                (bitwise-ior bits (ash 1 (remainder n2 10))))))))

The elapsed time here is 7 milliseconds, a 30% reduction, and the garbage generated is a mere 370 kilobytes, a 91% reduction. Here’s a summary (sorry for the ugliness, but WordPress and I couldn’t agree on how to format a table):

                          LapsedTime   ---Garbage---
                            ms   -%      bytes   -%
                          ----- ----   -------- ----
1 Naive algorithm         75.67        82189712     
2 Increase lower bound    59.57 21.3   63859232 22.3
3 Divisible by nine       21.10 64.6   21288128 66.7
4 Inline function call    21.41 -1.5   21288768  0.0
5 Constant out of loop    18.07 15.6   13994368 34.3
6 Bit-arithmetic          10.44 42.2    4017152 71.3
7 Inline call to digits    6.74 35.4     370272 90.8
                                ----            ----
  Overall total                 91.1            99.5

Overall, we have a 91% reduction in elapsed time and a 100% reduction in garbage generated (okay, a 99.55% reduction in garbage generated, but what’s less than half a percent between friends?). Is all the effort worth it? My answer is no; the naive solution is good enough, because this is surely a one-off programming task, and the extra work of tuning the code is needless busywork. But it does make a good programming exercise.

You can run the program at

Pages: 1 2

2 Responses to “Pandigital Squares, Faster And Smaller”

  1. Make the range count by 3 in stead of checking the square mod 9.

  2. Daniel said

    Here’s a solution in C.

    Relative to my earlier solution, this reduces the space requirements by using a bit array for determining if a number is pandigital, instead of using an array of integers. This alone slightly increases the time requirements, but is more than offset by another modification that conducts the search in parallel. In aggregate, using 6 threads on my 8-core CPU, runtime decreases by about 41% versus my earlier solution.

     * search.c
     * Build
     *   $ gcc -O2 -o search -fopenmp search.c
     * Usage
     *   $ [OMP_NUM_THREADS=INT] search
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>
    #define MIN_PANDIGITAL 1023456789
    #define MAX_PANDIGITAL 9876543210
    static bool is_pandigital(int64_t x) {
        return false;
      int array = 0;
      for (int i = 0; i < 10; ++i) {
        int r = x % 10;
        if ((array >> r) & 1)
          return false;
        array |= 1 << r;
        x /= 10;
      return true;
    int main(void) {
      int start = 31991;  // int(sqrt(MIN_PANDIGITAL))
      int end = 99380;  // int(sqrt(MAX_PANDIGITAL))
      #pragma omp parallel
        int num_threads = omp_get_num_threads();
        int thread_num = omp_get_thread_num();
        for (int x = start + thread_num; x <= end; x += num_threads) {
          int64_t square = (int64_t)x * (int64_t)x;
          if (is_pandigital(square)) {
            #pragma omp critical
            printf("%ld\n", square);
      return EXIT_SUCCESS;

    Example Usage:

    $ export OMP_NUM_THREADS=6; time for x in {1..1000}; do ./search > /dev/null; done
    real    0m1.673s
    user    0m3.697s
    sys     0m0.860s
    $ ./search

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: