## 357 Numbers

### March 6, 2015

This question arose at a job-interview site:

Find all numbers divisible only by 3, 5 and 7. For instance, 35 = 5 × 7 is included in the set, but 30 = 2 × 3 × 5 is not because of the factor of 2.

Your task is to write the requested program and determine how many numbers in the set are less than a million. When you are finished, you are welcome to read or run a suggest solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

### 31 Responses to “357 Numbers”

1. Francesco said
```import Control.Applicative

sol = let f a b c = 3 ^ a * 5 ^ b * 7 ^ c; ls = [0..13] in
length . filter (<10^6) \$ liftA3 f ls ls ls
```

But there must some conspiracy afoot, since my result is 203 :S

2. informatimago said

(length (sort (let ((max 1000000))
(loop
:for 3s = 1 then (* 3 3s)
:while (< 3s max)
:append (loop
:for 5s = 1 then (* 5 5s)
:for 35s = (* 3s 5s)
:while (< 35s max)
:append (loop
:for 7s = 1 then (* 7 7s)
:for 357s = (* 35s 7s)
:while (< 357s max)
:collect 357s))))
(function <)))
203

3. Paul said

In Python. The first method (div357) is more or less brute force. It generates all possible products of powers of 3,5 and 7. The second method uses a queue to generate the same numbers, but this method is in Python a lot slower.

```from itertools import takewhile, count, product
from operator import mul
from heapq import heappush, heappop

def div357(n):
powsi = lambda p: takewhile(lambda x: x < n, (p ** i for i in count()))
return sum(reduce(mul, p) < n for p in product(powsi(3), powsi(5), powsi(7)))

def div357q(n):
Q = 
s = set()
while Q:
p = heappop(Q)
if p in s:
continue
if p >= n:
break
heappush(Q, 3 * p)
heappush(Q, 5 * p)
heappush(Q, 7 * p)
return len(s)

"""
100
609456
elapsed time div357: 5.03 sec
609456
elapsed time div357q: 30.9 sec
"""
```
4. Felipe Ceotto said

I keep getting 202 instead of 203, have gone over everything but can’t seem to find the 203rd one. Are you guys including 0 or 1 there by any chance? Can anyone publish a list of the numbers found?

5. programmingpraxis said

@FelipeCeotto: The number 1 is included in the set. See the first paragraph of the solution page.

6. ceottaki said

Thanks, @programmingpraxis
My somewhat verbose solution in C#:

```        public static int CountDivisiblesByThreeFiveSeven(int topLimit)
{
int numCount = 1;
double currentNumber = 1;

while (currentNumber <= topLimit)
{
double currentNumberFive = currentNumber;
while (currentNumberFive < topLimit)
{
double currentNumberSeven = currentNumberFive;
while (currentNumberSeven < topLimit)
{
currentNumberSeven *= 7;
if (currentNumberSeven <= topLimit)
{
numCount++;
}
}

currentNumberFive *= 5;
if (currentNumberFive <= 1000000)
{
numCount++;
}
}

currentNumber *= 3;
if (currentNumber <= 1000000)
{
numCount++;
}
}

return numCount;
}
```
7. Graham said

Reminiscent of the Hamming Numbers exercise. My Haskell solution is similar to my submission for that exercise:

```merge :: (Ord a) => [a] -> [a] -> [a]
merge xs []         = xs
merge [] ys         = ys
merge (x:xs) (y:ys) | x < y  = x : merge xs (y:ys)
| x == y = x : merge xs ys
| x > y  = y : merge (x:xs) ys

list357 :: [Integer]
list357 = 1 : merge list3 (merge list5 list7)
where
list3 = map (* 3) list357
list5 = map (* 5) list357
list7 = map (* 7) list357

main :: IO ()
main = print . length . takeWhile (< 1000000) \$ list357
```
8. Mike said

Python. Returns a sorted list of the numbers. Could be sped up by pulling powersof() out of the inner loops. As is, p357(1000000) only takes about a millisecond.

```from math import ceil

def powersof(n, limit):
val = 1
while val < limit:
yield val
val *= n

def p357(limit):
return sorted(p3 * p5 * p7
for p3 in powersof(3, limit)
for p5 in powersof(5, ceil(limit/p3))
for p7 in powersof(7, ceil(limit/(p3*p5))))

```
9. Jussi Piitulainen said

This is a Scheme implementation of Mike’s Python, with infrastructure. My generators are procedures that return successive values on successive calls and then raise an exception (or return the values from a sentinel thunk); their ability to return multiple values on each call is not used in this exercise. The mechanism was originally inspired by a post in the Python newsgroup. I’m not sure I’ve got it exactly right, but I’m also not aware of where any problems might lie :/

``` ;;; (length (sorted (p357 1000000))) => 203```

``` (import (rnrs exceptions (6))) ;; in Guile 2.0.5 (define (p357 limit)   (generator    (lambda (return)      (for (powers-of 3 limit)           (lambda (p3)             (for (powers-of 5 (ceiling-quotient limit p3))                  (lambda (p5)                    (for (powers-of 7 (ceiling-quotient limit (* p3 p5)))                         (lambda (p7)                           (return (* p3 p5 p7))))))))))) (define (powers-of n limit)   (generator    (lambda (return)      (do ((p 1 (* p n)))          ((>= p limit))        (return p))))) (define (for gen proc)   (guard    (exn     ((eq? exn 'the-end-of-generator)))    (call-with-values gen proc)    (for gen proc))) (define (sorted gen)   (let ((result '()))     (guard      (exn       ((eq? exn 'the-end-of-generator) (sort result <)))      (let loop ()        (set! result (cons (gen) result))        (loop))))) ```

```(define generator   (case-lambda    ((w)     (let ((s (lambda ()                (raise 'the-end-of-generator))))       (generator w s)))    ((w s)     (let* ((receiver values)            (terminator values)            (condition '())            (body (lambda (yield)                    (w yield)                    (call-with-values s receiver)))            (yield (lambda vs                     (call-with-current-continuation                      (lambda (k)                        (set! body k)                        (apply receiver vs))))))       (lambda ()         (call-with-current-continuation          (lambda (k)            (set! receiver k)            (call-with-current-continuation             (lambda (k)               (set! terminator k)               (guard                (con (else                      (set! condition con)                      (terminator)))                (body yield))               (terminator)))            (raise condition)))))))) ```

10. programmingpraxis said

@Jussi: You could also use the generators defined in the Standard Prelude:

```> (define-generator (make-gen357)   (let ((pq (pq-insert < 1 pq-empty)))       (let loop ()         (let ((x (pq-first pq)))           (while (and (not (pq-empty? pq))                       (= (pq-first pq) x))             (set! pq (pq-rest < pq)))           (set! pq (pq-insert < (* 3 x) pq))           (set! pq (pq-insert < (* 5 x) pq))           (set! pq (pq-insert < (* 7 x) pq))           (yield x) (loop))))) > (define gen357 (make-gen357)) > (gen357) 1 > (gen357) 3 > (gen357) 5 > (gen357) 7 > (gen357) 9 > (gen357) 15 > (gen357) 21 > (define gen357-2 (make-gen357)) > (gen357-2) 1 > (gen357-2) 3 > (gen357-2) 5 > (gen357-2) 7 > (gen357-2) 9 > (gen357) 25 > (gen357) 27 > (gen357) 35```

11. Jussi Piitulainen said

@Praxis, I can’t get your Standard Prelude define-generator to do what I want. There seems to be some problem defining generators in terms of generators, which is precisely what I wanted to emulate Mike’s code. This is again in Guile 2.0.5, with your definition from Standard Prelude, and my power generator modified to raise the specific condition when it runs out of values, as follows.
``` (import (rnrs exceptions (6))) ;;; ... (define-generator (powers-of n limit)   (do ((p 1 (* p n)))       ((>= p limit) (raise 'the-end-of-generator))     (yield p))) ```
Now, this works (with my little looping construct as before):
``` (define (e357 limit)   (for (powers-of 3 limit)        (lambda (p3)          (for (powers-of 5 (ceiling-quotient limit p3))               (lambda (p5)                 (for (powers-of 7 (ceiling-quotient limit (* p3 p5)))                      (lambda (p7)                        (write (* p3 p5 p7))                        (write-char #\space)))))))) ```
That’s an “ordinary” procedure that simply writes out the products from the value-generating procedures. For example:
``` scheme@(guile-user)> (e357 10) 1 7 5 3 9 \$1 = #t ```
The final part is Guile telling that the expression returned a truth value. But
I didn’t want to write those products out. I wanted to generate them to be sorted and counted. And this fails:
``` (define-generator (p357 limit)   (for (powers-of 3 limit)        (lambda (p3)          (for (powers-of 5 (ceiling-quotient limit p3))               (lambda (p5)                 (for (powers-of 7 (ceiling-quotient limit (* p3 p5)))                      (lambda (p7)                        (yield (* p3 p5 p7))))))))) ```
It yields the first product all right, but then it fails to continue:
``` scheme@(guile-user)> (let ((g (p357 10))) (write (g)) (write (g))) 1ERROR: In procedure #<continuation 92456f0>: ERROR: Too few values returned to continuation ```
I’ve reduced the failure down to what I thought would be two identity mappings:
``` (define-generator (gd-id gen) (let loop () (yield (gen)) (loop))) (define-generator (bd-id gen) (for gen (lambda (v) (yield v)))) ```
The first works. The second fails after the first value:
``` scheme@(guile-user)> (for (gd-id (powers-of 3 10)) write) 139\$1 = #t scheme@(guile-user)> (for (bd-id (powers-of 3 10)) write) 1ERROR: In procedure #<continuation 9c79220>: ERROR: Too few values returned to continuation ```
It’s the second one that is like the nested loops in my Mike-emulator.

In my own system the following three all appear to work the same, as I expected.
``` (define (gd-id gen) (generator (lambda (yield) (let loop () (yield (gen)) (loop))))) (define (od-id gen) (generator (lambda (yield) (for gen yield)))) (define (wd-id gen) (generator (lambda (yield) (for gen (lambda (v) (yield v)))))) ```
So I failed to use the Standard Prelude in place of my own generators. Did I try to do something that’s wrong? Much apologies if I’m making some very stupid mistake!

12. Mike said

Inspired by Graham’s Haskell code, I came up with these python generators. s357() generates the values in order without repetition.

```from heapq import merge

def s3():
yield 1
yield from (3*i for i in s3())

def s35():
g3 = s3()
yield next(g3)
yield from merge(g3, (5*i for i in s35()))

def s357():
g35 = s35()
yield next(g35)
yield from merge(g35, (7*i for i in s357()))

# test
import itertools as it
limit = 1000000
lessthanlimit = limit.__gt__
print(list(it.takewhile(lessthanlimit, s357())))   # prints: [1, 3, 5, 7, 9, 15, 21, 25, 27, 35, ..., 972405, 984375]
```
13. Willy Good said

@Mike’s Python code above is quite fast (for Python) due to the efficiency of the heapq merge function, but Python doesn’t really like recursion very much as it isn’t optimized for tail recursion as can be shown by changing the definition of just one of the series generators, the s3() one, to a version not requiring iterator recursion which speeds it up by almost a factor of two, as follows:

```def s3():
n = 3
while True:
yield n
n *= 3
```

The recursion is slower due to the overhead of successive iterator function calls to implement it that way.

Now we could re-write the algorithm to not require the recursive iteration, but as Python doesn’t optimize successive calls very well or inline code, there are even more nested function calls and the code is slightly slower. The following code uses Co-Inductive Streams implemented as two-tuples with the first element carrying the value and the second a “thunk’ lambda for the continuation function to the next value:of the “stream”:

```def s357x():
def merge(sx, sy):
x, xr = sx
y, yr = sy
if x > y:
return y, lambda: merge(yr(), sx)
elif y > x:
return x, lambda: merge(xr(), sy)
else: return x, lambda: merge(xr(), yr())
def smult(s, m):
v, cont = s
return v * m, lambda: smult(cont(), m)
def s3(n):
return n, lambda: s3(n * 3)
def s35():
return 1, lambda: merge(s3(3), smult(s35(), 5))
_, s35r = s35() # to assist in skipping the first entry, the 1
return 1, lambda: merge(s35r(), smult(s357x(), 7))

# test
from time import clock
limit = 100000000000000000000000000000000000
count = 0
elpsd = -clock()
nxt, ncnt = s357x()
while nxt <= limit:
count += 1
nxt, ncnt = ncnt()
elpsd += clock()
print(count, " in ", int(elpsd * 1000), " milliseconds.")
```

For this algorithm, it is better to use a language optimized for tail calls such as Scheme, as follows (this implementation uses Co-Inductive Streams implemented as a Pair, with the elements as per the two-tuple of the Python code):

(define (s357)
(define (merge x y)
(let ((xv (car x)) (yv (car y)))
(cond (( xv yv) (cons yv (lambda () (merge x ((cdr y))))))
(else (cons xv (lambda () (merge ((cdr x)) y)))))))
(define (smult s m) (cons (* (car s) m) (lambda () (smult ((cdr s)) m))))
(define (s3 n) (cons n (lambda () (s3 (* n 3)))))
(define (s35) (cons 1 (lambda () (merge (s3 3) (smult (s35) 5)))))
(let ((s35r (cdr (s35)))) (cons 1 (lambda () (merge (s35r) (smult (s357) 7))))))

;;; testing…
(define limit 100000000000000000000000000000000000)
(define (countemto lmt)(do ((nxt (s357) ((cdr nxt))) (cnt 0 (+ cnt 1))) ((> (car nxt) lmt) cnt)))
(display (time (countemto limit))) (newline)

where even run as interpreted code on http://codepad.org/DUIJHxBA, it runs faster than Python as shown following:

cpu time: 890 real time: 965 gc time: 501
27608

It will run 10’s of times faster compiled to machine code with Gambit-C, Bigloo, or Chez Scheme compilers with optimizations and unsafe modes turned on.

14. Willy Good said

Note that much of the time used for the codepad interpreter is in garbage collecting (the codepad collector must be pretty poor); for the 64-bit Petite Chez Scheme, the output is as follows (note that the main difference is the amount of time spent in garbage collecting, with Chez using very little):

```> (display (time (countemto limit))) (newline)
(time (countemto 100000000000000000000000000000000000))
19 collections
312 ms elapsed cpu time, including 16 ms collecting
319 ms elapsed real time, including 6 ms collecting
158733760 bytes allocated, including 159005104 bytes reclaimed
27608
```
15. Willy Good said

The Scheme listing didn’t come out correctly the first time, although it is correct at the codepage link: http://codepad.org/DUIJHxBA.

The code listing should be as follows:

```(define (s357)
(define (merge x y)
(let ((xv (car x)) (yv (car y)))
(cond ((< xv yv) (cons xv (lambda () (merge ((cdr x)) y))))
((> xv yv) (cons yv (lambda () (merge x ((cdr y))))))
(else (cons xv (lambda () (merge ((cdr x)) y)))))))
(define (smult s m) (cons (* (car s) m) (lambda () (smult ((cdr s)) m))))
(define (s3 n) (cons n (lambda () (s3 (* n 3)))))
(define (s35) (cons 1 (lambda () (merge (s3 3) (smult (s35) 5)))))
(let ((s35r (cdr (s35)))) (cons 1 (lambda () (merge (s35r) (smult (s357) 7))))))

(define limit 100000000000000000000000000000000000)
(define (countemto lmt)(do ((nxt (s357) ((cdr nxt))) (cnt 0 (+ cnt 1))) ((> (car nxt) lmt) cnt)))
(display (time (countemto limit))) (newline)
```
16. Will Ness said

Another merging scheme is possible, so there are no duplicates, as discussed at http://stackoverflow.com/q/12480291/849891. In Haskell:

h357 = 1 : q3 — length . takeWhile ( 203
where
q7 = 7 : map (* 7) q7
q5 = 5 : merge q7 (map (* 5) q5)
q3 = 3 : merge q5 (map (* 3) q3)

It can also be written as a one-liner, as

1:foldr (\n s->fix ((n:) . merge s . map (n*))) [] [3,5,7]

Due to Rosettacode’s user “Ledrug”.

17. Will_Ness said

Sorry for messing up the markup. I should have also read through to the end, a similar scheme was already proposed above.

```h357 = 1 : q3      -- length \$ takeWhile (< 10^6) h357  ==>  203
where
q7 = 7 : map (* 7) q7
q5 = 5 : merge q7 (map (* 5) q5)
q3 = 3 : merge q5 (map (* 3) q3)
```

It can also be written as a one-liner, as

```1:foldr (\n s->fix ((n:) . merge s . map (n*))) [] [3,5,7]
```

Due to Rosettacode’s user “Ledrug”.

18. Willy Good said

Great contribution by @Will_Ness as to the significance of these solutions!

As per whatever @Mike based the Python solution on, we were working toward the Daniel Fischer solution from the stackoverflow link.

The “one-liner” (well, not quite as it still requires the “merge” clause) Haskell solution isn’t possible in Scheme (or Python) as there is no built-in lazy lists and therefore no fixpoint recursion; moreover, reversing the order from 3/5/7 to 7/5/3 doesn’t have any benefit and in fact makes the current Scheme (and Python) code slower for larger ranges due to not taking as much advantage of the simplified sequence for the most often repeated ‘3’ series: Also, using a direct sequence multiply function instead of a map applied to a (lazy) list is more efficient being more direct. Finally, @Mike’s use of Python iterator sequences or my use of Co-Inductive Streams (CIS’s) rather than some sort of Haskell-like lazy lists (Streams as per Schemes SRFI-41, authored by @praxis) limits the speed of our current implementations due to having to recalculate previous values of the sequence due to no memoization being applied; however, the speed of the operations and less demands on garbage collection tends to cancel the theoretical advantage out for these languages.

In order to try to take full advantage of the Daniel Fischer discovery and the constant factor advantage, a memoized lazy stream is needed, which is easy to implement from the CIS code by just changing the lambda thunks to use the R5rS delay/force primitives (for this limited purpose, there is no need to use the full SRFI-18 implementation). The code which is more or less equivalent to @Will_Ness’s Haskell code then is as follows:

```(define (s357)
(define (merge x y)
(let ((xv (car x)) (yv (car y)))
(if (<= xv yv) (cons xv (delay (merge (force (cdr x)) y)))
(cons yv (delay (merge x (force (cdr y))))))))
(define (smult m s) (cons (* m (car s)) (delay (smult m (force (cdr s)))))) ;; equiv to map (* m)
(define (s7 n) (cons n (delay (s7 (* n 7)))))
(define (s57) (cons 1 (delay (merge (s7 7) (smult 5 (s57))))))
(let ((s57r (cdr (s57)))) (cons 1 (delay (merge (force s57r) (smult 3 (s357)))))))

;;; test...
(define limit 100000000000000000000000000000000000)
(define (countemto lmt)(do ((nxt (s357) (force (cdr nxt))) (cnt 0 (+ cnt 1))) ((> (car nxt) lmt) cnt)))
(display (time (countemto limit))) (newline)
```

which produces the best time with the Gambit-C compiler using optimized and unsafe code:

```(time (countemto limit))
698 ms real time
702 ms cpu time (702 user, 0 system)
296 collections accounting for 184 ms real time (140 user, 0 system)
326143648 bytes allocated
no minor faults
no major faults
27608
```

The above code is slower than the previous code for most Scheme implementations and slower than compiled Haskell due to not optimizing the lazy stream operations and the greater demands on (likely less efficient) garbage collection; as well, the reversing of the prime processing actually takes longer due to increased merge operations for the above reasons.

Live and learn!

19. Xin said

This is my implementation in Java.
Is this what you mean?

public class DivisibleBy357Only
{
private static final int MAX_NUM = 1000000;

public static Set findNumDivisibleBy357Only()
{
Set divisibleNum = new HashSet();

Integer curQueueNum, product;
int[] baseNum = {3, 5, 7};

while( (curQueueNum = ququeNum.poll()) != null)
{
for(int i=0; i < baseNum.length; i++)
{
product = curQueueNum * baseNum[i];
if( product < MAX_NUM )
{
}
}
}
return divisibleNum;
}
}

20. @Xin: Your Java program does not produce the 357factors in order, meaning if that were required you would have to sort it using a TreeSet rather than a HashSet at a cost in performance for larger ranges (O(log n) per entry rather than O(1))…

21. matthew said

I couldn’t resist a C++ version of the lazy list merging solution with 3 lists, using STL deques:

```#include <deque>
#include <stdio.h>
#include <stdlib.h>

using std::deque;

int main(int argc, char *argv[])
{
int limit = strtol(argv,NULL,0);
deque<long> q3,q5,q7;
q3.push_back(3); q5.push_back(5); q7.push_back(7);
int count = 0;
while (q3.back() < limit) {
count++;
if (3*q3.front() < q5.back()) {
q3.push_back(3*q3.front());
q3.pop_front();
continue;
}
q3.push_back(q5.back());
if (5*q5.front() < q7.back()) {
q5.push_back(5*q5.front());
q5.pop_front();
continue;
}
q5.push_back(q7.back());
q7.push_back(7*q7.front());
q7.pop_front();
}
printf("%d\n", count+1);
}
```

Perhaps not as elegant as the Haskell solution and certainly not a one liner. Perf tells me it take 1.8M cycles to count the 203 numbers less that 1000000.

22. brooknovak said

Here is a recursive solution in c++. May stack frame depth will only be log 3 1000000 = 12 stack frames.

```#include <iostream>
#include <cmath>
#include <set>

void recurse357(double exp3, double exp5, double exp7, std::set<double>& numbers) {

double sum = pow(3.0, exp3) * pow(5.0, exp5) * pow(7.0, exp7);

if(sum > 1000000)
return;

numbers.insert(sum);

recurse357(exp3 + 1.0, exp5, exp7, numbers);
recurse357(exp3, exp5 + 1.0, exp7, numbers);
recurse357(exp3, exp5, exp7 + 1.0, numbers);
}

int main(int argc, const char * argv[]) {
std::set<double> numbers;
recurse357(0.0, 0.0, 0.0, numbers);
std::cout << numbers.size();
return 0;
}
```
23. matthew said

@brooknovak: Nice solution – we don’t need to fully recalculate the sum each time though (and people might worry about floating point equality) and we don’t need to recurse for numbers we have seen before, so (with C++11 for auto) we can just do:

```#include <iostream>
#include <cmath>
#include <set>

typedef uint64_t T;
void recurse357(T sum, std::set<T>& numbers) {
if(sum > 1000000)
return;

auto res = numbers.insert(sum);
if (res.second) {
recurse357(3*sum, numbers);
recurse357(5*sum, numbers);
recurse357(7*sum, numbers);
}
}

int main(int argc, const char * argv[]) {
std::set<T> numbers;
recurse357(1, numbers);
std::cout << numbers.size() << "\n";
return 0;
}
```
24. matthew said

And a variant on the generating all triples approach, with no nested loops. Numbers not generated in order but that doesn’t matter if we are just counting them:

```#include <iostream>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t T;
int main(int argc, char *argv[])
{
T limit = strtoul(argv,NULL,0);
int count = 0;
T t3 = 1, t5 = 1, t7 = 1, n = 1;
while(true) {
count++;
t3 *= 3; n *= 3;
if (n >= limit) {
t3 = 1; t5 *= 5; n = t5*t7;
}
if (n >= limit) {
t5 = 1; t7 *= 7; n = t7;
}
if (n >= limit) {
break;
}
}
std::cout << count << "\n";
}
```
25. matthew said

It just occurred to me that t3 serves no purpose in that code and can be removed.

26. matthew said

And here is a generalization:

```#include <iostream>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t T;

// Generate (implicitly) all tuples t..t[n-1], where
// a[i] = product{ m[j]^t[j] | i <= j < n } with a < limit
static inline bool inc357(T limit, T a[], T m[], int n)
{
for (int i = 0; i < n; i++) {
a[i] *= m[i];
if (a[i] < limit) {
for (int j = 0; j < i; j++) {
a[j] = a[i];
}
return true;
}
}
return false;
}

int main(int argc, char *argv[])
{
T limit = strtoul(argv,NULL,0);
int count = 0;
T a[] = {1,1,1};
T m[] = {3,5,7};
do {
count++;
std::cout << a << "\n";
} while (inc357(limit,a,m,3));
std::cerr << count << "\n";
}
```
27. I keep coming back to this exercise wondering why Haskell is tens of times faster than any of the other functional languages and algorithms used here so did some investigating and have come up with the following:

1. It has nothing to do with its conciseness (although it definitely is that) as it is just as fast when the algorithm is written in the same form as Scheme (as I showed in a couple of examples in the comments above) or in F# (not presented here – using sequences or better Co Inductive Streams, as the results were the same).

2. I ran the examples under Haskell as @Will_Ness represented them (as he obtained them from his sources) and they are indeed about 60 times faster than as compiled under the best implementation of Scheme that generates efficient Cee code that I have at my disposal (Bigloo or Gambit-C) or F#, which is about the same speed as compiled Scheme. I think that the Haskell code (using the LLVM back end) is faster for two reasons, as follows:

a) Haskell uses that it is a pure hygienic functional language with generally nothing but immutable state to tune its custom garbage collector for very high efficiency as compared to the (Boehm) garbage collector used by Bigloo or even the custom garbage collector used by Gambit-C since Scheme does not enforce purity as Haskell does (most programmer use this in lots of “set!” procedures to modify values and variables), and

b) the GHC Haskell compiler (to some extent in combination with the LLVM optimization process) is extremely efficient at optimizing the output machine code as to eliminating procedure/function calls and converting them to simple loops and inline code. The Scheme compilers (and the F# compilation process) do some optimization along these lines in determining code that can be inlined and in tail call optimizations as to producing loops, but aren’t very efficient when it comes to iteration (Scheme doesn’t have built in iterators other than lists, which are not lazy and therefore not true iterators) and F# has lazy sequences which are very poorly optimized, using many many nested procedure calls to implement. In particular, Haskell’s lazy lists are implemented in such a way that the compiler can turn the merge function into a simple comparison and a change of link destination rather than a procedure call to a “thunk” (zero argument function) which in turn calls another function with its closure parameters, at 10’s of CPU clock cycles per function call.

In short, it appears Haskell can’t be beat in performance nor in conciseness for this sort of reasonably complex recursive list processing due to its pure functional nature, and simplistic languages/compilers such as F#/Scheme can’t touch it unless their compilers invested the years of research required in garbage collection optimizations, strictness analysis, and pure functional optimizations.

BTW, while Daniel Fischer’s optimizations to reduce redundant work does cause the large constant factor in better performance for the algorithm no matter the implementation language, changing the order of operations doesn’t have that large an effect for these 3/5/7 combination (perhaps it has more for the 2/3/5 combination he tested), and @Will_Ness’s “one-liner” is actually slightly faster when written as follows (written as a function so it will reevaluate when run under the REPL):

```hamm :: () -> [Integer]
hamm() = 1 : foldl  (\s n -> fix (merge s . (n:) . map (n*))) [] [7,5,3]
where merge [] b = b
merge a [] = a
merge a@(x:xs) b@(y:ys) = if x < y then x : merge xs b
else y : merge a ys
```
28. The GHC Haskell compiler is actually generating similar machine code as do the latest C++11 codes from @matthew except that it doesn’t use mutable state variables and thus there is considerable garbage collection activity taking slightly longer than the actual computation time; that said, the GHC Haskell garbage collector is extremely optimized for there being no mutable state and thus is more efficient than garbage collectors/automatic reference counting that must deal with that so is faster than C++11’s automatic safe_pointer handling if that were used.

29. There is also that the multiple precision (bitnum) numeric library Haskell uses is written in Haskell and thus much faster than the (Cee) GMP library used by most Scheme imlementations. The C++ code others have supplied for fixed precision is likely faster, but may well be slower if written for big numbers, as in general the routines do not eliminate repeating occurrences such as 3 time 5 and 5 times 3 also occurring in many combinations of factors, whereas the @Will_Ness supplied references and code as well as the @Mike Python code and my Scheme codes eliminate redundancies; for this reason the Haskell code may well be faster than Cee in spite of using immutable state and thus requiring the garbage collections.

30. matthew said

Interesting analysis, thanks. Some impressive optimizations by the Haskell compiler.

31. @matthew, yes, Haskell is fast and often as fast or faster than C, especially when using the LLVM backend. In this case, the algoirthm is well suited to lazy list processing, which in the merge operation removes redundant calculaatons and leaves the output sorting in ascending order for free in the same operations. However, the whole think also requires the memoization of lazy list processing, which I missed in my earlier posts; the problem with those (Scheme) posts is that the internal lists are used as procedures, which means that when they are used recursively as required by the last two streams, they require that all calculations must be repeated, meaning that instead of one comparison and just over one multiplication per output number, there are about forty times that many.

The code uses a stream/lazy list implementation based on pairs and the R6RS delay/force promise handling procedures, with internal recursive procedures changes to stream variables so that they don’t continuously get re-calculated.

The following code finds the first 10,000,000 (ten million) numbers in the sequence in about 7.6 seconds including about 2.1 seconds for garbage collection using compiled Gambit-C with maximum optimizations turned on, which is only a little over twice as slow as the best optimized Haskell code:

```(define (seq357)
(define (merge x y)
(let ((xv (car x)) (yv (car y)))
(if (<= xv yv) (cons xv (delay (merge (force (cdr x)) y)))
(cons yv (delay (merge x (force (cdr y))))))))
(define (smult m s) (cons (* m (car s)) (delay (smult m (force (cdr s)))))) ;; equiv to map (* m)
(define (s7 n) (cons n (delay (s7 (* n 7)))))
(define s57 (cons 1 (delay (merge (s7 7) (smult 5 s57)))))
(define s357 (cons 1 (delay (merge (force (cdr s57)) (smult 3 s357)))))
s357)

(define limit 10000000)
(define (countemto lmt)(do ((nxt (seq357) (force (cdr nxt))) (cnt 1 (+ cnt 1))) ((>= cnt lmt) (car nxt))))
(display (time (countemto limit))) (newline)
```

The above code interpreted under Petite Chez Scheme (as praxis uses) calculates the sequence for a million numbers in about 2.8 seconds with about 2.0 seconds used for garbage collection and the sequence up to a googol in about 1.8 seconds with about 1.0 seconds used for garbage collection.

It calculates the first one million numbers in the sequence in about 2.2 seconds with about 0.5 seconds used for garbage collection under codepad: http://codepad.org/W3QrjxmS.

With DrRacket using R6RS format and with all debugging turned off, it calculates the first million numbers in about 3.6 seconds with about 2.4 seconds used for garbage collection.

The Haskell code posted earlier is still faster than the fastest Scheme implementation of the above code due to its rule-based optimization specializations for lazy list by a factor of over two, but if the Haskell code is written to emulate the Scheme code using roll-your-own streams with thunks used as a delayed calculation means so that the specialized optimizations aren’t used, then the code is about the same speed as the very fastest Scheme implementation (Gambit-C).

One needs to remember that memorization if often desired to avoid repeated recursive calculations, such as required here for the second two sequences that refer to their own previous results. The recursive references to earlier values in the list mean that memory residency will be increased, but the huge increase in improved execution speed is likely worth it.