Phone Numbers And Prime Factors
June 24, 2016
Number theorists know the function that computes the number of distinct prime factors of n as ω(n); it is calculated by factoring n and scanning through the factors in ascending order:
(define (last xs) (if (or (null? xs) (null? (cdr xs))) xs (last (cdr xs))))
(define (cycle . xs) (set-cdr! (last xs) xs) xs)
(define wheel (cons 1 (cons 2 (cons 2 (cycle 4 2 4 2 4 6 2 6)))))
(define (factors n) (let loop ((n n) (f 2) (w wheel) (fs (list))) (if (< n (* f f)) (reverse (cons n fs)) (if (zero? (modulo n f)) (loop (/ n f) f w (cons f fs)) (loop n (+ f (car w)) (cdr w) fs)))))
(define (omega n) (let loop ((fs (factors n)) (prev 0) (count 0)) (if (null? fs) count (if (= (car fs) prev) (loop (cdr fs) prev count) (loop (cdr fs) (car fs) (+ count 1))))))
We used trial division with a 2,3,5-wheel to perform the factorization, which is an appropriate factoring algorithm for 10-digit numbers. Next, we need to be able to generate random telephone numbers; this is Knuth’s 64-bit linear congruential algorithm from a previous exercise:
(define rand (let* ((a 6364136223846793005) (c 1442695040888963407) (m 18446744073709551616) (seed (time-second (current-time)))) (lambda args (when (pair? args) (set! seed (modulo (car args) m))) (set! seed (modulo (+ (* a seed) c) m)) (/ seed m))))
(define randint (case-lambda ((n) (floor (* (rand) n))) ((first past) (+ (floor (* (rand) (- past first))) first)))
(define (rand-phone) (randint #e1e9 #e1e10))
Finally, with help from the uniq-c
function in the Standard Prelude, we write a function that generates n telephone numbers and accumulates a distribution of the number of distinct prime factors:
(define (phones n) (do ((n n (- n 1)) (xs (list) (cons (omega (rand-phone)) xs))) ((zero? n) (uniq-c = (sort < xs)))))
And here’s the result:
> (phones 1000) ((1 . 44) (2 . 184) (3 . 340) (4 . 254) (5 . 129) (6 . 41) (7 . 7) (8 . 1))
As Cook calculated, the most common result is 3 distinct prime factors. However, Cook calculated 27% of the time the number would have more than 5 distinct prime factors, but we found only 49 out of a thousand, rather than the 270 Cook calculated, which means that either Cook got the math wrong (unlikely) or I got my simulation wrong (more likely).
You can run the program at http://ideone.com/cOjaZO.
Maybe N = 10^11 is a bit small to apply the Erdos-Kac heuristic too strictly. log(log(10^11)) is only about 3.2, and it’s square root is only about 1.8. That means omega(n) = 1 corresponds to a z-value of about (1-3.2)/1.8 – say roughly -1.25 – so our sample will never include anything with a standard deviation of less than -1.25. That leaves 10% of the expected distribution of omega(n) out of the bounds of attainable values of omega(n). So it seems reasonable (to me) to expect noticeable discrepancy between the heuristic’s predictions in a large enough sample at this N.