Benford’s Law
October 26, 2010
When Shriram first posted this exercise, my solution used uniq-c, compose and digits from the Standard Prelude, plus the non-standard but very useful printf
function of Chez Scheme:
(define (benford xs)
(for-each
(lambda (x) (printf "~a ~f~%" (car x) (/ (cdr x) (length xs))))
(uniq-c = (sort < (map (compose car digits) xs)))))
Shriram’s comment was “My word — that’s lovely.” Later in the thread, Joe Marshall presented a Lisp function to compute the first digit of a number, to which I say “My word — that’s lovely:”
(defun leftmost-digit (n base)
(if (> base n)
n
(do* ((i base next)
(next (* base base) (* i i)))
((> next n) (leftmost-digit (floor n i) base)))))
Here is my translation of that function to Scheme; making base
default to 10 makes the function a bit longer than the original:
(define (first-digit n . base)
(let* ((b (if (null? base) 10 (car base)))
(b2 (* b b)))
(let loop ((n n) (i b) (k b2))
(cond ((< n b) n)
((< n k)
(loop (quotient n i) b b2))
(else (loop n k (* i i)))))))
Shriram’s second challenge was to extract the first-digit proportions from a CSV file, such as the previous page. The text-file database functions of the two previous exercises make that simple. First, we write a predicate to keep useful records and discard the various headers, “N/A” values, and other useless records:
(define (keep? xs) (string->number (list-ref xs 3)))
Then we use read-csv-record
, filter-port
and map-reduce-port
to calculate the first-digit percentages:
> (let* ((fds (with-input-from-file "mn-lakes.csv"
(lambda ()
(map-reduce-port
(filter-port read-csv-record keep?)
(lambda (x)
(values
(first-digit (floor (string->number (list-ref x 3))))
1))
(lambda (k v1 v2) (+ v1 v2))
<))))
(fds-count (apply + (map cdr fds))))
(for-each (lambda (x)
(printf "~a ~f~%"
(car x)
(/ (cdr x) fds-count 0.01)))
fds))
1 32.05699020480855
2 19.768477292965272
3 12.733748886910062
4 9.082813891362422
5 6.678539626001781
6 6.144256455921639
7 5.253784505788068
8 4.452359750667854
9 3.8290293855743545
During testing, I wrote the following awk program to compare my results:
awk -F, '
$4>0 { c[substr($4,1,1)]++; t++ }
END { for (i=1; i<10; i++) print i, c[i]/t*100 }
' mn-lakes.csv
Running that program produced results that differed from the Scheme program; one record for which the Scheme program calculated a first digit of 1 was calculated as a first digit of 4 by the awk program. The problem was the record
"Gull, Upper","Nisswa",422,154,54,7.5
The simplistic awk program saw “Gull, Upper” as two fields when it is, in fact, a single field, proving again that it is hard to parse CSV files.
You can run the program at http://programmingpraxis.codepad.org/7l3dRSAn.
[…] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more […]
My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):
I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:
When run on the given csv file (which I named lakes.csv), this yields:
$ ./benford.py
1: 32.0%
2: 19.7%
3: 12.7%
4: 9.08%
5: 6.67%
6: 6.14%
7: 5.25%
8: 4.45%
9: 3.82%
Sorry about the messed up posting of my output!
Similar to Joe Marshall’s:
(define (first-digit n base)
;; returns first base base digit of n
(let ((base^2 (* base base)))
(cond ((< n base) n)
((< n base^2) (quotient n base))
(else
(first-digit (first-digit n base^2) base)))))
A ruby version also going straight to CSV calculation …
;; natural -> digit
(define (head-of-num n)
(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))
;; list of naturals -> digit
(define (benford l)
(let ((res (make-vector 10 0)) ;; store the results
(count 0))
(map (lambda (n) ;; increment each seen digit’s counter
(let ((i (head-of-num n)))
(vector-set! res i (+ 1 (vector-ref res i))))
(set! count (+ 1 count)))
l)
(map (lambda (i) ;; divide by number of elements
(let ((val (/ (vector-ref res i) count)))
(vector-set! res i (list i val (exact->inexact val)))))
(iota 0 9))
res))
(define (test data)
(benford data))
;; I like this version more.
(define (head-of-num n)
(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))
(define (benford l)
(let* ((count 0)
(res (fold-left
(lambda (n state)
(let ((i (head-of-num n)))
(vector-set! state i (+ 1 (vector-ref state i)))
(set! count (+ 1 count))
state))
(make-vector 10 0)
l)))
(map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))
(define (test data)
(benford data))
My F# codes
My F# implementation:
In FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…
Execution:
[…] John D. Cook, a programmer who writes about mathematics (he would probably describe himself as a mathematician who writes about programming) recently wrote about the distribution of the leading digits of the powers of 2, observing that they follow Benford’s Law, which we studied in a previous exercise. […]