Benford’s Law
October 26, 2010
Benford’s Law, which was discovered by Simon Newcomb in 1881 and rediscovered by Frank Benford in 1938, states that, for many sets of numbers that arise in a scale-invariant manner, the first digit is distributed logarithmically, with the first digit 1 about 30% of the time, decreasing digit by digit until the first digit is 9 about 5% of the time. Stated mathematically, the leading digit d ∈ {1 … b-1} for a number written in base b will occur with probability Pd = logb(1 + 1/d). Thus, in base 10, the probabilities of the first digit being the number 1, 2, … 9 are 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6%.
Benford’s Law is counter-intuitive but arises frequently in nature. It is also frequently used in auditing. Make a list of the amounts of the checks that a bookkeeper has written in the past year; if more that 5% begin with the digit 8 or 9, the bookkeeper is likely an embezzler. More important, precinct results of the disputed Iranian elections a year ago displayed anomalous first-digit counts, suggesting vote fraud.
Recently, Shriram Krishnamurthy issued a programming challenge on the Racket mailing list, asking for smallest/tightest/cleanest/best code to calculate the first-digit percentages of a list of numbers; he also challenged readers to apply the function to data in a comma-separated values file. He didn’t give a source, but did mention that he was interested in the littoral area (in acres) of the lakes of Missesota; sample data appears on the next page.
Your first task is to write a function that calculates the first-digit percentages of a list of numbers. Your second task is to calculate the first-digit percentages of the data on the next page. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
[…] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more […]
My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):
I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:
When run on the given csv file (which I named lakes.csv), this yields:
$ ./benford.py
1: 32.0%
2: 19.7%
3: 12.7%
4: 9.08%
5: 6.67%
6: 6.14%
7: 5.25%
8: 4.45%
9: 3.82%
Sorry about the messed up posting of my output!
Similar to Joe Marshall’s:
(define (first-digit n base)
;; returns first base base digit of n
(let ((base^2 (* base base)))
(cond ((< n base) n)
((< n base^2) (quotient n base))
(else
(first-digit (first-digit n base^2) base)))))
A ruby version also going straight to CSV calculation …
;; natural -> digit
(define (head-of-num n)
(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))
;; list of naturals -> digit
(define (benford l)
(let ((res (make-vector 10 0)) ;; store the results
(count 0))
(map (lambda (n) ;; increment each seen digit’s counter
(let ((i (head-of-num n)))
(vector-set! res i (+ 1 (vector-ref res i))))
(set! count (+ 1 count)))
l)
(map (lambda (i) ;; divide by number of elements
(let ((val (/ (vector-ref res i) count)))
(vector-set! res i (list i val (exact->inexact val)))))
(iota 0 9))
res))
(define (test data)
(benford data))
;; I like this version more.
(define (head-of-num n)
(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))
(define (benford l)
(let* ((count 0)
(res (fold-left
(lambda (n state)
(let ((i (head-of-num n)))
(vector-set! state i (+ 1 (vector-ref state i)))
(set! count (+ 1 count))
state))
(make-vector 10 0)
l)))
(map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))
(define (test data)
(benford data))
My F# codes
My F# implementation:
In FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…
Execution:
[…] John D. Cook, a programmer who writes about mathematics (he would probably describe himself as a mathematician who writes about programming) recently wrote about the distribution of the leading digits of the powers of 2, observing that they follow Benford’s Law, which we studied in a previous exercise. […]