## Benford’s Law

### October 26, 2010

Benford’s Law, which was discovered by Simon Newcomb in 1881 and rediscovered by Frank Benford in 1938, states that, for many sets of numbers that arise in a scale-invariant manner, the first digit is distributed logarithmically, with the first digit 1 about 30% of the time, decreasing digit by digit until the first digit is 9 about 5% of the time. Stated mathematically, the leading digit *d* ∈ {1 … *b*-1} for a number written in base *b* will occur with probability *P _{d}* = log

_{b}(1 +

^{1}/

_{d}). Thus, in base 10, the probabilities of the first digit being the number 1, 2, … 9 are 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6%.

Benford’s Law is counter-intuitive but arises frequently in nature. It is also frequently used in auditing. Make a list of the amounts of the checks that a bookkeeper has written in the past year; if more that 5% begin with the digit 8 or 9, the bookkeeper is likely an embezzler. More important, precinct results of the disputed Iranian elections a year ago displayed anomalous first-digit counts, suggesting vote fraud.

Recently, Shriram Krishnamurthy issued a programming challenge on the Racket mailing list, asking for smallest/tightest/cleanest/best code to calculate the first-digit percentages of a list of numbers; he also challenged readers to apply the function to data in a comma-separated values file. He didn’t give a source, but did mention that he was interested in the littoral area (in acres) of the lakes of Missesota; sample data appears on the next page.

Your first task is to write a function that calculates the first-digit percentages of a list of numbers. Your second task is to calculate the first-digit percentages of the data on the next page. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

[…] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more […]

My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):

I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:

When run on the given csv file (which I named lakes.csv), this yields:

$ ./benford.py

1: 32.0%

2: 19.7%

3: 12.7%

4: 9.08%

5: 6.67%

6: 6.14%

7: 5.25%

8: 4.45%

9: 3.82%

Sorry about the messed up posting of my output!

Similar to Joe Marshall’s:

(define (first-digit n base)

;; returns first base base digit of n

(let ((base^2 (* base base)))

(cond ((< n base) n)

((< n base^2) (quotient n base))

(else

(first-digit (first-digit n base^2) base)))))

A ruby version also going straight to CSV calculation …

;; natural -> digit

(define (head-of-num n)

(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

;; list of naturals -> digit

(define (benford l)

(let ((res (make-vector 10 0)) ;; store the results

(count 0))

(map (lambda (n) ;; increment each seen digit’s counter

(let ((i (head-of-num n)))

(vector-set! res i (+ 1 (vector-ref res i))))

(set! count (+ 1 count)))

l)

(map (lambda (i) ;; divide by number of elements

(let ((val (/ (vector-ref res i) count)))

(vector-set! res i (list i val (exact->inexact val)))))

(iota 0 9))

res))

(define (test data)

(benford data))

;; I like this version more.

(define (head-of-num n)

(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

(define (benford l)

(let* ((count 0)

(res (fold-left

(lambda (n state)

(let ((i (head-of-num n)))

(vector-set! state i (+ 1 (vector-ref state i)))

(set! count (+ 1 count))

state))

(make-vector 10 0)

l)))

(map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))

(define (test data)

(benford data))

My F# codes

My F# implementation:

In FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…

Execution: