## Benford’s Law

### October 26, 2010

Benford’s Law, which was discovered by Simon Newcomb in 1881 and rediscovered by Frank Benford in 1938, states that, for many sets of numbers that arise in a scale-invariant manner, the first digit is distributed logarithmically, with the first digit 1 about 30% of the time, decreasing digit by digit until the first digit is 9 about 5% of the time. Stated mathematically, the leading digit d ∈ {1 … b-1} for a number written in base b will occur with probability Pd = logb(1 + 1/d). Thus, in base 10, the probabilities of the first digit being the number 1, 2, … 9 are 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6%.

Benford’s Law is counter-intuitive but arises frequently in nature. It is also frequently used in auditing. Make a list of the amounts of the checks that a bookkeeper has written in the past year; if more that 5% begin with the digit 8 or 9, the bookkeeper is likely an embezzler. More important, precinct results of the disputed Iranian elections a year ago displayed anomalous first-digit counts, suggesting vote fraud.

Recently, Shriram Krishnamurthy issued a programming challenge on the Racket mailing list, asking for smallest/tightest/cleanest/best code to calculate the first-digit percentages of a list of numbers; he also challenged readers to apply the function to data in a comma-separated values file. He didn’t give a source, but did mention that he was interested in the littoral area (in acres) of the lakes of Missesota; sample data appears on the next page.

Your first task is to write a function that calculates the first-digit percentages of a list of numbers. Your second task is to calculate the first-digit percentages of the data on the next page. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

### 11 Responses to “Benford’s Law”

1. [...] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more [...]

2. My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):

```import Data.List
import Text.Printf

firstDigits :: [Float] -> [(Char, Double)]
firstDigits xs = map (\ds -> (head ds, 100 * toEnum (length ds) /
toEnum (length xs))) . group . sort \$ map (head . show) xs

shriram :: [[String]] -> [(Char, Double)]
shriram xs = firstDigits [n | [(n,_)] <- map (reads . (!! 3)) xs]
```
3. Graham said

I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:

```#!/usr/bin/env python

import csv

def benford(csv_file, numerical_position):
"""
Calculates the percentages of first digit occurrence in numerical_position
in the csv_file in question.
"""
d = dict(zip(range(1,10), [0] * 9))
for row in r:
try:
d[int(row[numerical_position][0])] += 1
except:
pass
total = float(sum(d.values()))
for k in d:
print "%s:\t%.4s%%" % (k, 100 * d[k] / total)
return

if __name__ == '__main__':
benford('lakes.csv', 3)
```

When run on the given csv file (which I named lakes.csv), this yields:

\$ ./benford.py
1: 32.0%
2: 19.7%
3: 12.7%
4: 9.08%
5: 6.67%
6: 6.14%
7: 5.25%
8: 4.45%
9: 3.82%

4. Graham said

Sorry about the messed up posting of my output!

5. Gambiteer said

Similar to Joe Marshall’s:

(define (first-digit n base)
;; returns first base base digit of n
(let ((base^2 (* base base)))
(cond ((< n base) n)
((< n base^2) (quotient n base))
(else
(first-digit (first-digit n base^2) base)))))

6. slabounty said

A ruby version also going straight to CSV calculation …

```require 'csv'

def benford(csv_file, position)
first_digits = Array.new(10, 0)
CSV.foreach(csv_file) do |row|
digit = row[position].to_s[0]
first_digits[digit.to_i] += 1  if digit =~ /[0-9]/
end
first_digits
end

first_digits = benford("lakes.csv", 3)
total = first_digits.inject(0) { |sum, v| sum + v }
first_digits.each_with_index { |v, i| puts "Percentage for #{i} #{ (v.to_f / total.to_f) * 100.0} " if i != 0 }
```
7. Axio said

;; natural -> digit
(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

;; list of naturals -> digit
(define (benford l)
(let ((res (make-vector 10 0)) ;; store the results
(count 0))
(map (lambda (n) ;; increment each seen digit’s counter
(let ((i (head-of-num n)))
(vector-set! res i (+ 1 (vector-ref res i))))
(set! count (+ 1 count)))
l)
(map (lambda (i) ;; divide by number of elements
(let ((val (/ (vector-ref res i) count)))
(vector-set! res i (list i val (exact->inexact val)))))
(iota 0 9))
res))

(define (test data)
(benford data))

8. Axio said

;; I like this version more.
(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

(define (benford l)
(let* ((count 0)
(res (fold-left
(lambda (n state)
(let ((i (head-of-num n)))
(vector-set! state i (+ 1 (vector-ref state i)))
(set! count (+ 1 count))
state))
(make-vector 10 0)
l)))
(map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))

(define (test data)
(benford data))

9. Khanh Nguyen said

My F# codes

```//partition list of single digits into bins
let split l =
let rec split_aux ll i =
if (i = 9) then
[ll]
else
let tt = List.partition (fun x -> x = i) ll
(fst tt) :: (split_aux (snd tt) (i+1))
split_aux l 0

let rec first_digit x =
if x < 10 then x else first_digit (x / 10)

let benford l =
let first_digit_list = List.map first_digit l
let digits_counts = List.map List.length (split first_digit_list)
//compute the percentage
List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_counts

```
10. KNguyen said

My F# implementation:

```//partition list of single digits into bins
let split l =
let rec split_aux ll i =
if (i = 9) then
[ll]
else
let tt = List.partition (fun x -> x = i) ll
(fst tt) :: (split_aux (snd tt) (i+1))
split_aux l 0

let rec first_digit x =
if x < 10 then x else first_digit (x / 10)

let benford l =
let first_digit_list = List.map first_digit l
let digits_counts = List.map List.length (split first_digit_list)
//compute the percentage
List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_counts
```
11. David said

In FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…

```{ -------------------- split -------------------- }
{ taken from string library --------------  }

2dup = IF  2drop  0:0 2swap     \ at end, set remainder to ""
ELSE  nip 2 pick - >r
2dup r@ 1+ /string   \ adjust remainder
2swap drop r>        \ adjust result
THEN ;

: split  ( adr c delim -- adr1 c1 adr2 c2 )
>r  \ save delimiter
2dup +  2 pick  \ ( adr c -- adr c end-adr adr )
BEGIN
2dup > IF  dup c@ r@ <>  ELSE  false  THEN
WHILE
1+
REPEAT r> drop

{ --------------- Benford -------------- }

variable sample-size
create digits 10 cells allot

: nth-field ( adr c delim n -- adr c )
swap  locals| delim |
0 ?DO  delim split 2drop  LOOP
delim split 2nip ;

: get-digit ( addr count -- digit )
0> IF
c@ dup  [char] 1  [ char 9 1+ ] literal  within  IF
[char] 0 -
ELSE
drop -1
THEN
ELSE
drop -1
THEN ;

: count-digit ( addr count -- )
get-digit dup 0> IF
cells digits +  1 swap +!
1 sample-size +!
ELSE
drop
THEN ;

: init-benford ( -- )
digits 10 cells 0 fill
0 sample-size ! ;

: import-field ( n fd -- )
locals| input n |
BEGIN pad 1024 input read-line throw WHILE
pad swap [char] , n nth-field count-digit
REPEAT drop ;

: rounded ( n -- n )
10 /mod  swap 5 + 10 / + ;

: .percent  ( n -- )
10000 sample-size @ */ rounded
0 <# [char] % hold # [char] . hold #s #> type space ;

: report
10 1 DO
cr i 2 .r  ." : "  i cells digits + @ .percent
LOOP ;

: benford  ( field -- )
init-benford
bl word count  r/o open-file throw
2dup import-field
close-file throw drop
report ;
```

Execution:

```3 benford lakestats-mn.csv
1: 32.1%
2: 19.8%
3: 12.7%
4: 9.1%
5: 6.7%
6: 6.1%
7: 5.3%
8: 4.5%
9: 3.8%  ok
```