Benford’s Law
October 26, 2010
Benford’s Law, which was discovered by Simon Newcomb in 1881 and rediscovered by Frank Benford in 1938, states that, for many sets of numbers that arise in a scale-invariant manner, the first digit is distributed logarithmically, with the first digit 1 about 30% of the time, decreasing digit by digit until the first digit is 9 about 5% of the time. Stated mathematically, the leading digit d ∈ {1 … b-1} for a number written in base b will occur with probability Pd = logb(1 + 1/d). Thus, in base 10, the probabilities of the first digit being the number 1, 2, … 9 are 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6%.
Benford’s Law is counter-intuitive but arises frequently in nature. It is also frequently used in auditing. Make a list of the amounts of the checks that a bookkeeper has written in the past year; if more that 5% begin with the digit 8 or 9, the bookkeeper is likely an embezzler. More important, precinct results of the disputed Iranian elections a year ago displayed anomalous first-digit counts, suggesting vote fraud.
Recently, Shriram Krishnamurthy issued a programming challenge on the Racket mailing list, asking for smallest/tightest/cleanest/best code to calculate the first-digit percentages of a list of numbers; he also challenged readers to apply the function to data in a comma-separated values file. He didn’t give a source, but did mention that he was interested in the littoral area (in acres) of the lakes of Missesota; sample data appears on the next page.
Your first task is to write a function that calculates the first-digit percentages of a list of numbers. Your second task is to calculate the first-digit percentages of the data on the next page. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
[…] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more […]
My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):
import Data.List import Text.Printf firstDigits :: [Float] -> [(Char, Double)] firstDigits xs = map (\ds -> (head ds, 100 * toEnum (length ds) / toEnum (length xs))) . group . sort $ map (head . show) xs shriram :: [[String]] -> [(Char, Double)] shriram xs = firstDigits [n | [(n,_)] <- map (reads . (!! 3)) xs]I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:
#!/usr/bin/env python import csv def benford(csv_file, numerical_position): """ Calculates the percentages of first digit occurrence in numerical_position in the csv_file in question. """ r = csv.reader(open(csv_file)) d = dict(zip(range(1,10), [0] * 9)) for row in r: try: d[int(row[numerical_position][0])] += 1 except: pass total = float(sum(d.values())) for k in d: print "%s:\t%.4s%%" % (k, 100 * d[k] / total) return if __name__ == '__main__': benford('lakes.csv', 3)When run on the given csv file (which I named lakes.csv), this yields:
$ ./benford.py
1: 32.0%
2: 19.7%
3: 12.7%
4: 9.08%
5: 6.67%
6: 6.14%
7: 5.25%
8: 4.45%
9: 3.82%
Sorry about the messed up posting of my output!
Similar to Joe Marshall’s:
(define (first-digit n base)
;; returns first base base digit of n
(let ((base^2 (* base base)))
(cond ((< n base) n)
((< n base^2) (quotient n base))
(else
(first-digit (first-digit n base^2) base)))))
A ruby version also going straight to CSV calculation …
require 'csv' def benford(csv_file, position) first_digits = Array.new(10, 0) CSV.foreach(csv_file) do |row| digit = row[position].to_s[0] first_digits[digit.to_i] += 1 if digit =~ /[0-9]/ end first_digits end first_digits = benford("lakes.csv", 3) total = first_digits.inject(0) { |sum, v| sum + v } first_digits.each_with_index { |v, i| puts "Percentage for #{i} #{ (v.to_f / total.to_f) * 100.0} " if i != 0 };; natural -> digit
(define (head-of-num n)
(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))
;; list of naturals -> digit
(define (benford l)
(let ((res (make-vector 10 0)) ;; store the results
(count 0))
(map (lambda (n) ;; increment each seen digit’s counter
(let ((i (head-of-num n)))
(vector-set! res i (+ 1 (vector-ref res i))))
(set! count (+ 1 count)))
l)
(map (lambda (i) ;; divide by number of elements
(let ((val (/ (vector-ref res i) count)))
(vector-set! res i (list i val (exact->inexact val)))))
(iota 0 9))
res))
(define (test data)
(benford data))
;; I like this version more.
(define (head-of-num n)
(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))
(define (benford l)
(let* ((count 0)
(res (fold-left
(lambda (n state)
(let ((i (head-of-num n)))
(vector-set! state i (+ 1 (vector-ref state i)))
(set! count (+ 1 count))
state))
(make-vector 10 0)
l)))
(map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))
(define (test data)
(benford data))
My F# codes
//partition list of single digits into bins let split l = let rec split_aux ll i = if (i = 9) then [ll] else let tt = List.partition (fun x -> x = i) ll (fst tt) :: (split_aux (snd tt) (i+1)) split_aux l 0 let rec first_digit x = if x < 10 then x else first_digit (x / 10) let benford l = let first_digit_list = List.map first_digit l let digits_counts = List.map List.length (split first_digit_list) //compute the percentage List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_countsMy F# implementation:
//partition list of single digits into bins let split l = let rec split_aux ll i = if (i = 9) then [ll] else let tt = List.partition (fun x -> x = i) ll (fst tt) :: (split_aux (snd tt) (i+1)) split_aux l 0 let rec first_digit x = if x < 10 then x else first_digit (x / 10) let benford l = let first_digit_list = List.map first_digit l let digits_counts = List.map List.length (split first_digit_list) //compute the percentage List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_countsIn FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…
{ -------------------- split -------------------- } { taken from string library -------------- } : (adjust) ( adr1 c1 adr2 ard3 ) 2dup = IF 2drop 0:0 2swap \ at end, set remainder to "" ELSE nip 2 pick - >r 2dup r@ 1+ /string \ adjust remainder 2swap drop r> \ adjust result THEN ; : split ( adr c delim -- adr1 c1 adr2 c2 ) >r \ save delimiter 2dup + 2 pick \ ( adr c -- adr c end-adr adr ) BEGIN 2dup > IF dup c@ r@ <> ELSE false THEN WHILE 1+ REPEAT r> drop (adjust) ; { --------------- Benford -------------- } variable sample-size create digits 10 cells allot : nth-field ( adr c delim n -- adr c ) swap locals| delim | 0 ?DO delim split 2drop LOOP delim split 2nip ; : get-digit ( addr count -- digit ) 0> IF c@ dup [char] 1 [ char 9 1+ ] literal within IF [char] 0 - ELSE drop -1 THEN ELSE drop -1 THEN ; : count-digit ( addr count -- ) get-digit dup 0> IF cells digits + 1 swap +! 1 sample-size +! ELSE drop THEN ; : init-benford ( -- ) digits 10 cells 0 fill 0 sample-size ! ; : import-field ( n fd -- ) locals| input n | BEGIN pad 1024 input read-line throw WHILE pad swap [char] , n nth-field count-digit REPEAT drop ; : rounded ( n -- n ) 10 /mod swap 5 + 10 / + ; : .percent ( n -- ) 10000 sample-size @ */ rounded 0 <# [char] % hold # [char] . hold #s #> type space ; : report 10 1 DO cr i 2 .r ." : " i cells digits + @ .percent LOOP ; : benford ( field -- ) init-benford bl word count r/o open-file throw 2dup import-field close-file throw drop report ;Execution:
[…] John D. Cook, a programmer who writes about mathematics (he would probably describe himself as a mathematician who writes about programming) recently wrote about the distribution of the leading digits of the powers of 2, observing that they follow Benford’s Law, which we studied in a previous exercise. […]