Benford’s Law

October 26, 2010

Benford’s Law, which was discovered by Simon Newcomb in 1881 and rediscovered by Frank Benford in 1938, states that, for many sets of numbers that arise in a scale-invariant manner, the first digit is distributed logarithmically, with the first digit 1 about 30% of the time, decreasing digit by digit until the first digit is 9 about 5% of the time. Stated mathematically, the leading digit d ∈ {1 … b-1} for a number written in base b will occur with probability P_d = log_b(1 + ¹/_d). Thus, in base 10, the probabilities of the first digit being the number 1, 2, … 9 are 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6%.

Benford’s Law is counter-intuitive but arises frequently in nature. It is also frequently used in auditing. Make a list of the amounts of the checks that a bookkeeper has written in the past year; if more that 5% begin with the digit 8 or 9, the bookkeeper is likely an embezzler. More important, precinct results of the disputed Iranian elections a year ago displayed anomalous first-digit counts, suggesting vote fraud.

Recently, Shriram Krishnamurthy issued a programming challenge on the Racket mailing list, asking for smallest/tightest/cleanest/best code to calculate the first-digit percentages of a list of numbers; he also challenged readers to apply the function to data in a comma-separated values file. He didn’t give a source, but did mention that he was interested in the littoral area (in acres) of the lakes of Missesota; sample data appears on the next page.

Your first task is to write a function that calculates the first-digit percentages of a list of numbers. Your second task is to calculate the first-digit percentages of the data on the next page. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

12 Comments »

12 Responses to “Benford’s Law”

Programming Praxis – Benford’s Law « Bonsai Code said
October 26, 2010 at 12:14 PM
[…] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more […]

Remco Niemeijer said

October 26, 2010 at 12:14 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):

import Data.List
import Text.Printf

firstDigits :: [Float] -> [(Char, Double)]
firstDigits xs = map (\ds -> (head ds, 100 * toEnum (length ds) /
    toEnum (length xs))) . group . sort $ map (head . show) xs

shriram :: [[String]] -> [(Char, Double)]
shriram xs = firstDigits [n | [(n,_)] <- map (reads . (!! 3)) xs]

Graham said

October 26, 2010 at 12:32 PM

I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:

#!/usr/bin/env python

import csv

def benford(csv_file, numerical_position):
    """
    Calculates the percentages of first digit occurrence in numerical_position
    in the csv_file in question.
    """
    r = csv.reader(open(csv_file))
    d = dict(zip(range(1,10), [0] * 9))
    for row in r:
        try:
            d[int(row[numerical_position][0])] += 1
        except:
            pass
    total = float(sum(d.values()))
    for k in d:
        print "%s:\t%.4s%%" % (k, 100 * d[k] / total)
    return

if __name__ == '__main__':
    benford('lakes.csv', 3)

When run on the given csv file (which I named lakes.csv), this yields:

$ ./benford.py
1: 32.0%
2: 19.7%
3: 12.7%
4: 9.08%
5: 6.67%
6: 6.14%
7: 5.25%
8: 4.45%
9: 3.82%

Graham said
October 26, 2010 at 12:34 PM
Sorry about the messed up posting of my output!
Gambiteer said
October 26, 2010 at 3:22 PM
Similar to Joe Marshall’s:

(define (first-digit n base)
;; returns first base base digit of n
(let ((base^2 (* base base)))
(cond ((< n base) n)
((< n base^2) (quotient n base))
(else
(first-digit (first-digit n base^2) base)))))

slabounty said

October 26, 2010 at 7:49 PM

A ruby version also going straight to CSV calculation …

require 'csv'

def benford(csv_file, position)
    first_digits = Array.new(10, 0)
    CSV.foreach(csv_file) do |row|
        digit = row[position].to_s[0]
        first_digits[digit.to_i] += 1  if digit =~ /[0-9]/
    end
    first_digits
end

first_digits = benford("lakes.csv", 3)
total = first_digits.inject(0) { |sum, v| sum + v }
first_digits.each_with_index { |v, i| puts "Percentage for #{i} #{ (v.to_f / total.to_f) * 100.0} " if i != 0 }

Axio said
October 27, 2010 at 2:29 AM
;; natural -> digit
(define (head-of-num n)
  (let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

;; list of naturals -> digit
(define (benford l)
  (let ((res (make-vector 10 0)) ;; store the results
        (count 0))
    (map (lambda (n) ;; increment each seen digit’s counter
           (let ((i (head-of-num n)))
             (vector-set! res i (+ 1 (vector-ref res i))))
           (set! count (+ 1 count)))
         l)
    (map (lambda (i) ;; divide by number of elements
           (let ((val (/ (vector-ref res i) count)))
             (vector-set! res i (list i val (exact->inexact val)))))
         (iota 0 9))
    res))

(define (test data)
  (benford data))
Axio said
October 28, 2010 at 5:52 AM
;; I like this version more.
(define (head-of-num n)
  (let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

(define (benford l)
  (let* ((count 0)
         (res (fold-left
                (lambda (n state)
                  (let ((i (head-of-num n)))
                    (vector-set! state i (+ 1 (vector-ref state i)))
                    (set! count (+ 1 count))
                    state))
                (make-vector 10 0)
                l)))
    (map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))

(define (test data)
  (benford data))

Khanh Nguyen said

October 28, 2010 at 8:44 PM

My F# codes

//partition list of single digits into bins
let split l =
    let rec split_aux ll i =
        if (i = 9) then 
            [ll]
        else
            let tt = List.partition (fun x -> x = i) ll
            (fst tt) :: (split_aux (snd tt) (i+1))
    split_aux l 0

let rec first_digit x = 
    if x < 10 then x else first_digit (x / 10)

let benford l = 
    let first_digit_list = List.map first_digit l    
    let digits_counts = List.map List.length (split first_digit_list)
    //compute the percentage
    List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_counts

KNguyen said

October 28, 2010 at 8:46 PM

My F# implementation:

//partition list of single digits into bins
let split l =
    let rec split_aux ll i =
        if (i = 9) then 
            [ll]
        else
            let tt = List.partition (fun x -> x = i) ll
            (fst tt) :: (split_aux (snd tt) (i+1))
    split_aux l 0

let rec first_digit x = 
    if x < 10 then x else first_digit (x / 10)

let benford l = 
    let first_digit_list = List.map first_digit l    
    let digits_counts = List.map List.length (split first_digit_list)
    //compute the percentage
    List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_counts

David said

May 19, 2012 at 6:13 PM

In FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…

{ -------------------- split -------------------- }
{ taken from string library --------------  }

: (adjust) ( adr1 c1 adr2 ard3 )
    2dup = IF  2drop  0:0 2swap     \ at end, set remainder to ""
         ELSE  nip 2 pick - >r
               2dup r@ 1+ /string   \ adjust remainder
               2swap drop r>        \ adjust result
         THEN ;

: split  ( adr c delim -- adr1 c1 adr2 c2 )
    >r  \ save delimiter
    2dup +  2 pick  \ ( adr c -- adr c end-adr adr )
    BEGIN
        2dup > IF  dup c@ r@ <>  ELSE  false  THEN
    WHILE
        1+
    REPEAT r> drop
    (adjust) ;

{ --------------- Benford -------------- }

variable sample-size
create digits 10 cells allot

: nth-field ( adr c delim n -- adr c )
    swap  locals| delim |
    0 ?DO  delim split 2drop  LOOP
    delim split 2nip ;

: get-digit ( addr count -- digit )
    0> IF
         c@ dup  [char] 1  [ char 9 1+ ] literal  within  IF
           [char] 0 -
         ELSE
           drop -1
         THEN
       ELSE
         drop -1
       THEN ;

: count-digit ( addr count -- )
    get-digit dup 0> IF 
      cells digits +  1 swap +!
      1 sample-size +!
    ELSE
      drop
    THEN ;

: init-benford ( -- )
    digits 10 cells 0 fill
    0 sample-size ! ;

: import-field ( n fd -- )
    locals| input n |
    BEGIN pad 1024 input read-line throw WHILE
        pad swap [char] , n nth-field count-digit
    REPEAT drop ;

: rounded ( n -- n )
    10 /mod  swap 5 + 10 / + ;

: .percent  ( n -- )
    10000 sample-size @ */ rounded
    0 <# [char] % hold # [char] . hold #s #> type space ;

: report
    10 1 DO
        cr i 2 .r  ." : "  i cells digits + @ .percent
    LOOP ;

: benford  ( field -- )
    init-benford
    bl word count  r/o open-file throw
    2dup import-field
    close-file throw drop
    report ;

Execution:

3 benford lakestats-mn.csv
 1: 32.1%
 2: 19.8%
 3: 12.7%
 4: 9.1%
 5: 6.7%
 6: 6.1%
 7: 5.3%
 8: 4.5%
 9: 3.8%  ok

Leading Digits Of Powers Of 2 | Programming Praxis said
June 23, 2017 at 9:01 AM
[…] John D. Cook, a programmer who writes about mathematics (he would probably describe himself as a mathematician who writes about programming) recently wrote about the distribution of the leading digits of the powers of 2, observing that they follow Benford’s Law, which we studied in a previous exercise. […]

Programming Praxis

Benford’s Law

October 26, 2010

12 Responses to “Benford’s Law”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Benford’s Law

October 26, 2010

Share this:

Related

12 Responses to “Benford’s Law”

Leave a comment

Categories

Archives

Archives