Benford’s Law

October 26, 2010

Benford’s Law, which was discovered by Simon Newcomb in 1881 and rediscovered by Frank Benford in 1938, states that, for many sets of numbers that arise in a scale-invariant manner, the first digit is distributed logarithmically, with the first digit 1 about 30% of the time, decreasing digit by digit until the first digit is 9 about 5% of the time. Stated mathematically, the leading digit d ∈ {1 … b-1} for a number written in base b will occur with probability Pd = logb(1 + 1/d). Thus, in base 10, the probabilities of the first digit being the number 1, 2, … 9 are 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6%.

Benford’s Law is counter-intuitive but arises frequently in nature. It is also frequently used in auditing. Make a list of the amounts of the checks that a bookkeeper has written in the past year; if more that 5% begin with the digit 8 or 9, the bookkeeper is likely an embezzler. More important, precinct results of the disputed Iranian elections a year ago displayed anomalous first-digit counts, suggesting vote fraud.

Recently, Shriram Krishnamurthy issued a programming challenge on the Racket mailing list, asking for smallest/tightest/cleanest/best code to calculate the first-digit percentages of a list of numbers; he also challenged readers to apply the function to data in a comma-separated values file. He didn’t give a source, but did mention that he was interested in the littoral area (in acres) of the lakes of Missesota; sample data appears on the next page.

Your first task is to write a function that calculates the first-digit percentages of a list of numbers. Your second task is to calculate the first-digit percentages of the data on the next page. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

About these ads

Pages: 1 2 3

11 Responses to “Benford’s Law”

  1. [...] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more [...]

  2. My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):

    import Data.List
    import Text.Printf
    
    firstDigits :: [Float] -> [(Char, Double)]
    firstDigits xs = map (\ds -> (head ds, 100 * toEnum (length ds) /
        toEnum (length xs))) . group . sort $ map (head . show) xs
    
    shriram :: [[String]] -> [(Char, Double)]
    shriram xs = firstDigits [n | [(n,_)] <- map (reads . (!! 3)) xs]
    
  3. Graham said

    I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:

    #!/usr/bin/env python
    
    import csv
    
    def benford(csv_file, numerical_position):
        """
        Calculates the percentages of first digit occurrence in numerical_position
        in the csv_file in question.
        """
        r = csv.reader(open(csv_file))
        d = dict(zip(range(1,10), [0] * 9))
        for row in r:
            try:
                d[int(row[numerical_position][0])] += 1
            except:
                pass
        total = float(sum(d.values()))
        for k in d:
            print "%s:\t%.4s%%" % (k, 100 * d[k] / total)
        return
    
    if __name__ == '__main__':
        benford('lakes.csv', 3)
    

    When run on the given csv file (which I named lakes.csv), this yields:

    $ ./benford.py
    1: 32.0%
    2: 19.7%
    3: 12.7%
    4: 9.08%
    5: 6.67%
    6: 6.14%
    7: 5.25%
    8: 4.45%
    9: 3.82%

  4. Graham said

    Sorry about the messed up posting of my output!

  5. Gambiteer said

    Similar to Joe Marshall’s:

    (define (first-digit n base)
    ;; returns first base base digit of n
    (let ((base^2 (* base base)))
    (cond ((< n base) n)
    ((< n base^2) (quotient n base))
    (else
    (first-digit (first-digit n base^2) base)))))

  6. slabounty said

    A ruby version also going straight to CSV calculation …

    require 'csv'
    
    def benford(csv_file, position)
        first_digits = Array.new(10, 0)
        CSV.foreach(csv_file) do |row|
            digit = row[position].to_s[0]
            first_digits[digit.to_i] += 1  if digit =~ /[0-9]/
        end
        first_digits
    end
    
    first_digits = benford("lakes.csv", 3)
    total = first_digits.inject(0) { |sum, v| sum + v }
    first_digits.each_with_index { |v, i| puts "Percentage for #{i} #{ (v.to_f / total.to_f) * 100.0} " if i != 0 }
    
  7. Axio said

    ;; natural -> digit
    (define (head-of-num n)
      (let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

    ;; list of naturals -> digit
    (define (benford l)
      (let ((res (make-vector 10 0)) ;; store the results
            (count 0))
        (map (lambda (n) ;; increment each seen digit’s counter
               (let ((i (head-of-num n)))
                 (vector-set! res i (+ 1 (vector-ref res i))))
               (set! count (+ 1 count)))
             l)
        (map (lambda (i) ;; divide by number of elements
               (let ((val (/ (vector-ref res i) count)))
                 (vector-set! res i (list i val (exact->inexact val)))))
             (iota 0 9))
        res))

    (define (test data)
      (benford data))

  8. Axio said

    ;; I like this version more.
    (define (head-of-num n)
      (let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

    (define (benford l)
      (let* ((count 0)
             (res (fold-left
                    (lambda (n state)
                      (let ((i (head-of-num n)))
                        (vector-set! state i (+ 1 (vector-ref state i)))
                        (set! count (+ 1 count))
                        state))
                    (make-vector 10 0)
                    l)))
        (map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))

    (define (test data)
      (benford data))

  9. Khanh Nguyen said

    My F# codes

    //partition list of single digits into bins
    let split l =
        let rec split_aux ll i =
            if (i = 9) then 
                [ll]
            else
                let tt = List.partition (fun x -> x = i) ll
                (fst tt) :: (split_aux (snd tt) (i+1))
        split_aux l 0
    
    let rec first_digit x = 
        if x < 10 then x else first_digit (x / 10)
    
    let benford l = 
        let first_digit_list = List.map first_digit l    
        let digits_counts = List.map List.length (split first_digit_list)
        //compute the percentage
        List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_counts
    
    
    
  10. KNguyen said

    My F# implementation:

    //partition list of single digits into bins
    let split l =
        let rec split_aux ll i =
            if (i = 9) then 
                [ll]
            else
                let tt = List.partition (fun x -> x = i) ll
                (fst tt) :: (split_aux (snd tt) (i+1))
        split_aux l 0
    
    let rec first_digit x = 
        if x < 10 then x else first_digit (x / 10)
    
    let benford l = 
        let first_digit_list = List.map first_digit l    
        let digits_counts = List.map List.length (split first_digit_list)
        //compute the percentage
        List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_counts
    
  11. David said

    In FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…

    { -------------------- split -------------------- }
    { taken from string library --------------  }
    
    : (adjust) ( adr1 c1 adr2 ard3 )
        2dup = IF  2drop  0:0 2swap     \ at end, set remainder to ""
             ELSE  nip 2 pick - >r
                   2dup r@ 1+ /string   \ adjust remainder
                   2swap drop r>        \ adjust result
             THEN ;
    
    : split  ( adr c delim -- adr1 c1 adr2 c2 )
        >r  \ save delimiter
        2dup +  2 pick  \ ( adr c -- adr c end-adr adr )
        BEGIN
            2dup > IF  dup c@ r@ <>  ELSE  false  THEN
        WHILE
            1+
        REPEAT r> drop
        (adjust) ;
    
    { --------------- Benford -------------- }
    
    variable sample-size
    create digits 10 cells allot
    
    : nth-field ( adr c delim n -- adr c )
        swap  locals| delim |
        0 ?DO  delim split 2drop  LOOP
        delim split 2nip ;
    
    : get-digit ( addr count -- digit )
        0> IF
             c@ dup  [char] 1  [ char 9 1+ ] literal  within  IF
               [char] 0 -
             ELSE
               drop -1
             THEN
           ELSE
             drop -1
           THEN ;
    
    : count-digit ( addr count -- )
        get-digit dup 0> IF 
          cells digits +  1 swap +!
          1 sample-size +!
        ELSE
          drop
        THEN ;
    
    : init-benford ( -- )
        digits 10 cells 0 fill
        0 sample-size ! ;
    
    : import-field ( n fd -- )
        locals| input n |
        BEGIN pad 1024 input read-line throw WHILE
            pad swap [char] , n nth-field count-digit
        REPEAT drop ;
    
    : rounded ( n -- n )
        10 /mod  swap 5 + 10 / + ;
    
    : .percent  ( n -- )
        10000 sample-size @ */ rounded
        0 <# [char] % hold # [char] . hold #s #> type space ;
    
    : report
        10 1 DO
            cr i 2 .r  ." : "  i cells digits + @ .percent
        LOOP ;
    
    : benford  ( field -- )
        init-benford
        bl word count  r/o open-file throw
        2dup import-field
        close-file throw drop
        report ;
    

    Execution:

    3 benford lakestats-mn.csv
     1: 32.1%
     2: 19.8%
     3: 12.7%
     4: 9.1%
     5: 6.7%
     6: 6.1%
     7: 5.3%
     8: 4.5%
     9: 3.8%  ok
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 611 other followers

%d bloggers like this: