Benford’s Law

October 26, 2010

When Shriram first posted this exercise, my solution used uniq-c, compose and digits from the Standard Prelude, plus the non-standard but very useful printf function of Chez Scheme:

(define (benford xs) (for-each (lambda (x) (printf "~a ~f~%" (car x) (/ (cdr x) (length xs)))) (uniq-c = (sort < (map (compose car digits) xs)))))

Shriram’s comment was “My word — that’s lovely.” Later in the thread, Joe Marshall presented a Lisp function to compute the first digit of a number, to which I say “My word — that’s lovely:”

(defun leftmost-digit (n base) (if (> base n) n (do* ((i base next) (next (* base base) (* i i))) ((> next n) (leftmost-digit (floor n i) base)))))

Here is my translation of that function to Scheme; making base default to 10 makes the function a bit longer than the original:

(define (first-digit n . base) (let* ((b (if (null? base) 10 (car base))) (b2 (* b b))) (let loop ((n n) (i b) (k b2)) (cond ((< n b) n) ((< n k) (loop (quotient n i) b b2)) (else (loop n k (* i i)))))))

Shriram’s second challenge was to extract the first-digit proportions from a CSV file, such as the previous page. The text-file database functions of the two previous exercises make that simple. First, we write a predicate to keep useful records and discard the various headers, “N/A” values, and other useless records:

(define (keep? xs) (string->number (list-ref xs 3)))

Then we use read-csv-record, filter-port and map-reduce-port to calculate the first-digit percentages:

> (let* ((fds (with-input-from-file "mn-lakes.csv" (lambda () (map-reduce-port (filter-port read-csv-record keep?) (lambda (x) (values (first-digit (floor (string->number (list-ref x 3)))) 1)) (lambda (k v1 v2) (+ v1 v2)) <)))) (fds-count (apply + (map cdr fds)))) (for-each (lambda (x) (printf "~a ~f~%" (car x) (/ (cdr x) fds-count 0.01))) fds)) 1 32.05699020480855 2 19.768477292965272 3 12.733748886910062 4 9.082813891362422 5 6.678539626001781 6 6.144256455921639 7 5.253784505788068 8 4.452359750667854 9 3.8290293855743545

During testing, I wrote the following awk program to compare my results:

awk -F, ' $4>0 { c[substr($4,1,1)]++; t++ } END { for (i=1; i<10; i++) print i, c[i]/t*100 } ' mn-lakes.csv

Running that program produced results that differed from the Scheme program; one record for which the Scheme program calculated a first digit of 1 was calculated as a first digit of 4 by the awk program. The problem was the record

"Gull, Upper","Nisswa",422,154,54,7.5

The simplistic awk program saw “Gull, Upper” as two fields when it is, in fact, a single field, proving again that it is hard to parse CSV files.

You can run the program at http://programmingpraxis.codepad.org/7l3dRSAn.

Posted by programmingpraxis

Filed in Exercises

12 Comments »

12 Responses to “Benford’s Law”

Programming Praxis – Benford’s Law « Bonsai Code said
October 26, 2010 at 12:14 PM
[…] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more […]

Remco Niemeijer said

October 26, 2010 at 12:14 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):

import Data.List
import Text.Printf

firstDigits :: [Float] -> [(Char, Double)]
firstDigits xs = map (\ds -> (head ds, 100 * toEnum (length ds) /
    toEnum (length xs))) . group . sort $ map (head . show) xs

shriram :: [[String]] -> [(Char, Double)]
shriram xs = firstDigits [n | [(n,_)] <- map (reads . (!! 3)) xs]

Graham said

October 26, 2010 at 12:32 PM

I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:

#!/usr/bin/env python

import csv

def benford(csv_file, numerical_position):
    """
    Calculates the percentages of first digit occurrence in numerical_position
    in the csv_file in question.
    """
    r = csv.reader(open(csv_file))
    d = dict(zip(range(1,10), [0] * 9))
    for row in r:
        try:
            d[int(row[numerical_position][0])] += 1
        except:
            pass
    total = float(sum(d.values()))
    for k in d:
        print "%s:\t%.4s%%" % (k, 100 * d[k] / total)
    return

if __name__ == '__main__':
    benford('lakes.csv', 3)

When run on the given csv file (which I named lakes.csv), this yields:

$ ./benford.py
1: 32.0%
2: 19.7%
3: 12.7%
4: 9.08%
5: 6.67%
6: 6.14%
7: 5.25%
8: 4.45%
9: 3.82%

Graham said
October 26, 2010 at 12:34 PM
Sorry about the messed up posting of my output!
Gambiteer said
October 26, 2010 at 3:22 PM
Similar to Joe Marshall’s:

(define (first-digit n base)
;; returns first base base digit of n
(let ((base^2 (* base base)))
(cond ((< n base) n)
((< n base^2) (quotient n base))
(else
(first-digit (first-digit n base^2) base)))))

slabounty said

October 26, 2010 at 7:49 PM

A ruby version also going straight to CSV calculation …

require 'csv'

def benford(csv_file, position)
    first_digits = Array.new(10, 0)
    CSV.foreach(csv_file) do |row|
        digit = row[position].to_s[0]
        first_digits[digit.to_i] += 1  if digit =~ /[0-9]/
    end
    first_digits
end

first_digits = benford("lakes.csv", 3)
total = first_digits.inject(0) { |sum, v| sum + v }
first_digits.each_with_index { |v, i| puts "Percentage for #{i} #{ (v.to_f / total.to_f) * 100.0} " if i != 0 }

Axio said
October 27, 2010 at 2:29 AM
;; natural -> digit
(define (head-of-num n)
  (let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

;; list of naturals -> digit
(define (benford l)
  (let ((res (make-vector 10 0)) ;; store the results
        (count 0))
    (map (lambda (n) ;; increment each seen digit’s counter
           (let ((i (head-of-num n)))
             (vector-set! res i (+ 1 (vector-ref res i))))
           (set! count (+ 1 count)))
         l)
    (map (lambda (i) ;; divide by number of elements
           (let ((val (/ (vector-ref res i) count)))
             (vector-set! res i (list i val (exact->inexact val)))))
         (iota 0 9))
    res))

(define (test data)
  (benford data))
Axio said
October 28, 2010 at 5:52 AM
;; I like this version more.
(define (head-of-num n)
  (let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

(define (benford l)
  (let* ((count 0)
         (res (fold-left
                (lambda (n state)
                  (let ((i (head-of-num n)))
                    (vector-set! state i (+ 1 (vector-ref state i)))
                    (set! count (+ 1 count))
                    state))
                (make-vector 10 0)
                l)))
    (map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))

(define (test data)
  (benford data))

Khanh Nguyen said

October 28, 2010 at 8:44 PM

My F# codes

//partition list of single digits into bins
let split l =
    let rec split_aux ll i =
        if (i = 9) then 
            [ll]
        else
            let tt = List.partition (fun x -> x = i) ll
            (fst tt) :: (split_aux (snd tt) (i+1))
    split_aux l 0

let rec first_digit x = 
    if x < 10 then x else first_digit (x / 10)

let benford l = 
    let first_digit_list = List.map first_digit l    
    let digits_counts = List.map List.length (split first_digit_list)
    //compute the percentage
    List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_counts

KNguyen said

October 28, 2010 at 8:46 PM

My F# implementation:

//partition list of single digits into bins
let split l =
    let rec split_aux ll i =
        if (i = 9) then 
            [ll]
        else
            let tt = List.partition (fun x -> x = i) ll
            (fst tt) :: (split_aux (snd tt) (i+1))
    split_aux l 0

let rec first_digit x = 
    if x < 10 then x else first_digit (x / 10)

let benford l = 
    let first_digit_list = List.map first_digit l    
    let digits_counts = List.map List.length (split first_digit_list)
    //compute the percentage
    List.map (fun x -> (float x) / (float) (List.sum digits_counts)) digits_counts

David said

May 19, 2012 at 6:13 PM

In FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…

{ -------------------- split -------------------- }
{ taken from string library --------------  }

: (adjust) ( adr1 c1 adr2 ard3 )
    2dup = IF  2drop  0:0 2swap     \ at end, set remainder to ""
         ELSE  nip 2 pick - >r
               2dup r@ 1+ /string   \ adjust remainder
               2swap drop r>        \ adjust result
         THEN ;

: split  ( adr c delim -- adr1 c1 adr2 c2 )
    >r  \ save delimiter
    2dup +  2 pick  \ ( adr c -- adr c end-adr adr )
    BEGIN
        2dup > IF  dup c@ r@ <>  ELSE  false  THEN
    WHILE
        1+
    REPEAT r> drop
    (adjust) ;

{ --------------- Benford -------------- }

variable sample-size
create digits 10 cells allot

: nth-field ( adr c delim n -- adr c )
    swap  locals| delim |
    0 ?DO  delim split 2drop  LOOP
    delim split 2nip ;

: get-digit ( addr count -- digit )
    0> IF
         c@ dup  [char] 1  [ char 9 1+ ] literal  within  IF
           [char] 0 -
         ELSE
           drop -1
         THEN
       ELSE
         drop -1
       THEN ;

: count-digit ( addr count -- )
    get-digit dup 0> IF 
      cells digits +  1 swap +!
      1 sample-size +!
    ELSE
      drop
    THEN ;

: init-benford ( -- )
    digits 10 cells 0 fill
    0 sample-size ! ;

: import-field ( n fd -- )
    locals| input n |
    BEGIN pad 1024 input read-line throw WHILE
        pad swap [char] , n nth-field count-digit
    REPEAT drop ;

: rounded ( n -- n )
    10 /mod  swap 5 + 10 / + ;

: .percent  ( n -- )
    10000 sample-size @ */ rounded
    0 <# [char] % hold # [char] . hold #s #> type space ;

: report
    10 1 DO
        cr i 2 .r  ." : "  i cells digits + @ .percent
    LOOP ;

: benford  ( field -- )
    init-benford
    bl word count  r/o open-file throw
    2dup import-field
    close-file throw drop
    report ;

Execution:

3 benford lakestats-mn.csv
 1: 32.1%
 2: 19.8%
 3: 12.7%
 4: 9.1%
 5: 6.7%
 6: 6.1%
 7: 5.3%
 8: 4.5%
 9: 3.8%  ok

Leading Digits Of Powers Of 2 | Programming Praxis said
June 23, 2017 at 9:01 AM
[…] John D. Cook, a programmer who writes about mathematics (he would probably describe himself as a mathematician who writes about programming) recently wrote about the distribution of the leading digits of the powers of 2, observing that they follow Benford’s Law, which we studied in a previous exercise. […]

Programming Praxis

Benford’s Law

October 26, 2010

12 Responses to “Benford’s Law”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Benford’s Law

October 26, 2010

Share this:

Related

12 Responses to “Benford’s Law”

Leave a comment

Categories

Archives

Archives