## Benford’s Law

### October 26, 2010

When Shriram first posted this exercise, my solution used uniq-c, compose and digits from the Standard Prelude, plus the non-standard but very useful `printf`

function of Chez Scheme:

`(define (benford xs)`

(for-each

(lambda (x) (printf "~a ~f~%" (car x) (/ (cdr x) (length xs))))

(uniq-c = (sort < (map (compose car digits) xs)))))

Shriram’s comment was “My word — that’s lovely.” Later in the thread, Joe Marshall presented a Lisp function to compute the first digit of a number, to which I say “My word — that’s lovely:”

`(defun leftmost-digit (n base)`

(if (> base n)

n

(do* ((i base next)

(next (* base base) (* i i)))

((> next n) (leftmost-digit (floor n i) base)))))

Here is my translation of that function to Scheme; making `base`

default to 10 makes the function a bit longer than the original:

`(define (first-digit n . base)`

(let* ((b (if (null? base) 10 (car base)))

(b2 (* b b)))

(let loop ((n n) (i b) (k b2))

(cond ((< n b) n)

((< n k)

(loop (quotient n i) b b2))

(else (loop n k (* i i)))))))

Shriram’s second challenge was to extract the first-digit proportions from a CSV file, such as the previous page. The text-file database functions of the two previous exercises make that simple. First, we write a predicate to keep useful records and discard the various headers, “N/A” values, and other useless records:

`(define (keep? xs) (string->number (list-ref xs 3)))`

Then we use `read-csv-record`

, `filter-port`

and `map-reduce-port`

to calculate the first-digit percentages:

`> (let* ((fds (with-input-from-file "mn-lakes.csv"`

(lambda ()

(map-reduce-port

(filter-port read-csv-record keep?)

(lambda (x)

(values

(first-digit (floor (string->number (list-ref x 3))))

1))

(lambda (k v1 v2) (+ v1 v2))

<))))

(fds-count (apply + (map cdr fds))))

(for-each (lambda (x)

(printf "~a ~f~%"

(car x)

(/ (cdr x) fds-count 0.01)))

fds))

1 32.05699020480855

2 19.768477292965272

3 12.733748886910062

4 9.082813891362422

5 6.678539626001781

6 6.144256455921639

7 5.253784505788068

8 4.452359750667854

9 3.8290293855743545

During testing, I wrote the following awk program to compare my results:

`awk -F, '`

$4>0 { c[substr($4,1,1)]++; t++ }

END { for (i=1; i<10; i++) print i, c[i]/t*100 }

' mn-lakes.csv

Running that program produced results that differed from the Scheme program; one record for which the Scheme program calculated a first digit of 1 was calculated as a first digit of 4 by the awk program. The problem was the record

`"Gull, Upper","Nisswa",422,154,54,7.5`

The simplistic awk program saw “Gull, Upper” as two fields when it is, in fact, a single field, proving again that it is hard to parse CSV files.

You can run the program at http://programmingpraxis.codepad.org/7l3dRSAn.

[...] today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more [...]

My Haskell solution (see http://bonsaicode.wordpress.com/2010/10/26/programming-praxis-benford%E2%80%99s-law/ for a version with comments):

I skipped straight to reading in values from a csv file and then performing the analysis. My solution in Python:

When run on the given csv file (which I named lakes.csv), this yields:

$ ./benford.py

1: 32.0%

2: 19.7%

3: 12.7%

4: 9.08%

5: 6.67%

6: 6.14%

7: 5.25%

8: 4.45%

9: 3.82%

Sorry about the messed up posting of my output!

Similar to Joe Marshall’s:

(define (first-digit n base)

;; returns first base base digit of n

(let ((base^2 (* base base)))

(cond ((< n base) n)

((< n base^2) (quotient n base))

(else

(first-digit (first-digit n base^2) base)))))

A ruby version also going straight to CSV calculation …

;; natural -> digit

(define (head-of-num n)

(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

;; list of naturals -> digit

(define (benford l)

(let ((res (make-vector 10 0)) ;; store the results

(count 0))

(map (lambda (n) ;; increment each seen digit’s counter

(let ((i (head-of-num n)))

(vector-set! res i (+ 1 (vector-ref res i))))

(set! count (+ 1 count)))

l)

(map (lambda (i) ;; divide by number of elements

(let ((val (/ (vector-ref res i) count)))

(vector-set! res i (list i val (exact->inexact val)))))

(iota 0 9))

res))

(define (test data)

(benford data))

;; I like this version more.

(define (head-of-num n)

(let loop ((n n)) (if (< n 10) n (loop (quotient n 10)))))

(define (benford l)

(let* ((count 0)

(res (fold-left

(lambda (n state)

(let ((i (head-of-num n)))

(vector-set! state i (+ 1 (vector-ref state i)))

(set! count (+ 1 count))

state))

(make-vector 10 0)

l)))

(map (lambda (x) (exact->inexact (/ x count))) (vector->list res))))

(define (test data)

(benford data))

My F# codes

My F# implementation:

In FORTH, However I got rid of the offending comma in the quotation for one of the lakes rather than parse csv totally correctly…

Execution: