## Entropy

### January 22, 2016

Here is our entropy calculator:

(define (entropy filename) (with-input-from-file filename (lambda () (let ((freqs (make-vector 256 0)) (ent 0.0)) (let loop ((c (read-char)) (len 0)) (if (not (eof-object? c)) (let ((i (char->integer c))) (vector-set! freqs i (+ (vector-ref freqs i) 1)) (loop (read-char) (+ len 1))) (do ((k 0 (+ k 1))) ((= k 256) (- ent)) (let ((freq (vector-ref freqs k))) (when (positive? freq) (let ((x (/ freq len))) (set! ent (+ ent (* x (log2 x))))))))))))))

The outer loop (on `let`

) reads the input, one character at a time, and counts each character in the *freqs* array. The inner loop (on `do`

), sums the entropy of each symbol that appeared in the input. Here’s an example, where the input file `entropy.ss`

contains the function shown above:

> (entropy "entropy.ss") 3.8011744425003404

You can run the program at http://ideone.com/GBPNhL.

Python has more entropy!

Other Scheme version over arbitrary sorted lists

(define (count-occurences l)

(define (count l last lastcount)

(cond

((null? l) ‘())

((equal? (car l) last) (count (cdr l) last (add1 lastcount)))

(else (cons lastcount (count (cdr l) (car l) 1)))))

(if (null? l) ‘() (count (cdr l) (car l) 1)))

(define (shannon-entropy l)

(let ((len (length l)))

(* -1

(fold

(lambda (occ last)

(let ((pi (/ occ len)))

(+ last (* pi

(/ (log pi) (log 2))))))

0

(count-occurences l)))))

(shannon-entropy (sort (string->list “Hello mine-turtle!”) char 3.29883055667735

(shannon-entropy ‘(red red blue black yellow purple))

;; -> 1.8208020839343

My chicken-scheme slightly modified version

(use vector-lib)

(define (log2 x) (/ (log x) (log 2)))

(define (filefreqs filename)

(with-input-from-file filename (lambda ()

(let ((freqs (make-vector 256 0)))

(let loop ((c (read-char))

(len 0))

(if (eof-object? c)

(vector-map (lambda (i x) (/ x len)) freqs)

(let ((i (char->integer c)))

(vector-set! freqs i (+ (vector-ref freqs i) 1))

(loop (read-char) (+ 1 len)))))))))

(define (entropy filename)

(- (vector-fold (lambda (i prev x)

(+ prev (* x (if (positive? x) (log2 x) 0))))

0

(filefreqs filename))))

(display (entropy "entropy.scm"))

Does formatting work?

Haskell, using the nice little foldl library:

Haskell has slightly less entropy than Python. Computed from my Python when the file contained just the source code, Jared’s Haskell, and Praxis’ Scheme:

Almost got by without using a variable: