Entropy
January 22, 2016
Here is our entropy calculator:
(define (log2 x) (/ (log x) (log 2))) ; helper used below

(define (entropy filename)
  (with-input-from-file filename
    (lambda ()
      (let ((freqs (make-vector 256 0)) (ent 0.0))
        (let loop ((c (read-char)) (len 0))
          (if (not (eof-object? c))
              (let ((i (char->integer c)))
                (vector-set! freqs i (+ (vector-ref freqs i) 1))
                (loop (read-char) (+ len 1)))
              (do ((k 0 (+ k 1)))
                  ((= k 256) (- ent))
                (let ((freq (vector-ref freqs k)))
                  (when (positive? freq)
                    (let ((x (/ freq len)))
                      (set! ent (+ ent (* x (log2 x))))))))))))))
The outer loop (on let) reads the input one character at a time and counts each character in the freqs vector. The inner loop (on do) sums the contribution x * log2 x of each symbol that appeared in the input; negating the total gives the Shannon entropy. Here’s an example, where the input file entropy.ss contains the function shown above:
> (entropy "entropy.ss") 3.8011744425003404
You can run the program at http://ideone.com/GBPNhL.
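For comparison, the same frequency-counting algorithm reads naturally in Python. This is a minimal sketch of my own, not the Python version mentioned further down; reading the file as bytes and the function name are assumptions:

```python
import math
from collections import Counter

def entropy(filename):
    # Shannon entropy of a file's bytes, in bits per symbol:
    # H = -sum(p * log2(p)) over the byte values that actually occur.
    with open(filename, "rb") as f:
        data = f.read()
    length = len(data)
    counts = Counter(data)  # plays the role of the 256-slot freqs vector
    return -sum((n / length) * math.log2(n / length)
                for n in counts.values())
```

Counter replaces the explicit vector, and the generator expression does the work of the do loop, skipping symbols with zero frequency automatically.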
Python has more entropy!
Another Scheme version, over arbitrary sorted lists:
(define (count-occurrences l)
  (define (count l last lastcount)
    (cond
      ((null? l) (list lastcount)) ; emit the final run's count too
      ((equal? (car l) last) (count (cdr l) last (add1 lastcount)))
      (else (cons lastcount (count (cdr l) (car l) 1)))))
  (if (null? l) '() (count (cdr l) (car l) 1)))

(define (shannon-entropy l)
  (let ((len (length l)))
    (* -1
       (fold
         (lambda (occ last)
           (let ((pi (/ occ len)))
             (+ last (* pi
                        (/ (log pi) (log 2))))))
         0
         (count-occurrences l)))))

(shannon-entropy (sort (string->list "Hello mine-turtle!") char<?))
;; -> 3.530493056757483

(shannon-entropy '(red red blue black yellow purple))
;; -> 2.251629167387823
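The run-length approach above has a direct Python analogue built on itertools.groupby. This is a sketch of the same idea, with names of my own choosing:

```python
import math
from itertools import groupby

def shannon_entropy(items):
    # Like the Scheme helper above: count the length of each run of
    # equal adjacent items (so the input should be sorted/grouped),
    # then sum -p * log2(p) over the run probabilities.
    items = list(items)
    n = len(items)
    runs = [sum(1 for _ in g) for _, g in groupby(items)]
    return -sum((c / n) * math.log2(c / n) for c in runs)
```

Calling it as shannon_entropy(sorted("Hello mine-turtle!")) mirrors the Scheme version, which sorts the character list before counting runs.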
My slightly modified Chicken Scheme version:
(use vector-lib)
(define (log2 x) (/ (log x) (log 2)))
(define (filefreqs filename)
(with-input-from-file filename (lambda ()
(let ((freqs (make-vector 256 0)))
(let loop ((c (read-char))
(len 0))
(if (eof-object? c)
(vector-map (lambda (i x) (/ x len)) freqs)
(let ((i (char->integer c)))
(vector-set! freqs i (+ (vector-ref freqs i) 1))
(loop (read-char) (+ 1 len)))))))))
(define (entropy filename)
(- (vector-fold (lambda (i prev x)
(+ prev (* x (if (positive? x) (log2 x) 0))))
0
(filefreqs filename))))
(display (entropy "entropy.scm"))
Haskell, using the nice little foldl library:
Haskell has slightly less entropy than Python. Computed from my Python when the file contained just the source code, Jared’s Haskell, and Praxis’ Scheme:
Almost got by without using a variable: