## Pearson Hashing

### May 25, 2018

We begin by defining a permutation table, using functions from the Standard Prelude:

(define t (list->vector (shuffle (range 256))))

Then it is easy to compute the 8-bit Pearson hash of a string; we use addition mod 256 rather than xor:

(define (pearson8 str) (fold-left (lambda (n h) (vector-ref t (modulo (+ n h) 256))) 0 (map char->integer (string->list str))))

Here’s an example:

> (pearson8 "Programming Praxis") 213

And here is the 16-bit version of the algorithm:

(define (pearson16 str) (let* ((cs1 (map char->integer (string->list str))) (cs2 (cons (modulo (+ (car cs1) 1) 256) (cdr cs1)))) (let loop ((h1 0) (h2 0) (cs1 cs1) (cs2 cs2)) (if (null? cs1) (+ (* h1 256) h2) (let ((h1 (vector-ref t (modulo (+ (car cs1) h1) 256))) (h2 (vector-ref t (modulo (+ (car cs2) h2) 256)))) (loop h1 h2 (cdr cs1) (cdr cs2)))))))

> (pearson16 "Programming Praxis") 54602

You can run the program at https://ideone.com/hi4xWM.

Advertisements

Here is a solution in standard R7RS Scheme that uses the table from

the paper and provides 8 variants.

A curiosity: Using addition instead of xor produces collisions for the

sample strings in the 8-bit version. Replacing the original table

with a couple of randomly generated ones that I tried gets rid of the

collisions. I could also reproduce the collisions using

@programmingpraxis’s implementation (with the fixed table from the

paper).

Here’s a solution in C.

Example:

Solution in Ruby.

Outputs:

25

165

165121139220

Here are updated functions from my program above. I added casts to unsigned 8-bit ints for memory safety, to prevent reading outside array T’s boundary.

@V, in your example that sets size_in_bytes=4, the result, 165121139220, is represented with 38 bits (more than 32 bits = 4 bytes).

Here’s a modification that uses bit-shifting, instead of concatenating strings between “0” and “255”.

I also modified the function so that the first letter of the message is non-destructively incremented by the index of each iteration (a opposed to being incremented by 0, 1, 3, then 6, over four iterations).

[…] the previous exercise we studied a simple cryptographic hash. Today we will use that hash to write an equally-simple […]

Bit surprised we haven’t seen a Python one yet.