Pearson Hashing
May 25, 2018
We begin by defining a permutation table, using functions from the Standard Prelude:
(define t (list->vector (shuffle (range 256))))
Then it is easy to compute the 8-bit Pearson hash of a string; we use addition mod 256 rather than xor:
(define (pearson8 str) (fold-left (lambda (n h) (vector-ref t (modulo (+ n h) 256))) 0 (map char->integer (string->list str))))
Here’s an example:
> (pearson8 "Programming Praxis") 213
And here is the 16-bit version of the algorithm:
(define (pearson16 str) (let* ((cs1 (map char->integer (string->list str))) (cs2 (cons (modulo (+ (car cs1) 1) 256) (cdr cs1)))) (let loop ((h1 0) (h2 0) (cs1 cs1) (cs2 cs2)) (if (null? cs1) (+ (* h1 256) h2) (let ((h1 (vector-ref t (modulo (+ (car cs1) h1) 256))) (h2 (vector-ref t (modulo (+ (car cs2) h2) 256)))) (loop h1 h2 (cdr cs1) (cdr cs2)))))))
> (pearson16 "Programming Praxis") 54602
You can run the program at https://ideone.com/hi4xWM.
Here is a solution in standard R7RS Scheme that uses the table from
the paper and provides 8 variants.
A curiosity: Using addition instead of xor produces collisions for the
sample strings in the 8-bit version. Replacing the original table
with a couple of randomly generated ones that I tried gets rid of the
collisions. I could also reproduce the collisions using
@programmingpraxis’s implementation (with the fixed table from the
paper).
Here’s a solution in C.
Example:
Solution in Ruby.
Outputs:
25
165
165121139220
Here are updated functions from my program above. I added casts to unsigned 8-bit ints for memory safety, to prevent reading outside array T’s boundary.
@V, in your example that sets size_in_bytes=4, the result, 165121139220, is represented with 38 bits (more than 32 bits = 4 bytes).
Here’s a modification that uses bit-shifting, instead of concatenating strings between “0” and “255”.
I also modified the function so that the first letter of the message is non-destructively incremented by the index of each iteration (a opposed to being incremented by 0, 1, 3, then 6, over four iterations).
[…] the previous exercise we studied a simple cryptographic hash. Today we will use that hash to write an equally-simple […]
Bit surprised we haven’t seen a Python one yet.