## Pearson Hashing

### May 25, 2018

Cryptographic hashes, also called message digests, are ubiquitous in modern cryptography — Brce Schneier calls them “the workhorses of modern cryptography” — used among other things in digital signatures, message authentication codes, as fingerprints to detect duplicate data, and as checksums to detect data corruption. Today we look at a simple example of a hash function, not suitable for cryptography, but useful for non-cryptographic purposes.

Peter Pearson describes a hash function based on a permutation table *T*of the numbers 0 inclusive to 256 exclusive. Starting from an initial hash of 0, each byte of the message is accessed in order, added to the current hash mod 256, then the hash is replaced by the value in the permutation table corresponding to that sum. In some implementations, including Pearson’s original, the two bytes are XOR’ed rather than added mod 256.

If eight bits isn’t enough, Pearson gives a simple algorithm for extending the algorithm to sixteen bits. First, compute the normal 8-bit hash. Then, add 1 (mod 256) to the first character of the string, and compute the normal 8-bit hash of the modified string; since a modification to a single bit provides a large change in the hash value, the two hashes will be independent of each other. Finally, concatenate the two hashes. Pearson doesn’t mention it, but this scheme can be extended to any multiple of eight bits.

Your task is to write a program that computes the 8-bit and 16-bit Pearson hashes of an input string, given a suitable permutation table. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Here is a solution in standard R7RS Scheme that uses the table from

the paper and provides 8 variants.

A curiosity: Using addition instead of xor produces collisions for the

sample strings in the 8-bit version. Replacing the original table

with a couple of randomly generated ones that I tried gets rid of the

collisions. I could also reproduce the collisions using

@programmingpraxis’s implementation (with the fixed table from the

paper).

Here’s a solution in C.

Example:

Solution in Ruby.

Outputs:

25

165

165121139220

Here are updated functions from my program above. I added casts to unsigned 8-bit ints for memory safety, to prevent reading outside array T’s boundary.

@V, in your example that sets size_in_bytes=4, the result, 165121139220, is represented with 38 bits (more than 32 bits = 4 bytes).

Here’s a modification that uses bit-shifting, instead of concatenating strings between “0” and “255”.

I also modified the function so that the first letter of the message is non-destructively incremented by the index of each iteration (a opposed to being incremented by 0, 1, 3, then 6, over four iterations).

[…] the previous exercise we studied a simple cryptographic hash. Today we will use that hash to write an equally-simple […]

Bit surprised we haven’t seen a Python one yet.