## Universal Hash Function

### October 23, 2012

This exercise can be easy, impossible, or somewhere in between, depending on your computing environment. I had no trouble writing a universal hash function in Scheme, which has a limited number of types and predicates to recognize them. C gives you access to the internal bit-image of any object in the language, so it shouldn’t be hard to write a universal hash function there, either. But a straight-jacketed language like Java may make a universal hash function impossible. Here’s my version in Scheme:

`(define (hash x)`

(define (mod n) (modulo n 4294967296))

(cond ((boolean? x) (if x 1 0))

((symbol? x) (hash (symbol->string x)))

((char? x) (char->integer x))

((integer? x) (mod x))

((real? x)

(let* ((r (inexact->exact x))

(n (numerator r))

(d (denominator r)))

(mod (+ n (* 37 d)))))

((rational? x) (mod (+ (numerator x) (* 37 (denominator x)))))

((complex? x)

(mod (+ (hash (real-part x))

(* 37 (hash (imag-part x))))))

((null? x) 4294967295)

((pair? x)

(let loop ((x x) (s 0))

(if (null? x) s

(loop (cdr x) (mod (+ (* 31 s) (hash (car x))))))))

((vector? x)

(let loop ((i (- (vector-length x) 1)) (s 0))

(if (negative? i) s

(loop (- i 1) (mod (+ (* 31 s) (hash (vector-ref x i))))))))

((string? x)

(let loop ((i (- (string-length x) 1)) (s 0))

(if (negative? i) s

(loop (- i 1) (mod (+ (* 31 s) (hash (string-ref x i))))))))

((procedure? x) (error 'hash "can't hash procedure"))

((port? x) (error 'hash "can't hash port"))

(else (error 'hash "don't know how to hash object"))))

The basic types are handled first, then the collection types list and vector just loop over their members, handling each recursively. You can see examples at http://programmingpraxis.codepad.org/uBMG1PWc.

Pages: 1 2

>>> ht = {“Test”: “String”, 1: “Integer”}

>>> ht[1]

‘Integer’

>>> ht["Test"]

‘String’

>>>

Probably cheating in Python: use the

`id()`

function. From calling`help(id)`

in my Python environment:

id(…)

id(object) -> integer

Return the identity of an object. This is guaranteed to be unique among

simultaneously existing objects. (Hint: it’s the object’s memory address.)

In Python it is trivial, as you can use hash(object), which gives a hash for many data types. Hash refuses to give a value for mutable objects like a list, which makes sense.

Using Shen (which I’m definitely intrigued by):

Sorry, link above is wrong; here’s the correct link.

[...] Pages: 1 2 [...]