Universal Hash Function
October 23, 2012
This exercise can be easy, impossible, or somewhere in between, depending on your computing environment. I had no trouble writing a universal hash function in Scheme, which has a limited number of types and predicates to recognize them. C gives you access to the internal bit-image of any object in the language, so it shouldn’t be hard to write a universal hash function there, either. But a straight-jacketed language like Java may make a universal hash function impossible. Here’s my version in Scheme:
(define (hash x)
(define (mod n) (modulo n 4294967296))
(cond ((boolean? x) (if x 1 0))
((symbol? x) (hash (symbol->string x)))
((char? x) (char->integer x))
((integer? x) (mod x))
((real? x)
(let* ((r (inexact->exact x))
(n (numerator r))
(d (denominator r)))
(mod (+ n (* 37 d)))))
((rational? x) (mod (+ (numerator x) (* 37 (denominator x)))))
((complex? x)
(mod (+ (hash (real-part x))
(* 37 (hash (imag-part x))))))
((null? x) 4294967295)
((pair? x)
(let loop ((x x) (s 0))
(if (null? x) s
(loop (cdr x) (mod (+ (* 31 s) (hash (car x))))))))
((vector? x)
(let loop ((i (- (vector-length x) 1)) (s 0))
(if (negative? i) s
(loop (- i 1) (mod (+ (* 31 s) (hash (vector-ref x i))))))))
((string? x)
(let loop ((i (- (string-length x) 1)) (s 0))
(if (negative? i) s
(loop (- i 1) (mod (+ (* 31 s) (hash (string-ref x i))))))))
((procedure? x) (error 'hash "can't hash procedure"))
((port? x) (error 'hash "can't hash port"))
(else (error 'hash "don't know how to hash object"))))
The basic types are handled first, then the collection types list and vector just loop over their members, handling each recursively. You can see examples at http://programmingpraxis.codepad.org/uBMG1PWc.
>>> ht = {“Test”: “String”, 1: “Integer”}
>>> ht[1]
‘Integer’
>>> ht[“Test”]
‘String’
>>>
Probably cheating in Python: use the
id()
function. From callinghelp(id)
in my Python environment:
id(…)
id(object) -> integer
Return the identity of an object. This is guaranteed to be unique among
simultaneously existing objects. (Hint: it’s the object’s memory address.)
In Python it is trivial, as you can use hash(object), which gives a hash for many data types. Hash refuses to give a value for mutable objects like a list, which makes sense.
Using Shen (which I’m definitely intrigued by):
Sorry, link above is wrong; here’s the correct link.
[…] Pages: 1 2 […]