Minimum Hamming Distance
November 12, 2013
To calculate the Hamming distance between two numbers, take the popcount of the xor of them:
(define (hamming xs) (pop-count (apply logxor xs)))
Then we make all possible pairings:
(define (pairings xs)
(let loop1 ((xs xs) (zs (list)))
(if (null? xs) (reverse zs)
(let loop2 ((ys (cdr xs)) (zs zs))
(if (null? ys) (loop1 (cdr xs) zs)
(loop2 (cdr ys) (cons (list (car xs) (car ys)) zs)))))))
Finally we compute the Hamming distances and take the minimum:
(define (min-hamm xs) (apply min (map hamming (pairings xs))))
Here are some examples:
> (hamming '(79 83))
3
> (min-hamm '(13500 1935 9645 5790))
4
If your Scheme doesn’t provide bit operators, there are implementations in the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/UNc8FOeQ.
Technically speaking, isn’t it always *after* midnight (and thus, never a suitable time to feed your Gremlin?)
In Python. The function min_hamming returns the pair with the smallest Hamming distance, as requested, and not the minimum Hamming distance.
def pop_count(n): i = 0 while n: i += 1 n &= n - 1 return i def hamming(m, n): return pop_count(m ^ n) def min_hamming(arr): minh = (float("inf"), None) while arr: n = arr.pop() minh = min([minh] + [(hamming(n, m), (n, m)) for m in arr]) return minh[1] print hamming(79,83) # -> 3 print min_hamming([13500, 1935, 9645, 5790]) # -> (5790, 1935) print hamming(5790, 1935) # -> 4Haskell solution
Past 6 pm, my zone. Well before a midnight, my zone. Python 3. The bit counter ignores not only zeroes but also a prefix “0b” in the binary string representation of the xor. (I also considered negative numbers but decided to ignore them. The sign is ignored, but I suspect the result might not be a Hamming distance if negative numbers were involved.)
from operator import xor
from itertools import combinations
def ham(p): return sum(1 for d in bin(xor(*p)) if d == '1')
def minham(ms): return min(combinations(ms, 2), key=ham)
The test program is longer than the program. I added the model solution example numbers to my own test numbers afterwards.
if __name__ == '__main__':
for numbers in (31, 4, 15, 926), (13500, 1935, 9645, 5790):
pairs = tuple(combinations(numbers, 2))
print('The Hammings of some pairs:')
print(*zip(map(ham, pairs), pairs), sep = '\n')
print('With "the" pair of fewest different bits:', minham(numbers))
else:
print('And the Hamming of 79 and 83:', ham((79, 83)))
Originally I had an assertion that the minimun distance pair was unique. I removed the assertion. My test shows that the model pairs are much tied.
The Hammings of some pairs:
(4, (31, 4))
(1, (31, 15))
(4, (31, 926))
(3, (4, 15))
(6, (4, 926))
(5, (15, 926))
With "the" pair of fewest different bits: (31, 15)
The Hammings of some pairs:
(8, (13500, 1935))
(4, (13500, 9645))
(4, (13500, 5790))
(4, (1935, 9645))
(4, (1935, 5790))
(8, (9645, 5790))
With "the" pair of fewest different bits: (13500, 9645)
And the Hamming of 79 and 83: 3
None of us seems interested in the part of the assignment that says to “read” the list of numbers.
Scheme needs the arg min and arg max that some machine learning communities or some such use so much. Something like the following.
(define (argmin proc args)
(let loop ((least (proc (car args)))
(arg (car args)) (args (cdr args)))
(if (null? args) arg
(let ((weight (proc (car args))))
(if (< weight least)
(loop weight (car args) (cdr args))
(loop least arg (cdr args)))))))
To be used where Python would have the key for min, as I did above.
(let ((numbers '(3 -1 4 1 -5 9)))
(display "Numbers: ") (write numbers) (newline)
(display "Absolutely least: ") (write (argmin abs numbers)) (newline))
(let ((strings '("three" "one" "four" "one" "five" "nine")))
(display "Strings: ") (write strings) (newline)
(display "Shortest: ") (write (argmin string-length strings)) (newline))
That writes the following.
Numbers: (3 -1 4 1 -5 9)
Absolutely least: -1
Strings: ("three" "one" "four" "one" "five" "nine")
Shortest: "one"
The present problem model solution would be modified like so:
(define (min-hamm xs) (argmin hamming (pairings xs)))
The example would then show the minimum distance pair instead of the minimum distance.
> (min-hamm '(13500 1935 9645 5790))
(13500 9645)
FORTH version:
-1 1 rshift constant MAXINT : bitcount ( n -- n ) dup $5555555555555555 and swap $AAAAAAAAAAAAAAAA and 1 rshift + dup $3333333333333333 and swap $CCCCCCCCCCCCCCCC and 2 rshift + dup $0F0F0F0F0F0F0F0F and swap $F0F0F0F0F0F0F0F0 and 4 rshift + dup $00FF00FF00FF00FF and swap $FF00FF00FF00FF00 and 8 rshift + dup $0000FFFF0000FFFF and swap $FFFF0000FFFF0000 and 16 rshift + dup $00000000FFFFFFFF and swap $FFFFFFFF00000000 and 32 rshift + ; : hamming ( n1 n2 -- n ) xor bitcount ; : minpair ( adr u -- a b ) MAXINT dup dup locals| min-h min-a min-b count array | count 0 DO count i 1+ ?DO array i cells + @ array j cells + @ 2dup hamming dup min-h < IF to min-h to min-a to min-b ELSE 3drop THEN LOOP LOOP min-a min-b ; create test 13500 , 1935 , 9645 , 5790 , test 4 minpair . . cr 9645 13500In Smalltalk:
Each number is transformed into its equivalent bit string and paired with each other without repetition:
SequenceableCollection >> unorderedPairs |n| n := self size. n < 2 ifTrue: [ self error ]. ^ (1 to: n - 1) flatCollect: [ :i | (i + 1 to: n) collect: [ :j | { self at: i . self at: j } ] ]The resulting collection of pairs is minimized with respect to the hamming function:
Here’s an implementation in Python. The problem statement seems a
bit ambiguous to me; it doesn’t specify that the pairs should be of
two distinct numbers. Hence the snarky solution is to return zero,
the distance from the first number in your list to itself.
#!/usr/bin/env python3 from itertools import combinations, starmap def snarky(ns): """The minimum distance between arbitrary pairs is zero (there was no stipulation that the pairs must be of distinct items; dist(ns[0], ns[0]) = 0). """ return 0 def hamming_distance(m, n): """Hamming Distance between two natural numbers is the number of ones in the binary representation of their xor. """ return sum(map(int, bin(m ^ n)[2:])) def min_hamming_dist(ns): """Minimum Hamming Distant between pairs of two different ns""" return min(starmap(hamming_distance, combinations(ns, 2))) if __name__ == '__main__': print(hamming_distance(79, 83)) ns = [13500, 1935, 9645, 5790] print(snarky(ns)) print(min_hamming_dist(ns))Re snarky, I think it’s reasonably clear that the numbers need to be distinct, because they need to be two: “The assignment was to read a list of numbers and find the two with the minimum Hamming distance between them.”
(And, AFAICS, there’s still no candidate solution that reads the numbers from somewhere. We are all just putting test numbers in the source code. We are all failing that part of the assignment. And I agree with us :)
The problem is very interesting and remembers me of a similar one that asks to find the closest pair of 2D points, that can be solved in O(n*log(n)). Despite the fact that in both cases we have metrics, Hamming and euclidean, in this current case we have a possibly 32-dimensional space with a different structure, if we assume that all the numbers are 32-bit integers. Probably there are more subtle differences, but I’m still wondering if this problem can be solved better than Theta(n^2). An interesting idea that crossed my mind was to use BK-trees, which is a very nice data structure which I think fits perfectly here, but still it is hard to evaluate the complexity of the queries.
@cosmin: While 32-bit, 2’s-complement integers are a possible space, there’s nothing in the problem statement to indicate that it was intended. Taking that to be implicit is a bit like assuming that all text is in ASCII unless otherwise stated. In both cases, an explicit assumption is fine (and in this case needed if you want to include negative numbers).
The complexity question should be interesting.
I’m thinking there could be a radixsort-inspired approach to speeding things up. Suppose n numbers are represented with k bits each. Assume n>=2 throughout.
Base step: If k=1, then the minimum Hamming distance (mindist) and a minimal pair (minpair) can be found in a trivial amount of time (If n>=3, then mindist is 0, and minpair can be found in at most two comparisons. If n=2, then minpair takes no more work and mindist takes one comparison).
Recursive step: Spray the numbers into 2k bins, with all numbers whose i’th bit is 0 going into bin (i,0) and those whose i’th bit is 1 going into bin (i,1), the i’th bit stripped out with a couple bitwise operations. This spraying operation is clearly O(nk). Check if all bins contain at most one element (O(k)). If so, choose the contents of any two as a minimal pair and report their Hamming distance. If not, discard all bins with less than two elements. Use the algorithm recursively to find the minimal pair and distance in each bin. Then choose the minimal one from among these, which is O(k).
The most expensive operation in the recursive step is the spraying itself. This takes O(nk) at the top level. At the jth level we have (at most) something vaguely like 2^j k (k-1) … (k-i) bins, into which we’re sorting something vaguely like k(k-1)…(k-i)n elements. The total should be something vaguely like O(2^k (k!)^2 n), I believe, but I haven’t done a careful calculation, so I could be *completely* wrong. Also, if my estimate is an upper bound, there may be a modification that will give a slightly better one: when n gets sufficiently large compared to k, it eventually gets *easier* to find the minimum Hamming distance, and may or may not get easier to find a minimum pair. Note that (to take an extreme example) if n>2^k then the Hamming distance *must* be 0.
Forget what I wrote above—it’s awful.
A recent paper on this very topic: http://www.singacom.uva.es/~edgar/cactc2012/trabajos/CACT2012_Villa.pdf
[…] Problem statement from Programming Praxis: […]