Minimum Hamming Distance

November 12, 2013

Every autumn, a month or two after the start of a new academic year, I receive a handful of emails from students who expect me to help them with their homework. I never respond. Some of the requests are polite — “Dear Kind Sir;” most of them are demanding — “It has to be in C# and cannot exceed 200 lines;” all of them have spelling and grammar mistakes. Today’s exercise is from an email that demanded a response by midnight (the writer didn’t say what time zone), but that was a few weeks ago, and the homework deadline is well past, and at least the question is entertaining.

The Hamming distance between two numbers is the number of bits that must be changed to make the bit representations of the two numbers equal. For instance, the number 79 is represented as 1001111 in binary, the number 83 is represented as 1010011 in binary, and the Hamming distance between them is 3, as the second, third and fourth bits (counting from the right, starting with zero) differ. The assignment was to read a list of numbers and find the two with the minimum Hamming distance between them.

Your task is to write somebody’s homework assignment, as described above — you must finish it before midnight! When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

15 Comments »

15 Responses to “Minimum Hamming Distance”

mvaneerde said
November 12, 2013 at 1:20 PM
Technically speaking, isn’t it always *after* midnight (and thus, never a suitable time to feed your Gremlin?)

Paul said

November 12, 2013 at 2:15 PM

In Python. The function min_hamming returns the pair with the smallest Hamming distance, as requested, and not the minimum Hamming distance.

def pop_count(n):
    i = 0
    while n:
        i += 1
        n &= n - 1
    return i

def hamming(m, n):
    return pop_count(m ^ n)

def min_hamming(arr):
    minh = (float("inf"), None)
    while arr:
        n = arr.pop()
        minh = min([minh] + [(hamming(n, m), (n, m)) for m in arr]) 
    return minh[1]
    
print hamming(79,83)                                             # -> 3
print min_hamming([13500, 1935, 9645, 5790])    # -> (5790, 1935)
print hamming(5790, 1935)                                    # -> 4

svenningsson said

November 12, 2013 at 2:40 PM

Haskell solution

import Data.Bits
import Data.List
import Data.Function

type Distance = Int

minimumHamming :: Bits a => [a] -> (a,a,Distance)
minimumHamming xs = minimumBy (compare `on` third) 
                    [ (x,y,dist x y) | x:ys <- tails xs, y <- ys ]
  where third (_,_,c) = c

dist :: Bits a => a -> a -> Distance
dist x y = popCount (x `xor` y)

Jussi Piitulainen said
November 12, 2013 at 4:42 PM
Past 6 pm, my zone. Well before a midnight, my zone. Python 3. The bit counter ignores not only zeroes but also a prefix “0b” in the binary string representation of the xor. (I also considered negative numbers but decided to ignore them. The sign is ignored, but I suspect the result might not be a Hamming distance if negative numbers were involved.)
from operator import xor from itertools import combinations def ham(p): return sum(1 for d in bin(xor(*p)) if d == '1') def minham(ms): return min(combinations(ms, 2), key=ham)
The test program is longer than the program. I added the model solution example numbers to my own test numbers afterwards.
if __name__ == '__main__': for numbers in (31, 4, 15, 926), (13500, 1935, 9645, 5790): pairs = tuple(combinations(numbers, 2)) print('The Hammings of some pairs:') print(*zip(map(ham, pairs), pairs), sep = '\n') print('With "the" pair of fewest different bits:', minham(numbers)) else: print('And the Hamming of 79 and 83:', ham((79, 83)))
Originally I had an assertion that the minimun distance pair was unique. I removed the assertion. My test shows that the model pairs are much tied.
The Hammings of some pairs: (4, (31, 4)) (1, (31, 15)) (4, (31, 926)) (3, (4, 15)) (6, (4, 926)) (5, (15, 926)) With "the" pair of fewest different bits: (31, 15) The Hammings of some pairs: (8, (13500, 1935)) (4, (13500, 9645)) (4, (13500, 5790)) (4, (1935, 9645)) (4, (1935, 5790)) (8, (9645, 5790)) With "the" pair of fewest different bits: (13500, 9645) And the Hamming of 79 and 83: 3
None of us seems interested in the part of the assignment that says to “read” the list of numbers.
Jussi Piitulainen said
November 12, 2013 at 5:34 PM
Scheme needs the arg min and arg max that some machine learning communities or some such use so much. Something like the following.
(define (argmin proc args)   (let loop ((least (proc (car args)))              (arg (car args)) (args (cdr args)))     (if (null? args) arg         (let ((weight (proc (car args))))           (if (< weight least)               (loop weight (car args) (cdr args))               (loop least arg (cdr args)))))))
To be used where Python would have the key for min, as I did above.
(let ((numbers '(3 -1 4 1 -5 9)))   (display "Numbers: ") (write numbers) (newline)   (display "Absolutely least: ") (write (argmin abs numbers)) (newline))
(let ((strings '("three" "one" "four" "one" "five" "nine")))   (display "Strings: ") (write strings) (newline)   (display "Shortest: ") (write (argmin string-length strings)) (newline))
That writes the following.
Numbers: (3 -1 4 1 -5 9) Absolutely least: -1 Strings: ("three" "one" "four" "one" "five" "nine") Shortest: "one"
The present problem model solution would be modified like so:
(define (min-hamm xs) (argmin hamming (pairings xs)))
The example would then show the minimum distance pair instead of the minimum distance.
> (min-hamm '(13500 1935 9645 5790)) (13500 9645)

David said

November 13, 2013 at 3:02 AM

FORTH version:

-1 1 rshift constant MAXINT

: bitcount ( n -- n )
    dup $5555555555555555 and  swap $AAAAAAAAAAAAAAAA and  1 rshift +
    dup $3333333333333333 and  swap $CCCCCCCCCCCCCCCC and  2 rshift +
    dup $0F0F0F0F0F0F0F0F and  swap $F0F0F0F0F0F0F0F0 and  4 rshift +
    dup $00FF00FF00FF00FF and  swap $FF00FF00FF00FF00 and  8 rshift +
    dup $0000FFFF0000FFFF and  swap $FFFF0000FFFF0000 and 16 rshift +
    dup $00000000FFFFFFFF and  swap $FFFFFFFF00000000 and 32 rshift + ;

: hamming ( n1 n2 -- n )
    xor bitcount ;

: minpair ( adr u -- a b )
    MAXINT dup dup locals| min-h min-a min-b count array |
    count 0 DO
        count i 1+ ?DO
            array i cells + @  array j cells + @  2dup hamming
            dup min-h < IF
                to min-h  to min-a  to min-b
            ELSE
                3drop
            THEN
        LOOP
    LOOP min-a min-b ;

create test 13500 , 1935 , 9645 , 5790 ,
test 4 minpair . . cr
9645 13500

Adolfo Pérez Álvarez (@xardon) said

November 13, 2013 at 7:55 AM

In Smalltalk:

ns := #(1 2 3 4).

(ns collect: [ :x | x bitString ])
	unorderedPairs
	minimize: [ :p | (p at: 1) hamming: (p at: 2) ])
		collect: [ :p | Number readFrom: p base: 2 ].

Each number is transformed into its equivalent bit string and paired with each other without repetition:

SequenceableCollection >> unorderedPairs
	|n|
	n := self size.
	n < 2 ifTrue: [ self error ].
	^ (1 to: n - 1) flatCollect: [ :i |
		  (i + 1 to: n) collect: [ :j |
			  { self at: i . self at: j } ] ]

The resulting collection of pairs is minimized with respect to the hamming function:

Collection >> minimize: aBlock
	|x fx s|
	s := ReadStream on: self.
	x := s next.
	fx := aBlock value: x.
	
	s do: [ :y |
		|fy|
		fy := aBlock value: y.
		fy < fx ifTrue: [ x := y. fx := fy ] ].
	^ x
	
String >> hamming: aString
	^ self with: aString inject: 0 into:[ :a :b :x | a = b ifTrue: x ifFalse: 1 + x ]

SequenceableCollection >> with: aCollection inject: initial into: aBlock 
	|acc|
	acc := initial.
	(1 to: (aCollection size min: self size)) do: [ :i |
		acc := aBlock value: (self at: i) value: (aCollection at: i) value: acc ].
	^ acc

Graham said

November 14, 2013 at 4:04 AM

Here’s an implementation in Python. The problem statement seems a
bit ambiguous to me; it doesn’t specify that the pairs should be of
two distinct numbers. Hence the snarky solution is to return zero,
the distance from the first number in your list to itself.

#!/usr/bin/env python3
from itertools import combinations, starmap

def snarky(ns):
    """The minimum distance between arbitrary pairs is zero (there was no
    stipulation that the pairs must be of distinct items; dist(ns[0],
    ns[0]) = 0).
    """
    return 0

def hamming_distance(m, n):
    """Hamming Distance between two natural numbers is the number of ones
    in the binary representation of their xor.
    """
    return sum(map(int, bin(m ^ n)[2:]))

def min_hamming_dist(ns):
    """Minimum Hamming Distant between pairs of two different ns"""
    return min(starmap(hamming_distance, combinations(ns, 2)))

if __name__ == '__main__':
    print(hamming_distance(79, 83))
    ns = [13500, 1935, 9645, 5790]
    print(snarky(ns))
    print(min_hamming_dist(ns))

Jussi Piitulainen said
November 14, 2013 at 7:05 AM
Re snarky, I think it’s reasonably clear that the numbers need to be distinct, because they need to be two: “The assignment was to read a list of numbers and find the two with the minimum Hamming distance between them.”

(And, AFAICS, there’s still no candidate solution that reads the numbers from somewhere. We are all just putting test numbers in the source code. We are all failing that part of the assignment. And I agree with us :)
cosmin said
November 14, 2013 at 10:09 AM
The problem is very interesting and remembers me of a similar one that asks to find the closest pair of 2D points, that can be solved in O(n*log(n)). Despite the fact that in both cases we have metrics, Hamming and euclidean, in this current case we have a possibly 32-dimensional space with a different structure, if we assume that all the numbers are 32-bit integers. Probably there are more subtle differences, but I’m still wondering if this problem can be solved better than Theta(n^2). An interesting idea that crossed my mind was to use BK-trees, which is a very nice data structure which I think fits perfectly here, but still it is hard to evaluate the complexity of the queries.
Jussi Piitulainen said
November 14, 2013 at 10:44 AM
@cosmin: While 32-bit, 2’s-complement integers are a possible space, there’s nothing in the problem statement to indicate that it was intended. Taking that to be implicit is a bit like assuming that all text is in ASCII unless otherwise stated. In both cases, an explicit assumption is fine (and in this case needed if you want to include negative numbers).

The complexity question should be interesting.
treeowl said
December 22, 2013 at 7:09 PM
I’m thinking there could be a radixsort-inspired approach to speeding things up. Suppose n numbers are represented with k bits each. Assume n>=2 throughout.
Base step: If k=1, then the minimum Hamming distance (mindist) and a minimal pair (minpair) can be found in a trivial amount of time (If n>=3, then mindist is 0, and minpair can be found in at most two comparisons. If n=2, then minpair takes no more work and mindist takes one comparison).
Recursive step: Spray the numbers into 2k bins, with all numbers whose i’th bit is 0 going into bin (i,0) and those whose i’th bit is 1 going into bin (i,1), the i’th bit stripped out with a couple bitwise operations. This spraying operation is clearly O(nk). Check if all bins contain at most one element (O(k)). If so, choose the contents of any two as a minimal pair and report their Hamming distance. If not, discard all bins with less than two elements. Use the algorithm recursively to find the minimal pair and distance in each bin. Then choose the minimal one from among these, which is O(k).

The most expensive operation in the recursive step is the spraying itself. This takes O(nk) at the top level. At the jth level we have (at most) something vaguely like 2^j k (k-1) … (k-i) bins, into which we’re sorting something vaguely like k(k-1)…(k-i)n elements. The total should be something vaguely like O(2^k (k!)^2 n), I believe, but I haven’t done a careful calculation, so I could be *completely* wrong. Also, if my estimate is an upper bound, there may be a modification that will give a slightly better one: when n gets sufficiently large compared to k, it eventually gets *easier* to find the minimum Hamming distance, and may or may not get easier to find a minimum pair. Note that (to take an extreme example) if n>2^k then the Hamming distance *must* be 0.
treeowl said
December 22, 2013 at 7:50 PM
Forget what I wrote above—it’s awful.
treeowl said
December 22, 2013 at 8:42 PM
A recent paper on this very topic: http://www.singacom.uva.es/~edgar/cactc2012/trabajos/CACT2012_Villa.pdf
Minimum Hamming Distance in Common Lisp | Regular Coder said
January 11, 2014 at 10:51 AM
[…] Problem statement from Programming Praxis: […]

Programming Praxis