Closest Two-Sum To Zero

June 23, 2015

Given a random array of integers, both positive and negative, find the pair with sum closest to zero. For instance, in the array [45, -29, -96, -7, -17, 72, -60], the two integers with sum closest to zero are -60 and 72.

Your task is to write a program that finds the two-sum closest to zero. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

15 Comments »

15 Responses to “Closest Two-Sum To Zero”

klettier said

June 23, 2015 at 12:58 PM

let d = [|45; -29; -96; -7; -17; 72; -60|]

let sum x y = x + y

let rec closestToZero (ar : int array) start (a1 : int) (a2 : int) (sm1: int) =
    match start with
    | x when x < ar.Length - 1 ->
        let s = ar.[start]
        let mutable a3, a4, sm2 = a1, a2, sm1
        for i in [start + 1 .. ar.Length - 1] do
            let smt = sum s ar.[i]
            if(System.Math.Abs(smt) < System.Math.Abs(sm2)) then
                sm2 <- smt
                a3 <- s
                a4 <- ar.[i]
        closestToZero ar (start + 1) a3 a4 sm2
    | _ -> (a1, a2, sm1)


closestToZero d 0 0 0 System.Int32.MaxValue;;

graham said
June 23, 2015 at 2:34 PM
I think the expected minimum sum is zero, since you’re picking k independent uniformly from [-n, n].
I’m too under-caffeinated to properly work out the full distribution, though. As for “is that normal?”, I
think you may be falling prey to small sample size; with only 10 trials, it’s hard to get an idea of what
the distribution looks like.

Here’s an implementation in vanilla Python:
```
def sort_and_scan(xs):
    ys = sorted(xs, key=abs)
    return min(map(sum, zip(ys, ys[1:])), key=abs)
```
And for conducting a large number of experiments, we turn to the excellent numpy and pandas
libraries:
```
from numpy.random import randint
from pandas import DataFrame

def experiment(k, n, t):
    df = DataFrame(randint(low=-n, high=n + 1, size=(k, t)))
    return df.apply(sort_and_scan).describe()

if __name__ == '__main__':
    print(experiment(17, 1000, int(1e5)))
```
In one run, I got a mean of 0.03161 with a standard deviation of 10.806607.
graham said
June 23, 2015 at 2:52 PM
Thinking a bit more about the distribution, it makes sense that the expected min sum would be zero.
A sum of 2n is only possible if two different entries are both plus or minus n, but there are n different
ways to get a sum of zero: plus and minus x for any x in {0, …, n}. This distribution is also symmetric
about the middle.
programmingpraxis said
June 23, 2015 at 2:56 PM
@Graham: Does k matter? If k is very small, there won’t be very many ways to make 0. Is there some minimum k required to guarantee an expected min sum of 0?
graham said
June 23, 2015 at 3:05 PM
My intuition is that k doesn’t matter for the expected minimum sum to be zero; it should always
be zero. The variance probably plays a big role, though; if it’s relatively the same size as k, the
expected min sum may not be all that much more likely than other values. If I get a bit, I’ll take a
stab at the full conditional distribution.
Paul said
June 23, 2015 at 3:50 PM
If you look at the average of the absolute sums you get something close to 7.45. If you take the average of the sum, you come close to zero, of course. I have googled a little to find papers on problems like this, but did not find any yet.

Mike said

June 24, 2015 at 12:25 AM

My solution – sort the list and scan from each end:

def min_sum(y):
	y.sort()
	m = float("+inf")
	f, r = 0, len(y)-1
	while f < r:
		t = y[f] + y[r]
		if -m < t < m:
			m = abs(t)
			sf, sr = f, r
		if t < 0:
			f += 1
		elif t >= 0:
			r -= 1

	return y[sf], y[sr], m

Mike said
June 24, 2015 at 12:30 AM
The problem of finding the expected value of the minimum sum, feels like a “birthday paradox” type problem. In particular, the variation in which you have a room of m men and w women, and want to know the probability that a man and a woman share a birthday. E.g., consider men, women to be the random numbers zero, respectively, and their absolute value to be the birthday. See: https://en.wikipedia.org/wiki/Birthday_problem#Generalization_to_multiple_types
Mike said
June 24, 2015 at 12:37 AM
WordPress ate the less than and greater than symbols. I was trying to say let the random integers less than zero represent the men, and the random integers greater than zero represent women. Then the problem maps fairly well onto the case in the Wikipedia article. Not sure how to handle zero, maybe alternate so first zero is male, second is female, etc.
Mike said
June 25, 2015 at 3:24 AM
@praxis – In your code min-sum is the absolute value of the minimum sum, so your min-sum’s are all positive. I think that explains why your expected min-sum wasn’t what you expected.

In @Graham’s experiment, he used the actual minimum sum, which can be positive or negative. Therefor his expected value is 0.

This IPython notebook shows has more info: http://nbviewer.ipython.org/gist/anonymous/4abb2d282e72ba68cb5f

Brett Warren said

June 25, 2015 at 12:45 PM

Solution – generate array; compare each item with every item after it, and compare to the closest pair of integers.

def closest_to_zero(int_array):
	closest_to_zero = [float("inf")]
	for index, operand1 in enumerate(int_array):
		for operand2 in list(int_array[index+1:]):
			summation_ints = [operand1, operand2]
			if abs(sum(summation_ints)) < abs(sum(closest_to_zero)):
				closest_to_zero = summation_ints
	return closest_to_zero

if __name__ == "__main__":
	print(closest_to_zero([45, -29, -96, -7, -17, 72, -60]))

Paul said
June 26, 2015 at 5:12 AM
I think this problem is close to the problem to find the smallest distance between the points from an array. A nice explanation of the formula for that problem is given in MathOverflow.
We have m cards (in our case 2000) with k marked cards (17 in our case). If we deal them out in the following way we get a result where the cards have at least a given distance d. We will assume that cards dealt from the bottom are safe (not marked). Start to deal from the top. When a marked ard is dealt we deal d “safe” cards from the bottom. When all the cards are dealt we have the desired result, if the (k – 1) d cards at the bottom were indeed safe cards (not marked). The probabillity that no marked cards are in the bottom (k-1)d cards is the probability that we can deal cards with a minimum distance of d. From this you get a minimum distance value of L / (k^2-1), which is exact, if you let go m to infinity – L is the length of the interval (2000 in our case).
For our problem the situation is slightly different, but we can use the same argumentation. Our deal starts from positions 0, -1, +1, -2, +2, ….-2000, +2000. When we hit a marked card and the position is positive, we deal d cards on the negative side (we fill positions -pos-1, …-pos-d). When we hit a marked card and the position is negative, we deal d+1 cards on the positive side (we fill positions -pos, -pos-1, …,-pos-d). So on average we fill d+1/2 positions i.so. d positions for the minimum distance case. After that we can safely fill d cards at the same side (as the marked card) from the top. Tthere are 2 complications, which are not handled yet. The first is, that if the frist marked card is very early more safe cards have to be used, as a very small positive or a few very small negative numbers can result in a low sum too. The second complication is, that if after a marked card within d cards another marked card follows, the number of safe cards that have to be used the fill positions on the other side is less.
All in all, it is to be expected that the minimum distance problem is close to the closest to zero sum problem . As we have to deal slightly more “safe” cards for our problem, is is expected that the expectation value will be higher. As you can see from the table (m=2000) for all values of k the simulation value for our problem is in between the values for the minimum distance for k-1 and k. (all simulations were done with floats i.s.o. ints; the differences are negligable)
```
  k    |sum|     |dist|
  2 666.432 666.667
  3 333.362 250.000
  4 186.892 133.333
  5 114.160  83.333
  6  75.370  57.143
  7  52.583  41.667
  8  38.801  31.746
  9  29.588  25.000
 10  23.401  20.202
 11  18.884  16.667
 12  15.647  13.986
 13  13.189  11.905
 14  11.245  10.256
 15   9.688   8.929
 16   8.464   7.843
 17   7.449   6.944
 18   6.615   6.192
 19   5.900   5.556
 20   5.311   5.013
 29   2.472   2.381
 30   2.305   2.225
 40   1.284   1.251
 49   0.852   0.833
 50   0.818   0.800
  99   0.206   0.204
100   0.202   0.200
200   0.050   0.050
300   0.022   0.022
```
seth2810 said
June 26, 2015 at 2:53 PM
I think that the easiest way to solve the problem directly to iterate over all the elements

function closest(input) {
var v1, v2, sum, result;

for (var idx1 = input.length; idx1 >= 0; idx1–) {
for (var idx2 = input.length; idx2 >= 0; idx2–) {
if (idx1 != idx2) {
v1 = input[idx1];
v2 = input[idx2];
sum = v1 + v2;

if (sum >= 0 && (!result || result[2] > sum)) {
result = [v1, v2, sum];
}
}
}
}

return result;
}

var input = [45, -29, -96, -7, -17, 72, -60];
console.log(closest(input));

Meisam said

June 30, 2015 at 1:09 PM

i = 0
j = 1
arr =  [45, -29, -96, -7, -17, 72, -60]
sum = abs (arr[0]+arr[1])
while i < len(arr):
    while j < len(arr):
        if sum > abs (arr[i]+arr[j]):
            sum = abs (arr[i]+arr[j])
            a = arr[i]
            b = arr[j]
        j +=1
    i += 1
    j = i +1
print a,b, sum

r. clayton said
July 3, 2015 at 2:18 PM
A solution in (Racket) Scheme.

Programming Praxis