Closest Two-Sum To Zero
June 23, 2015
Given a random array of integers, both positive and negative, find the pair with sum closest to zero. For instance, in the array [45, -29, -96, -7, -17, 72, -60], the two integers with sum closest to zero are -60 and 72.
Your task is to write a program that finds the two-sum closest to zero. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
I think the expected minimum sum is zero, since you’re picking k independent uniformly from [-n, n].
I’m too under-caffeinated to properly work out the full distribution, though. As for “is that normal?”, I
think you may be falling prey to small sample size; with only 10 trials, it’s hard to get an idea of what
the distribution looks like.
Here’s an implementation in vanilla Python:
And for conducting a large number of experiments, we turn to the excellent numpy and pandas
libraries:
In one run, I got a mean of 0.03161 with a standard deviation of 10.806607.
Thinking a bit more about the distribution, it makes sense that the expected min sum would be zero.
A sum of 2n is only possible if two different entries are both plus or minus n, but there are n different
ways to get a sum of zero: plus and minus x for any x in {0, …, n}. This distribution is also symmetric
about the middle.
@Graham: Does k matter? If k is very small, there won’t be very many ways to make 0. Is there some minimum k required to guarantee an expected min sum of 0?
My intuition is that k doesn’t matter for the expected minimum sum to be zero; it should always
be zero. The variance probably plays a big role, though; if it’s relatively the same size as k, the
expected min sum may not be all that much more likely than other values. If I get a bit, I’ll take a
stab at the full conditional distribution.
If you look at the average of the absolute sums you get something close to 7.45. If you take the average of the sum, you come close to zero, of course. I have googled a little to find papers on problems like this, but did not find any yet.
My solution – sort the list and scan from each end:
The problem of finding the expected value of the minimum sum, feels like a “birthday paradox” type problem. In particular, the variation in which you have a room of m men and w women, and want to know the probability that a man and a woman share a birthday. E.g., consider men, women to be the random numbers zero, respectively, and their absolute value to be the birthday. See: https://en.wikipedia.org/wiki/Birthday_problem#Generalization_to_multiple_types
WordPress ate the less than and greater than symbols. I was trying to say let the random integers less than zero represent the men, and the random integers greater than zero represent women. Then the problem maps fairly well onto the case in the Wikipedia article. Not sure how to handle zero, maybe alternate so first zero is male, second is female, etc.
@praxis – In your code min-sum is the absolute value of the minimum sum, so your min-sum’s are all positive. I think that explains why your expected min-sum wasn’t what you expected.
In @Graham’s experiment, he used the actual minimum sum, which can be positive or negative. Therefor his expected value is 0.
This IPython notebook shows has more info: http://nbviewer.ipython.org/gist/anonymous/4abb2d282e72ba68cb5f
Solution – generate array; compare each item with every item after it, and compare to the closest pair of integers.
I think this problem is close to the problem to find the smallest distance between the points from an array. A nice explanation of the formula for that problem is given in MathOverflow.
We have m cards (in our case 2000) with k marked cards (17 in our case). If we deal them out in the following way we get a result where the cards have at least a given distance d. We will assume that cards dealt from the bottom are safe (not marked). Start to deal from the top. When a marked ard is dealt we deal d “safe” cards from the bottom. When all the cards are dealt we have the desired result, if the (k – 1) d cards at the bottom were indeed safe cards (not marked). The probabillity that no marked cards are in the bottom (k-1)d cards is the probability that we can deal cards with a minimum distance of d. From this you get a minimum distance value of L / (k^2-1), which is exact, if you let go m to infinity – L is the length of the interval (2000 in our case).
For our problem the situation is slightly different, but we can use the same argumentation. Our deal starts from positions 0, -1, +1, -2, +2, ….-2000, +2000. When we hit a marked card and the position is positive, we deal d cards on the negative side (we fill positions -pos-1, …-pos-d). When we hit a marked card and the position is negative, we deal d+1 cards on the positive side (we fill positions -pos, -pos-1, …,-pos-d). So on average we fill d+1/2 positions i.so. d positions for the minimum distance case. After that we can safely fill d cards at the same side (as the marked card) from the top. Tthere are 2 complications, which are not handled yet. The first is, that if the frist marked card is very early more safe cards have to be used, as a very small positive or a few very small negative numbers can result in a low sum too. The second complication is, that if after a marked card within d cards another marked card follows, the number of safe cards that have to be used the fill positions on the other side is less.
All in all, it is to be expected that the minimum distance problem is close to the closest to zero sum problem . As we have to deal slightly more “safe” cards for our problem, is is expected that the expectation value will be higher. As you can see from the table (m=2000) for all values of k the simulation value for our problem is in between the values for the minimum distance for k-1 and k. (all simulations were done with floats i.s.o. ints; the differences are negligable)
I think that the easiest way to solve the problem directly to iterate over all the elements
function closest(input) {
var v1, v2, sum, result;
for (var idx1 = input.length; idx1 >= 0; idx1–) {
for (var idx2 = input.length; idx2 >= 0; idx2–) {
if (idx1 != idx2) {
v1 = input[idx1];
v2 = input[idx2];
sum = v1 + v2;
if (sum >= 0 && (!result || result[2] > sum)) {
result = [v1, v2, sum];
}
}
}
}
return result;
}
var input = [45, -29, -96, -7, -17, 72, -60];
console.log(closest(input));
A solution in (Racket) Scheme.