Amazon Interview Question

November 27, 2012

We have today another question from our inexhaustible set of interview questions; this one comes from Amazon:

Given a million points (x, y), give an O(n) solution to find the 100 points closest to (0, 0).

Your task is to write the requested function. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


Pages: 1 2

16 Responses to “Amazon Interview Question”

  1. […] today’s Programming Praxis exercise, our goal is to find the 100 points closest to (0, 0) out of a list of […]

  2. My Haskell solution (see http://bonsaicode.wordpress.com/2012/11/27/programming-praxis-amazon-interview-question/ for a version with comments):

    import Control.Applicative
    import qualified Data.IntMap as I
    import System.Random
    
    closest :: Int -> [(Int, Int)] -> [(Int, Int)]
    closest n = take n . concat . I.elems . I.fromListWith (flip (++)) .
                map (\(x,y) -> (x*x+y*y, [(x,y)]))
    
    points :: IO [(Int, Int)]
    points = take 1000000 <$> liftA2 zip (randomRs (-1000, 1000) <$> newStdGen)
                                         (randomRs (-1000, 1000) <$> newStdGen)
    
    main :: IO ()
    main = print . closest 100 =<< points
    
  3. sbmassey said

    Why not insert into a max-heap keyed on distance, and limit its size to 100 by removing the top item on each insertion once the limit is reached? That should make it linear, right?

  4. Paul said

    A Python version. In my opinion this is linear, as long as n is much smaller than the length of the xypoints array. I checked it with N (the length of xypoints) equal to 100000 and values for n of 10 and 100; the time taken is about the same.

    from math import hypot
    from heapq import heapify, heappushpop

    def nsmallest(xypoints, n):
        """Use a heap with the maximum of the n lowest at the top of the heap"""
        # first fill the heap with the first n points
        heap = [(-hypot(x, y), (x, y)) for x, y in xypoints[:n]]
        heapify(heap)
        # now push new elements on heap only if distance small enough
        for x, y in xypoints[n:]:
            dist = hypot(x, y)
            if dist < -heap[0][0]:
                heappushpop(heap, (-dist, (x, y))) # replace the current worst
        return heap # entries are (-distance, (x, y)) pairs
    
  5. kawas44 said

    What do you think?
    Because 100 is a constant, even if you use insertion sort (into an already sorted array, since you're building it as you go), the complexity of the algorithm will still be O(n), no?
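    With k fixed at 100, each insertion into the sorted buffer costs at most O(k), so the whole pass is O(k·n), which is linear in n for constant k. A minimal Python sketch of this bounded insertion-sort idea (function and variable names are illustrative, not from the comment above):

```python
from bisect import insort

def k_closest_insertion(points, k=100):
    """Keep a sorted buffer of the k closest points seen so far."""
    best = []  # kept sorted by squared distance: (d2, (x, y))
    for x, y in points:
        d2 = x * x + y * y
        # Skip the O(k) insertion when the point cannot make the cut.
        if len(best) == k and d2 >= best[-1][0]:
            continue
        insort(best, (d2, (x, y)))
        if len(best) > k:
            best.pop()  # drop the current worst
    return [p for _, p in best]
```

    Squared distances avoid the square root entirely, since they order points the same way as true distances.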

  6. cosmin said

    Actually the problem can be solved in O(n) no matter how big k is (here k = 100). This problem is the 2D version of finding the k smallest elements. Below is an algorithm that uses Quicksort's idea of splitting the list into 2 sub-lists, with the difference that only one recursive call is made. Due to the randomization, the expected complexity satisfies T(n) = T(n/2) + O(n), which has the solution T(n) = O(n). The worst case is still O(n^2), but that is highly improbable. Some other key points are worth mentioning:

    [1] If a min-heap (not necessarily with the size fixed to k) is used the complexity would be O(n + k*log(n)) (O(n) for building the heap and O(k*log(n)) for k extractions) But if k < n/log(n) (which is the case here) the complexity will still be O(n).

    [2] There is another algorithm, which splits the list into buckets of 5 elements, that has O(n) complexity even in the worst case. A starting point for understanding it can be found at http://en.wikipedia.org/wiki/Selection_algorithm

    from random import randint
    from math import sqrt
    
    def dist(point):
    	return sqrt(point[0]**2 + point[1]**2)
    
    def k_smallest(k, a):
    	if k == len(a): return a
    	splitElm = a[randint(0, len(a) - 1)]
    	a1 = [e for e in a if dist(e) <= dist(splitElm)]
    	a2 = [e for e in a if dist(e) > dist(splitElm)]
    	if k <= len(a1):
    		return k_smallest(k, a1)
    	else:
    		return a1 + k_smallest(k - len(a1), a2)
    
    print k_smallest(3, [(0, 0), (1, 7), (2, 2), (3, 2), (1, 4), (3, 0)])
    
  7. Andrey said

    If, indeed, the question was phrased like “given a million points (x, y), give an O(n) solution to find the 100 points closest to (0, 0),” it was a trick.

    One thing is that “n” is not defined, which calls for making an assumption or, better yet, changing the question before attempting to answer it.

    Another trick is understanding the math of big-O: if you limit the total number of points in advance, there are no variables affecting the calculation time, making any valid solution O(1), as in “runs no longer than a predetermined time”.

    Once you add variables to the task, you can start talking about big-O. “Given n points (x, y), give an O(n) solution to find the 100 points closest to (0, 0)” – now that question is different, since you may be given 5, or a million, or 10^100, or any other number of points, and you want an algorithm that would always run no longer than (constant seconds * n).

    What you assumed originally was yet another question: “Given n points (x, y), give an O(n) solution to find the k points closest to (0, 0)”. That misunderstanding is what leads to the “cheating” feeling.

  8. razvan said

    My take on this in Java:

  9. cosmin said

    A worst-case O(n) solution, using the algorithm described at http://www.cse.ust.hk/~dekai/271/notes/L05/L05.pdf :

    from math import sqrt, ceil
    
    def norm(point):
    	return sqrt(point[0]**2 + point[1]**2)
    
    def k_smallest(k, a, f):
    	n, medians = len(a), []
    	# solve the particular cases
    	if k == n: return a[:]
    	if n <= 5: return sorted(a, key=f)[:k]
    	# split the list in n/5 buckets and compute their medians
    	for i in xrange( int(ceil(n / 5.)) ):
    		start, end = 5*i, min(5*(i + 1), n)
    		b = sorted(a[start:end], key=f)
    		medians.append(b[(end - start) // 2])
    	# find the median of the medians by recursing with k = n/2
    	splitElm = max(k_smallest(n // 2, medians, f), key=f)
    	# split the list in 2 sub-lists picking splitElm as a pivot 
    	a1 = [e for e in a if f(e) < f(splitElm)]
    	a2 = [e for e in a if f(e) >= f(splitElm)]
    	# recurse again on one of the sub-lists to solve the problem
    	if k <= len(a1):
    		return k_smallest(k, a1, f)
    	else:
    		return a1 + k_smallest(k - len(a1), a2, f)
    
    print k_smallest(3, [(0, 0), (1, 7), (2, 2), (3, 2), (1, 4), (3, 0)], norm)
    
  10. programmingpraxis said

    Wow! This question generated lots of interest. Page hits have been about half again greater than normal for both of the last two days.

    One of the questions raised in the comments is about the time complexity of the suggested solution. It is indeed O(n log k), not O(n), so it’s not linear. Just because k is very much smaller than n doesn’t change the theoretical time complexity, though in practice the extra factor of log k may not be noticeable.

    It looks like I won’t pass the Amazon interview, as my algorithm was not linear and thus failed to answer the question. A linear algorithm is possible by using select to find the k’th item, as a couple of people have pointed out. A previous exercise gave an expected-time linear algorithm for select based on randomization; on Friday, we’ll look at a guaranteed-time linear algorithm for select based on median-of-medians.

    Thanks to all who responded.

  11. HBK said

    The question can be approached as follows.
    For each point (x, y), compute its distance from the origin. While doing so, maintain a count array indexed by distance: if a point's distance s satisfies a < s <= b, increment the count at index b. Once all points are processed, index b of the count array holds the number of points whose distance falls in (a, b]. A cumulative scan over the count array then finds the first index at which the running sum reaches 100; that index is the radius of the circle bounding the 100 points closest to the origin. A second linear pass over the original list collects the points with distance d <= radius, stopping once 100 points have been gathered.

    The time complexity would be O(N) + O(N) + O(M) = O(N + M), where N is the number of points and M is the distance of the point farthest from the origin, in the worst case.
    The algorithm uses O(M) additional space.
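    The counting scheme above can be sketched in Python as follows. This version buckets by squared distance, which keeps everything in exact integer arithmetic at the cost of a count array sized by the largest squared distance rather than the largest distance; the names are illustrative:

```python
def k_closest_buckets(points, k=100):
    """Histogram points by squared distance from the origin, locate the
    cutoff radius by a cumulative scan, then collect points inside it."""
    d2s = [x * x + y * y for x, y in points]
    counts = [0] * (max(d2s) + 1)
    for d in d2s:
        counts[d] += 1
    # Cumulative scan: the first squared radius enclosing >= k points.
    total, radius2 = 0, max(d2s)
    for r, c in enumerate(counts):
        total += c
        if total >= k:
            radius2 = r
            break
    # Second linear pass: gather points inside the cutoff radius,
    # capping at k in case of ties exactly on the boundary.
    result = []
    for p, d in zip(points, d2s):
        if d <= radius2 and len(result) < k:
            result.append(p)
    return result
```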

  12. Rob Mayoff said

    Sort the points by distance from the origin in linear time using radix sort.
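    Concretely, squared distances are bounded non-negative integers, so an LSD radix sort (a byte at a time, base 256) orders them in O(n) for a fixed key width. A rough Python sketch of this approach (names are illustrative):

```python
def radix_sort_by_dist(points, key_bytes=4):
    """Stable LSD radix sort of points, keyed on squared distance
    from the origin; key_bytes must cover the largest key."""
    keyed = [(x * x + y * y, (x, y)) for x, y in points]
    for shift in range(0, 8 * key_bytes, 8):
        buckets = [[] for _ in range(256)]
        for item in keyed:
            buckets[(item[0] >> shift) & 0xFF].append(item)
        keyed = [item for b in buckets for item in b]
    return [p for _, p in keyed]
```

    The 100 closest points are then simply the first 100 entries of the sorted list.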

  13. Brian said

    Here’s my solution in Python:

    import math
    from operator import itemgetter
    #   Accepts point set n and closest number of points k
    #   e.g. n=[[3,4],[-1,2],[4,4],[-5,-2],[1,-3]]
    def O(n,k):
        i=[]
        closest=[]
        for point in n:
            dist=math.sqrt(point[0]**2+point[1]**2)
            i.append((point[0],point[1],dist))
        i=sorted(i,key=itemgetter(2))
        for x in range(0,k):
            closest.append([i[x][0],i[x][1]])
        return closest
    

    It has two for loops, but one is only for returning the k points (100 in the example). The loop that cycles through the million points doesn't do any comparisons – it only computes each point's hypotenuse, which is then used for sorting to find the closest points.

  14. Philip G said

    The post directly above mine that does a sort can hardly be considered O(n), as the sort is going to be O(n log n).

  15. brooknovak said

    Ha, nice – I came up with the same solution presented on page 2: forgo computing the square roots, and use a max heap.
    I also went further and created a special bin-heap operation called SwapTop, which pops the top (max) element and inserts a new element in O(log n) worst-case time, where n = 100.
    Here is my C# solution:

    // Note: X and Y are assumed to be class-level constants (const int X = 0, Y = 1)
    // indexing into each int[2] point.
    public static List<int[]> GetClosestPoints (IEnumerable<int[]> points, int x, int y, int k)
    {
    	if (points == null)
    		throw new ArgumentNullException ("points");
    	if (k < 0)
    		throw new ArgumentOutOfRangeException ("k");
    	if (k == 0)
    		return new List<int[]> ();
    
    	var maxheap = new BinHeap<XYPoint>(false);
    	var enumerator = points.GetEnumerator ();
    
    	for (int i = 0; i < k; i++) {
    		if (!enumerator.MoveNext ())
    			return maxheap.Select (pt => pt.Points).ToList ();
    		var p = enumerator.Current;
    		long sqDist = ((long)(p[X] - x) * (long)(p[X] - x)) + ((long)(p[Y] - y) * (long)(p[Y] - y));
    		maxheap.Insert (new XYPoint(p, sqDist)); // O(Log k)
    	}
    
    	while (enumerator.MoveNext()) {
    		var p = enumerator.Current;
    		long sqDist = ((long)(p[X] - x) * (long)(p[X] - x)) + ((long)(p[Y] - y) * (long)(p[Y] - y));
    		if (sqDist < maxheap.Peek ().SquaredDist) {
    			maxheap.SwapTop (new XYPoint(p, sqDist)); // O(Log k)
    		}
    	}
    
    	return maxheap.Select (pt => pt.Points).ToList ();
    }
    
    private struct XYPoint : IComparable {
    	public XYPoint(int[] points, long squareDist) {
    		Points = points;
    		SquaredDist = squareDist;
    	}
    	public int[] Points;
    	public long SquaredDist;
    
    	public int CompareTo(object other)
    	{
    		if (other == null) return 1; 
    		return SquaredDist.CompareTo(((XYPoint)other).SquaredDist);
    	}
    }
    
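    For comparison, the SwapTop operation described above corresponds to heapq.heapreplace in Python's standard library: it pops the root and sifts in a replacement in a single O(log k) pass. A minimal sketch of the same bounded-heap strategy, simulating a max-heap with negated squared distances (names are illustrative):

```python
import heapq
from itertools import islice

def k_closest_heap(points, k=100):
    """Bounded max-heap (negated squared distances) with the
    pop-and-insert-in-one-sift operation heapq.heapreplace."""
    it = iter(points)
    heap = [(-(x * x + y * y), (x, y)) for x, y in islice(it, k)]
    heapq.heapify(heap)  # O(k)
    for x, y in it:
        d2 = x * x + y * y
        if d2 < -heap[0][0]:
            # Evict the current worst and sift in the new point: O(log k).
            heapq.heapreplace(heap, (-d2, (x, y)))
    return [p for _, p in heap]
```

    The result comes back in heap order, not sorted; sort the k survivors afterwards if a sorted answer is needed.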
