Amazon Interview Question
November 27, 2012
We have today another question from our inexhaustible set of interview questions; this one comes from Amazon:
Given a million points (x, y), give an O(n) solution to find the 100 points closest to (0, 0).
Your task is to write the requested function. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
[…] today’s Programming Praxis exercise, our goal is to find the 100 points closest to (0, 0) out of a list of […]
My Haskell solution (see http://bonsaicode.wordpress.com/2012/11/27/programming-praxis-amazon-interview-question/ for a version with comments):
Why not insert into a min heap, and limit its size to 100 by removing the top item on insertion having reached the limit? That should make it linear, right?
A Python version. To my opinion this is linear, as long as n is much smaller than the length of the xypoints array. I checked it with N (length of xypoints) equal to 100000 and with values for n of 10 and 100. The time taken is about the same.
What do you think?
Because 100 is a factor, even if you use insert sort (in an already sorted array since you’re building it), the complexity of the algorithm will still be O(n), no ?
Actually the problem can be solved in O(n) no matter how big k is (here k = 100). This problem is the 2D version of finding the first k smallest elements. Below is an algorithm that uses the Quicksort’s idea of splitting the list in 2 sub-lists, but the difference is that only one recursion is used. Due to the randomization the complexity is about T(n) = T(n/2) + O(n) which has the solution T(n) = O(n). Still the worst case could be O(n^2) but this is highly improbable. Another key points that are worth mentioning:
[1] If a min-heap (not necessarily with the size fixed to k) is used the complexity would be O(n + k*log(n)) (O(n) for building the heap and O(k*log(n)) for k extractions) But if k < n/log(n) (which is the case here) the complexity will still be O(n).
[2] There is another algorithm that splits the list in buckets of 5 elements that has the complexity O(n) in the worst case scenario. A starting point in understanding that algorithm could be found http://en.wikipedia.org/wiki/Selection_algorithm
If, indeed, the question was phrased like “given a million points (x, y), give an O(n) solution to find the 100 points closest to (0, 0),” it was a trick.
One thing is that “n” is not defined, which calls for making an assumption or, better yet, changing the question before attempting to answer it.
Another trick, is understanding the math of big-O: if you limit in advance the total number of points, there are no variables affecting the time of calculation, making any valid solution O(1), as in “runs no longer than predetermined time”.
Once you add variables to the task, you can start talking about big-O. “Given n points (x, y), give an O(n) solution to find the 100 points closest to (0, 0)” – now that question is different, since you may be given 5, or a million, or 10^100, or any other number of points, and you want an algorithm that would always run no longer than (constant seconds * n).
What you assumed originally, was yet another question: “Given n points (x, y), give an O(n) solution to find k points closest to (0, 0)”. That was a misunderstanding leading to the “cheating” feeling.
My take on this in Java:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Main.java
hosted with ❤ by GitHub
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Point.java
hosted with ❤ by GitHub
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ReversePointComparator.java
hosted with ❤ by GitHub
The worst case O(n) solution using the algorithm described in http://www.cse.ust.hk/~dekai/271/notes/L05/L05.pdf :
[…] Pages: 1 2 […]
Wow! This question generated lots of interest. Page hits have been about half again greater than normal for both of the last two days.
One of the questions raised in the comments is about the time complexity of the suggested solution. It is indeed O(n log k), not O(n), so it’s not linear. Just because k is very much smaller than n doesn’t change the theoretical time complexity, though in practice the extra factor of log k may not be noticeable.
It looks like I won’t pass the Amazon interview, as my algorithm was not linear and thus failed to answer the question. A linear algorithm is possible by using select to find the k’th item, as a couple of people have pointed out. A previous exercise gave an expected-time linear algorithm for select based on randomization; on Friday, we’ll look at a guaranteed-time linear algorithm for select based on median-of-medians.
Thanks to all who responded.
The question can be approached as follows
Given the list of numbers we can find the corresponding distance from the origin. While computing the distance from the origin we also keep the track on number of points(x, y) that have the distance between a < s <= b. Suppose the distance of a point(x, y) came out to be a value A < S <= B, then we increment the count corresponding to Bth index of the array(where we are keeping the track of points) by one. At the end of the exercise when all the point are processed we will have the total count of points with distance a < s <= b at index b of the count array. Using the count array we can then find the cumulative number of points with distance s = 100. The index where sum exceeds or equals 100 gives the radii of the circle which marks the boundary for the first 100 points that lie closest to the origin. Now another linear traversal on the original points list could be done to include all those points which have distance from the origin d <= radii, till the number of points in the list < 100.
The time complexity of the algorithm would be 2*O(N) + O(M), N = number of points, M = distance of a point that lie farthest from the origin, in worst case.
O(M) additional space complexity.
Sort the points by distance from the origin in linear time using radix sort.
Here’s my solution in Python:
It has two FOR loops, but one is only for returning k points (100 for the example). The FOR loop that cycles through the million points doesn’t use any logic – it only creates the hypotenuse for sorting to find the closest points.
The post directly above mine that does a sort can hardly be considered O(n) as the sort is going to be O(n log n) complex.
Ha nice – came up with the same solution present in page 2: forgo computing the roots, and use a max heap.
I also went further an created a special bin heap operation called SwapTop, which Pops the top (max) element and inserts a new element at worst O(log n) time, where n = 100.
Here is my c# solution:
[…] from an array of integers find 10 numbers closest to a given number amazon […]