Two Amazon Interview Questions
February 26, 2016
In the previous exercise I wrote a solution that doesn’t work, so I guess I have to try again with two more Amazon interview questions:
Create a set data structure that provides the following operations, which must all operate in constant time: insert, delete, member?, and random. The random operation should return a random element from the set.
and
Given an array, find the first element that appears an even number of times.
Your task is to answer the two interview questions. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Hash tables are not constant time. They’re amortized O(1), which is very different – try implementing a hard-realtime system with hash tables.
@Janne – I’d have thought it depends on the situation, if you either don’t need to resize because you know in advance the required table size or you spread the cost of resizing the table over many insertions, and if your hash function is perfect or in practice produces a good distribution, then most people would be happy calling that constant (or at least bounded) time.
Anyway, the suggested random selection seems to be randomly selecting a (non-empty) hash table bucket, then selecting an element in that bucket – this isn’t going to produce a uniform distribution. If you are going for linked list buckets, better to allocate the list nodes from a separate array, then randomly select from that array rather than from the hash bucket array.
Does random operate in constant time? Is not it O(n)?
The answer given here has a nice way of making a uniform selection:
http://stackoverflow.com/questions/8629447/efficiently-picking-a-random-element-from-a-chained-hash-table
If there are N buckets and L is the length of the longest hash table bucket chain, first randomly choose a non-empty bucket, then choose j from 0 to L-1 (inclusive), if bucket has a jth element in its chain, return that, otherwise choose another j.
Assuming items are spread randomly across the hash buckets, average runtime should be O(1).
I’m getting it wrong now: that should be, if there is no jth element in the bucket, start again by choosing another bucket.