An Array Of Two Symbols

March 12, 2013

I found today’s exercise on a discussion forum for learning programmers. It looks to me like homework, but the posting was old enough that I don’t think we have to worry about doing someone’s homework for them — unless the teacher reuses the same problems from one year to the next.

You are given an array of length n that is filled with two symbols; all m copies of one symbol appear first, at the beginning of the array, followed by all nm copies of the other symbol. You are to find the index of the first copy of the second symbol, m, in time O(log m).

The obvious solution is binary search. Pick the midpoint of the array; if the item there is the first symbol, search recursively in the right half of the array, otherwise search recursively in the left half of the array, stopping when the sub-array has been reduced to a single item. But that takes time O(log n), which violates the conditions of the exercise.

Your task is to write the O(log m) solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

14 Responses to “An Array Of Two Symbols”

  1. […] today’s exercise, our goal is to write a function that, given a list consisting of m of one symbol followed […]

  2. My Haskell solution (see http://bonsaicode.wordpress.com/2013/03/12/programming-praxis-an-array-of-two-symbols/ for a version with comments):

    import qualified Data.Vector as V
    import Test.QuickCheck
    
    search :: V.Vector Char -> Int
    search xs = f 0 (V.length xs - 1) where
        f start end = case span ((xs V.! 0 ==) . (xs V.!)) . takeWhile (<= end) .
                           map (start +) $ 0 : iterate (*2) 1
                      of   ([_],[n]) -> n
                           (ms ,[] ) -> f (last ms) end
                           (ms ,n:_) -> f (last ms) n
    

    On a side note, there’s a typo in the description. The complexity for the binary search should be O(log n), not O(log m).

  3. cosmin said

    A python solution that computes the binary representation of m, and then returns m+1. The algorithm determines sequentially the most significant bit, then the second and so on, constructing 2^k1, 2^k1 + 2^k2, … , 2^k1 + 2^k2 + … + 2^kt = m

    def split(a):
    	pow2, n = 1, len(a)
    	while pow2 < n and a[0] == a[pow2]:
    		pow2 <<= 1
    	pow2 >>= 1
    	index = pow2
    	while pow2 > 0:
    		if index + pow2 < n and a[0] == a[index + pow2]:
    			index += pow2
    		pow2 >>= 1
    	return index + 1
    
    print split([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
    
  4. Joachim said

    The typo in the text makes it quite confusing.

    > You are to find the index of the first copy of the second symbol, m, in time O(log m).

    > But that takes time O(log m), which violates the conditions of the exercise.

  5. programmingpraxis said

    Fixed. Sorry about that.

  6. howlettk said

    i have done it with php:

  7. howlettk said

    i have dione it with php:

  8. howlettk said

    i pasted the code so why didnt it show up?
    anyway, here it is:

  9. pfernie said

    Seems like there is a bug in the solution, in the call into the first loop:


    (loop hi (max (+ hi hi) (- len 1)))

    This should be a call to (min), otherwise on the first iteration it checks (0 1), but the next jumps directly to the case (1 len – 1), as can be seen from the sample output:

    > (f '#(0 0 1 1 1 1))
    first loop: 0 1
    first loop: 1 5 // We blew right past 2...
    second loop: 1 3 5
    second loop: 1 2 3
    second loop: 1 1 2
    second loop: 2 2 2
    2

    We should expect:

    first loop: 0 1
    first loop: 1 2
    second loop: 1 1 2
    second loop: 2 2 2
    2

  10. This is simple: find a lower bound for min in O(log2(m)), then apply a dichotomy(min,max) which will be in O(log2(m)) too, for a total of O(log2(m)).


    (defun find-in-log-m (v)
    (let* ((max (1- (length v)))
    (symbol-m (aref v max)))
    (labels ((bound-in-log-m (i)
    (if (eql symbol-m (aref v (- max i)))
    (bound-in-log-m (* 2 i))
    i)))
    (let ((min (bound-in-log-m 1)))
    (1+ (nth-value 1 (com.informatimago.common-lisp.cesarum.utility:dichotomy
    (lambda (index)
    (cond
    ((eql symbol-m (aref v index)) -1)
    ((eql symbol-m (aref v (1+ index))) 0)
    (t +1)))
    min max)))))))

    (find-in-log-m #(a a a a a a a a a a a a b b b b b b b b b b b b b b b b))
    --> 17

  11. programmingpraxis said

    pfernie: Fixed. That’s embarrassing.

  12. […] Second symbol starting index Array of two symbol […]

Leave a comment