An Array Of Two Symbols

March 12, 2013

The solution is to start from the left end instead of the middle, doubling the size of the search sub-array at each step: first from 0 to 1, then from 1 to 2, then from 2 to 4, then from 4 to 8, and so on. That will take O(log m) doublings. Once i doublings have occurred and the first copy of the second symbol is bracketed between the 2ith array element and the 2i+1th, the sub-array will be no larger than m, so binary search on the sub-array will find the desired item in O(log m) time, giving a combined O(log m) for the entire solution. Here’s the code, where we use 0 and 1 as the two symbols; beware that it is harder than it looks to get this code right:

(define (f vec)
  (let ((len (vector-length vec)))
    (let loop ((lo 0) (hi 1))
      (when verbose? (display "first loop: ")
        (display lo) (display " ") (display hi) (newline))
      (if (zero? (vector-ref vec hi))
          (loop hi (min (+ hi hi) (- len 1)))
          (let loop ((lo lo) (hi hi))
            (let ((mid (quotient (+ lo hi) 2)))
              (when verbose? (display "second loop: ")
                (display lo) (display " ") (display mid)
                (display " ") (display hi) (newline))
              (cond ((= lo hi) hi)
                    ((zero? (vector-ref vec mid))
                      (loop (+ mid 1) hi))
                    (else (loop lo mid)))))))))

We assume that the form of the input vector is correct, that it has at least two items, including at least one 0 and one 1, and that all the 0 items appear before all the 1 items. Here’s an example:

> (set! verbose? #t)
> (f '#(0 0 1 1 1 1))
first loop: 0 1
first loop: 1 5
second loop: 1 3 5
second loop: 1 2 3
second loop: 1 1 2
second loop: 2 2 2

The code here is tricky, with several places where it is easy to be off-by-one or to branch the wrong way on equality (yes, I stumbled into one of them while writing this code). Because of that, the code includes debugging output that can be turned on by setting the global variable verbose? to #t; it is instructive to turn on the verbose output and watch the code run. We also wrote a test program that runs a large number of tests, writing any failures or returning the null list for success:

(define (test n)
  (list-of (list vec result)
    (x range 1 n)
    (y range 1 n)
    (vec is (list->vector (append (make-list x 0) (make-list y 1))))
    (result is (f vec))
    (or (not (= (vector-ref vec result) 1))
        (not (= (vector-ref vec (- result 1)) 0)))))

> (set! verbose? #f)
> (test 100)

My own experience is instructive. I ran f by hand for several test cases, and thought all was well. Then I wrote the test program, and threw an exception that caused the test to halt midway through. Not seeing the problem, I added the debugging code, and saw that in the problem case lo was greater than hi, which should be impossible. Once I knew what was wrong, the problem was easy to fix, but I would never have found it without the debugging output. Finally, I decided to leave both the test script and the debugging code in the final version of the program, so I could show it to you, and to satisfy a gnawing sense that there might still be some remaining problem.

You can run the program at

Pages: 1 2

14 Responses to “An Array Of Two Symbols”

  1. […] today’s exercise, our goal is to write a function that, given a list consisting of m of one symbol followed […]

  2. My Haskell solution (see for a version with comments):

    import qualified Data.Vector as V
    import Test.QuickCheck
    search :: V.Vector Char -> Int
    search xs = f 0 (V.length xs - 1) where
        f start end = case span ((xs V.! 0 ==) . (xs V.!)) . takeWhile (<= end) .
                           map (start +) $ 0 : iterate (*2) 1
                      of   ([_],[n]) -> n
                           (ms ,[] ) -> f (last ms) end
                           (ms ,n:_) -> f (last ms) n

    On a side note, there’s a typo in the description. The complexity for the binary search should be O(log n), not O(log m).

  3. cosmin said

    A python solution that computes the binary representation of m, and then returns m+1. The algorithm determines sequentially the most significant bit, then the second and so on, constructing 2^k1, 2^k1 + 2^k2, … , 2^k1 + 2^k2 + … + 2^kt = m

    def split(a):
    	pow2, n = 1, len(a)
    	while pow2 < n and a[0] == a[pow2]:
    		pow2 <<= 1
    	pow2 >>= 1
    	index = pow2
    	while pow2 > 0:
    		if index + pow2 < n and a[0] == a[index + pow2]:
    			index += pow2
    		pow2 >>= 1
    	return index + 1
    print split([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
  4. Joachim said

    The typo in the text makes it quite confusing.

    > You are to find the index of the first copy of the second symbol, m, in time O(log m).

    > But that takes time O(log m), which violates the conditions of the exercise.

  5. programmingpraxis said

    Fixed. Sorry about that.

  6. howlettk said

    i have done it with php:

  7. howlettk said

    i have dione it with php:

  8. howlettk said

    i pasted the code so why didnt it show up?
    anyway, here it is:

  9. pfernie said

    Seems like there is a bug in the solution, in the call into the first loop:

    (loop hi (max (+ hi hi) (- len 1)))

    This should be a call to (min), otherwise on the first iteration it checks (0 1), but the next jumps directly to the case (1 len – 1), as can be seen from the sample output:

    > (f '#(0 0 1 1 1 1))
    first loop: 0 1
    first loop: 1 5 // We blew right past 2...
    second loop: 1 3 5
    second loop: 1 2 3
    second loop: 1 1 2
    second loop: 2 2 2

    We should expect:

    first loop: 0 1
    first loop: 1 2
    second loop: 1 1 2
    second loop: 2 2 2

  10. This is simple: find a lower bound for min in O(log2(m)), then apply a dichotomy(min,max) which will be in O(log2(m)) too, for a total of O(log2(m)).

    (defun find-in-log-m (v)
    (let* ((max (1- (length v)))
    (symbol-m (aref v max)))
    (labels ((bound-in-log-m (i)
    (if (eql symbol-m (aref v (- max i)))
    (bound-in-log-m (* 2 i))
    (let ((min (bound-in-log-m 1)))
    (1+ (nth-value 1 (com.informatimago.common-lisp.cesarum.utility:dichotomy
    (lambda (index)
    ((eql symbol-m (aref v index)) -1)
    ((eql symbol-m (aref v (1+ index))) 0)
    (t +1)))
    min max)))))))

    (find-in-log-m #(a a a a a a a a a a a a b b b b b b b b b b b b b b b b))
    --> 17

  11. programmingpraxis said

    pfernie: Fixed. That’s embarrassing.

  12. […] Second symbol starting index Array of two symbol […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: