## An Array Of Two Symbols

### March 12, 2013

The solution is to start from the left end instead of the middle, doubling the size of the search sub-array at each step: first from 0 to 1, then from 1 to 2, then from 2 to 4, then from 4 to 8, and so on. That will take *O*(log *m*) doublings. Once *i* doublings have occurred and the first copy of the second symbol is bracketed between the 2^{i}th array element and the 2^{i+1}th, the sub-array will be no larger than *m*, so binary search on the sub-array will find the desired item in *O*(log *m*) time, giving a combined *O*(log *m*) for the entire solution. Here’s the code, where we use 0 and 1 as the two symbols; beware that it is harder than it looks to get this code right:

`(define (f vec)`

(let ((len (vector-length vec)))

(let loop ((lo 0) (hi 1))

(when verbose? (display "first loop: ")

(display lo) (display " ") (display hi) (newline))

(if (zero? (vector-ref vec hi))

(loop hi (min (+ hi hi) (- len 1)))

(let loop ((lo lo) (hi hi))

(let ((mid (quotient (+ lo hi) 2)))

(when verbose? (display "second loop: ")

(display lo) (display " ") (display mid)

(display " ") (display hi) (newline))

(cond ((= lo hi) hi)

((zero? (vector-ref vec mid))

(loop (+ mid 1) hi))

(else (loop lo mid)))))))))

We assume that the form of the input vector is correct, that it has at least two items, including at least one 0 and one 1, and that all the 0 items appear before all the 1 items. Here’s an example:

`> (set! verbose? #t)`

> (f '#(0 0 1 1 1 1))

first loop: 0 1

first loop: 1 5

second loop: 1 3 5

second loop: 1 2 3

second loop: 1 1 2

second loop: 2 2 2

2

The code here is tricky, with several places where it is easy to be off-by-one or to branch the wrong way on equality (yes, I stumbled into one of them while writing this code). Because of that, the code includes debugging output that can be turned on by setting the global variable `verbose?`

to `#t`

; it is instructive to turn on the verbose output and watch the code run. We also wrote a test program that runs a large number of tests, writing any failures or returning the null list for success:

`(define (test n)`

(list-of (list vec result)

(x range 1 n)

(y range 1 n)

(vec is (list->vector (append (make-list x 0) (make-list y 1))))

(result is (f vec))

(or (not (= (vector-ref vec result) 1))

(not (= (vector-ref vec (- result 1)) 0)))))

```
```

`> (set! verbose? #f)`

> (test 100)

()

My own experience is instructive. I ran `f`

by hand for several test cases, and thought all was well. Then I wrote the `test`

program, and threw an exception that caused the test to halt midway through. Not seeing the problem, I added the debugging code, and saw that in the problem case *lo* was greater than *hi*, which should be impossible. Once I knew what was wrong, the problem was easy to fix, but I would never have found it without the debugging output. Finally, I decided to leave both the test script and the debugging code in the final version of the program, so I could show it to you, and to satisfy a gnawing sense that there might still be some remaining problem.

You can run the program at http://programmingpraxis.codepad.org/gq1oqTqk.

[…] today’s exercise, our goal is to write a function that, given a list consisting of m of one symbol followed […]

My Haskell solution (see http://bonsaicode.wordpress.com/2013/03/12/programming-praxis-an-array-of-two-symbols/ for a version with comments):

On a side note, there’s a typo in the description. The complexity for the binary search should be O(log n), not O(log m).

A python solution that computes the binary representation of m, and then returns m+1. The algorithm determines sequentially the most significant bit, then the second and so on, constructing 2^k1, 2^k1 + 2^k2, … , 2^k1 + 2^k2 + … + 2^kt = m

The typo in the text makes it quite confusing.

> You are to find the index of the first copy of the second symbol, m, in time O(log m).

> But that takes time O(log m), which violates the conditions of the exercise.

Fixed. Sorry about that.

i have done it with php:

i have dione it with php:

[…] Question is from here […]

My java solution here.

i pasted the code so why didnt it show up?

anyway, here it is:

Seems like there is a bug in the solution, in the call into the first loop:

(loop hi (max (+ hi hi) (- len 1)))

This should be a call to (min), otherwise on the first iteration it checks (0 1), but the next jumps directly to the case (1 len – 1), as can be seen from the sample output:

> (f '#(0 0 1 1 1 1))

first loop: 0 1

first loop: 1 5 // We blew right past 2...

second loop: 1 3 5

second loop: 1 2 3

second loop: 1 1 2

second loop: 2 2 2

2

We should expect:

first loop: 0 1

first loop: 1 2

second loop: 1 1 2

second loop: 2 2 2

2

This is simple: find a lower bound for min in O(log2(m)), then apply a dichotomy(min,max) which will be in O(log2(m)) too, for a total of O(log2(m)).

(defun find-in-log-m (v)

(let* ((max (1- (length v)))

(symbol-m (aref v max)))

(labels ((bound-in-log-m (i)

(if (eql symbol-m (aref v (- max i)))

(bound-in-log-m (* 2 i))

i)))

(let ((min (bound-in-log-m 1)))

(1+ (nth-value 1 (com.informatimago.common-lisp.cesarum.utility:dichotomy

(lambda (index)

(cond

((eql symbol-m (aref v index)) -1)

((eql symbol-m (aref v (1+ index))) 0)

(t +1)))

min max)))))))

(find-in-log-m #(a a a a a a a a a a a a b b b b b b b b b b b b b b b b))

--> 17

pfernie: Fixed. That’s embarrassing.

[…] Second symbol starting index Array of two symbol […]