An Array Of Two Symbols

March 12, 2013

The solution is to start from the left end instead of the middle, doubling the size of the search sub-array at each step: first from 0 to 1, then from 1 to 2, then from 2 to 4, then from 4 to 8, and so on. That will take O(log m) doublings. Once i doublings have occurred and the first copy of the second symbol is bracketed between the 2ⁱth array element and the 2ⁱ⁺¹th, the sub-array will be no larger than m, so binary search on the sub-array will find the desired item in O(log m) time, giving a combined O(log m) for the entire solution. Here’s the code, where we use 0 and 1 as the two symbols; beware that it is harder than it looks to get this code right:

(define (f vec) (let ((len (vector-length vec))) (let loop ((lo 0) (hi 1)) (when verbose? (display "first loop: ") (display lo) (display " ") (display hi) (newline)) (if (zero? (vector-ref vec hi)) (loop hi (min (+ hi hi) (- len 1))) (let loop ((lo lo) (hi hi)) (let ((mid (quotient (+ lo hi) 2))) (when verbose? (display "second loop: ") (display lo) (display " ") (display mid) (display " ") (display hi) (newline)) (cond ((= lo hi) hi) ((zero? (vector-ref vec mid)) (loop (+ mid 1) hi)) (else (loop lo mid)))))))))

We assume that the form of the input vector is correct, that it has at least two items, including at least one 0 and one 1, and that all the 0 items appear before all the 1 items. Here’s an example:

> (set! verbose? #t) > (f '#(0 0 1 1 1 1)) first loop: 0 1 first loop: 1 5 second loop: 1 3 5 second loop: 1 2 3 second loop: 1 1 2 second loop: 2 2 2 2

The code here is tricky, with several places where it is easy to be off-by-one or to branch the wrong way on equality (yes, I stumbled into one of them while writing this code). Because of that, the code includes debugging output that can be turned on by setting the global variable verbose? to #t; it is instructive to turn on the verbose output and watch the code run. We also wrote a test program that runs a large number of tests, writing any failures or returning the null list for success:

(define (test n) (list-of (list vec result) (x range 1 n) (y range 1 n) (vec is (list->vector (append (make-list x 0) (make-list y 1)))) (result is (f vec)) (or (not (= (vector-ref vec result) 1)) (not (= (vector-ref vec (- result 1)) 0)))))

> (set! verbose? #f) > (test 100) ()

My own experience is instructive. I ran f by hand for several test cases, and thought all was well. Then I wrote the test program, and threw an exception that caused the test to halt midway through. Not seeing the problem, I added the debugging code, and saw that in the problem case lo was greater than hi, which should be impossible. Once I knew what was wrong, the problem was easy to fix, but I would never have found it without the debugging output. Finally, I decided to leave both the test script and the debugging code in the final version of the program, so I could show it to you, and to satisfy a gnawing sense that there might still be some remaining problem.

You can run the program at http://programmingpraxis.codepad.org/gq1oqTqk.

Posted by programmingpraxis

Filed in Exercises

14 Comments »

14 Responses to “An Array Of Two Symbols”

Programming Praxis – An Array Of Two Symbols | Bonsai Code said
March 12, 2013 at 10:23 AM
[…] today’s exercise, our goal is to write a function that, given a list consisting of m of one symbol followed […]

Remco Niemeijer said

March 12, 2013 at 10:23 AM

My Haskell solution (see http://bonsaicode.wordpress.com/2013/03/12/programming-praxis-an-array-of-two-symbols/ for a version with comments):

import qualified Data.Vector as V
import Test.QuickCheck

search :: V.Vector Char -> Int
search xs = f 0 (V.length xs - 1) where
    f start end = case span ((xs V.! 0 ==) . (xs V.!)) . takeWhile (<= end) .
                       map (start +) $ 0 : iterate (*2) 1
                  of   ([_],[n]) -> n
                       (ms ,[] ) -> f (last ms) end
                       (ms ,n:_) -> f (last ms) n

On a side note, there’s a typo in the description. The complexity for the binary search should be O(log n), not O(log m).

cosmin said
March 12, 2013 at 12:30 PM
A python solution that computes the binary representation of m, and then returns m+1. The algorithm determines sequentially the most significant bit, then the second and so on, constructing 2^k1, 2^k1 + 2^k2, … , 2^k1 + 2^k2 + … + 2^kt = m
```
def split(a):
	pow2, n = 1, len(a)
	while pow2 < n and a[0] == a[pow2]:
		pow2 <<= 1
	pow2 >>= 1
	index = pow2
	while pow2 > 0:
		if index + pow2 < n and a[0] == a[index + pow2]:
			index += pow2
		pow2 >>= 1
	return index + 1

print split([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
```
Joachim said
March 12, 2013 at 1:54 PM
The typo in the text makes it quite confusing.

> You are to find the index of the first copy of the second symbol, m, in time O(log m).

> But that takes time O(log m), which violates the conditions of the exercise.
programmingpraxis said
March 12, 2013 at 1:56 PM
Fixed. Sorry about that.
howlettk said
March 12, 2013 at 3:00 PM
i have done it with php:
howlettk said
March 12, 2013 at 3:00 PM
i have dione it with php:
An Array Of Two Symbols(Java) | Exploring Java world said
March 12, 2013 at 3:23 PM
[…] Question is from here […]
javabloggi said
March 12, 2013 at 3:24 PM
My java solution here.
howlettk said
March 12, 2013 at 3:46 PM
i pasted the code so why didnt it show up?
anyway, here it is:
pfernie said
March 12, 2013 at 6:27 PM
Seems like there is a bug in the solution, in the call into the first loop:

(loop hi (max (+ hi hi) (- len 1)))

This should be a call to (min), otherwise on the first iteration it checks (0 1), but the next jumps directly to the case (1 len – 1), as can be seen from the sample output:
> (f '#(0 0 1 1 1 1)) first loop: 0 1 first loop: 1 5 // We blew right past 2... second loop: 1 3 5 second loop: 1 2 3 second loop: 1 1 2 second loop: 2 2 2 2

We should expect:
first loop: 0 1 first loop: 1 2 second loop: 1 1 2 second loop: 2 2 2 2
Pascal Bourguignon said
March 12, 2013 at 10:13 PM
This is simple: find a lower bound for min in O(log2(m)), then apply a dichotomy(min,max) which will be in O(log2(m)) too, for a total of O(log2(m)).

(defun find-in-log-m (v) (let* ((max (1- (length v))) (symbol-m (aref v max))) (labels ((bound-in-log-m (i) (if (eql symbol-m (aref v (- max i))) (bound-in-log-m (* 2 i)) i))) (let ((min (bound-in-log-m 1))) (1+ (nth-value 1 (com.informatimago.common-lisp.cesarum.utility:dichotomy (lambda (index) (cond ((eql symbol-m (aref v index)) -1) ((eql symbol-m (aref v (1+ index))) 0) (t +1))) min max)))))))
(find-in-log-m #(a a a a a a a a a a a a b b b b b b b b b b b b b b b b)) --> 17
programmingpraxis said
March 17, 2013 at 9:42 PM
pfernie: Fixed. That’s embarrassing.
JavaScript, CSS: interview questions | EugeneBichel's Blog said
October 1, 2015 at 1:00 PM
[…] Second symbol starting index Array of two symbol […]

Programming Praxis