An Array Of Two Symbols
March 12, 2013
The solution is to start from the left end instead of the middle, doubling the size of the search sub-array at each step: first from 0 to 1, then from 1 to 2, then from 2 to 4, then from 4 to 8, and so on. That will take O(log m) doublings. Once i doublings have occurred and the first copy of the second symbol is bracketed between the 2ith array element and the 2i+1th, the sub-array will be no larger than m, so binary search on the sub-array will find the desired item in O(log m) time, giving a combined O(log m) for the entire solution. Here’s the code, where we use 0 and 1 as the two symbols; beware that it is harder than it looks to get this code right:
(define (f vec)
(let ((len (vector-length vec)))
(let loop ((lo 0) (hi 1))
(when verbose? (display "first loop: ")
(display lo) (display " ") (display hi) (newline))
(if (zero? (vector-ref vec hi))
(loop hi (min (+ hi hi) (- len 1)))
(let loop ((lo lo) (hi hi))
(let ((mid (quotient (+ lo hi) 2)))
(when verbose? (display "second loop: ")
(display lo) (display " ") (display mid)
(display " ") (display hi) (newline))
(cond ((= lo hi) hi)
((zero? (vector-ref vec mid))
(loop (+ mid 1) hi))
(else (loop lo mid)))))))))
We assume that the form of the input vector is correct, that it has at least two items, including at least one 0 and one 1, and that all the 0 items appear before all the 1 items. Here’s an example:
> (set! verbose? #t)
> (f '#(0 0 1 1 1 1))
first loop: 0 1
first loop: 1 5
second loop: 1 3 5
second loop: 1 2 3
second loop: 1 1 2
second loop: 2 2 2
2
The code here is tricky, with several places where it is easy to be off-by-one or to branch the wrong way on equality (yes, I stumbled into one of them while writing this code). Because of that, the code includes debugging output that can be turned on by setting the global variable verbose?
to #t
; it is instructive to turn on the verbose output and watch the code run. We also wrote a test program that runs a large number of tests, writing any failures or returning the null list for success:
(define (test n)
(list-of (list vec result)
(x range 1 n)
(y range 1 n)
(vec is (list->vector (append (make-list x 0) (make-list y 1))))
(result is (f vec))
(or (not (= (vector-ref vec result) 1))
(not (= (vector-ref vec (- result 1)) 0)))))
> (set! verbose? #f)
> (test 100)
()
My own experience is instructive. I ran f
by hand for several test cases, and thought all was well. Then I wrote the test
program, and threw an exception that caused the test to halt midway through. Not seeing the problem, I added the debugging code, and saw that in the problem case lo was greater than hi, which should be impossible. Once I knew what was wrong, the problem was easy to fix, but I would never have found it without the debugging output. Finally, I decided to leave both the test script and the debugging code in the final version of the program, so I could show it to you, and to satisfy a gnawing sense that there might still be some remaining problem.
You can run the program at http://programmingpraxis.codepad.org/gq1oqTqk.
[…] today’s exercise, our goal is to write a function that, given a list consisting of m of one symbol followed […]
My Haskell solution (see http://bonsaicode.wordpress.com/2013/03/12/programming-praxis-an-array-of-two-symbols/ for a version with comments):
On a side note, there’s a typo in the description. The complexity for the binary search should be O(log n), not O(log m).
A python solution that computes the binary representation of m, and then returns m+1. The algorithm determines sequentially the most significant bit, then the second and so on, constructing 2^k1, 2^k1 + 2^k2, … , 2^k1 + 2^k2 + … + 2^kt = m
The typo in the text makes it quite confusing.
> You are to find the index of the first copy of the second symbol, m, in time O(log m).
> But that takes time O(log m), which violates the conditions of the exercise.
Fixed. Sorry about that.
i have done it with php:
i have dione it with php:
[…] Question is from here […]
My java solution here.
i pasted the code so why didnt it show up?
anyway, here it is:
Seems like there is a bug in the solution, in the call into the first loop:
(loop hi (max (+ hi hi) (- len 1)))
This should be a call to (min), otherwise on the first iteration it checks (0 1), but the next jumps directly to the case (1 len – 1), as can be seen from the sample output:
> (f '#(0 0 1 1 1 1))
first loop: 0 1
first loop: 1 5 // We blew right past 2...
second loop: 1 3 5
second loop: 1 2 3
second loop: 1 1 2
second loop: 2 2 2
2
We should expect:
first loop: 0 1
first loop: 1 2
second loop: 1 1 2
second loop: 2 2 2
2
This is simple: find a lower bound for min in O(log2(m)), then apply a dichotomy(min,max) which will be in O(log2(m)) too, for a total of O(log2(m)).
(defun find-in-log-m (v)
(let* ((max (1- (length v)))
(symbol-m (aref v max)))
(labels ((bound-in-log-m (i)
(if (eql symbol-m (aref v (- max i)))
(bound-in-log-m (* 2 i))
i)))
(let ((min (bound-in-log-m 1)))
(1+ (nth-value 1 (com.informatimago.common-lisp.cesarum.utility:dichotomy
(lambda (index)
(cond
((eql symbol-m (aref v index)) -1)
((eql symbol-m (aref v (1+ index))) 0)
(t +1)))
min max)))))))
(find-in-log-m #(a a a a a a a a a a a a b b b b b b b b b b b b b b b b))
--> 17
pfernie: Fixed. That’s embarrassing.
[…] Second symbol starting index Array of two symbol […]