Binary Search
April 29, 2016
We did this exercise a long time ago, but it’s worth doing again, because binary search is fundamental and because it is such a good teaching tool. Here’s my version, which assumes that the input is sorted in non-decreasing order:
(define (bsearch lt? x xs) (let loop ((lo 0) (hi (- (vector-length xs) 1))) (let ((mid (quotient (+ lo hi) 2))) (cond ((< hi lo) #f) ((lt? x (vector-ref xs mid)) (loop lo (- mid 1))) ((lt? (vector-ref xs mid) x) (loop (+ mid 1) hi)) (else mid)))))
For testing, I create vectors containing even numbers, then check that the results are correct when searching for both evens and odds:
(define (test-bsearch n) (do ((i 0 (+ i 1))) ((= n i)) (let ((xs (list->vector (range 0 n 2)))) (do ((j -1 (+ j 1))) ((< n j)) (if (and (even? j) (< j n)) (assert (bsearch < j xs) (/ j 2)) (assert (bsearch < j xs) #f))))))
No news is good news:
> (test-bsearch 25)
>
We used range and the assert macro from the Standard Prelude. You can run the program at http://ideone.com/qzE6ZT.
Hell no. Binary search is extraordinarily difficult to get right. Even Bentley’s “proved correct” 2006 algorithm turns out to have a bug involving numeric overflow (which is at least only a performance bug in Lispy languages, where overflow produces bignums, floats, or an error). This is one of those cases like Posix regular expressions, where you absolutely need to use a high-quality library, not the one which comes with your OS (which surely has bugs), still less your own casually written code.
This is where I always write those invariants and variants out in some detail and still get it wrong at first. It happened again: on the first run I had an infinite loop and had to think; sure enough, one of my comments was false, so I fixed the comment and the code.
The proceduce looks for the least index where the element satisfies a predicate. The test predicate asserts that a string is at least as many characters long as a given number, and there are equally long strings and length gaps in the data. The final check is when no element is big enough, so the value is the length of the vector, which is not the index of any element.
Output (least index, sought least length, the string at the found index):
I translated :)
Here’s a solution in Java.
Two things I’d probably do differently if implementing this in production, as opposed to implementing for an exercise:
1) loops instead of recursion
2) a more generic implementation that works for other types of arrays in addition to integer arrays.
After implementing, I took a look at how binary search is implemented in Java’s standard library. The version I implemented returns -1 if it doesn’t find the element. The Java built-in also returns a negative number when the element is not found, but the returned number seemingly can be transformed to an insertion point, “the point at which the key would be inserted into the array”. I didn’t consider returning that information, but suppose it could be useful in some cases.
Also note I avoided the overflow error that John Cowan referenced, having already been familiar with this post:
http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
Note: I’ve quoted from that page when I or someone else has detected bugs in my code:
“We programmers need all the help we can get, and we should never assume otherwise. Careful design is great. Testing is great. Formal methods are great. Code reviews are great. Static analysis is great. But none of these things alone are sufficient to eliminate bugs: They will always be with us. A bug can exist for half a century despite our best efforts to exterminate it. We must program carefully, defensively, and remain ever vigilant.