Binary Search With Duplicates

November 7, 2017

We start with a “normal” binary search, that assumes no duplicates, so we can compare later:

(define (bsearch1 lt? x xs)
  (let loop ((lo 0) (hi (- (vector-length xs) 1)))
    (let ((mid (+ lo (quotient (- hi lo) 2))))
      (cond ((< hi lo) #f)
            ((lt? x (vector-ref xs mid)) (loop lo (- mid 1)))
            ((lt? (vector-ref xs mid) x) (loop (+ mid 1) hi))
            (else mid)))))

Here are some examples:

> (bsearch1 < 0 '#(1 2 3 5 6 7))
#f
> (bsearch1 < 1 '#(1 2 3 5 6 7))
0
> (bsearch1 < 2 '#(1 2 3 5 6 7))
1
> (bsearch1 < 3 '#(1 2 3 5 6 7))
2
> (bsearch1 < 4 '#(1 2 3 5 6 7))
#f
> (bsearch1 < 5 '#(1 2 3 5 6 7))
3
> (bsearch1 < 6 '#(1 2 3 5 6 7))
4
> (bsearch1 < 7 '#(1 2 3 5 6 7))
5
> (bsearch1 < 8 '#(1 2 3 5 6 7))
#f

For the search with duplicates, the trick is that the “equals” branch of the multi-way comparison doesn’t end the search, it just resets the top end of the search space; the search ends when the lo and hi pointers cross, and is successful only if some element reset the result value:

(define (bsearch2 lt? x xs)
  (let loop ((lo 0) (hi (- (vector-length xs) 1)) (result #f))
    (let ((mid (+ lo (quotient (- hi lo) 2))))
      (cond ((< hi lo) result)
            ((lt? x (vector-ref xs mid)) (loop lo (- mid 1) result))
            ((lt? (vector-ref xs mid) x) (loop (+ mid 1) hi result))
            (else (loop lo (- mid 1) mid))))))

Here are some examples:

> (define xs '#(1 2 2 3 4 4 4 4 6 6 6 6 6 6 7))
> (bsearch2 < 0 xs)
#f
> (bsearch2 < 1 xs)
0
> (bsearch2 < 2 xs)
1
> (bsearch2 < 3 xs)
3
> (bsearch2 < 4 xs)
4
> (bsearch2 < 5 xs)
#f
> (bsearch2 < 6 xs)
8
> (bsearch2 < 7 xs)
14
> (bsearch2 < 8 xs)
#f

You can run the program at https://ideone.com/hyxZQi.

Posted by programmingpraxis

Filed in Exercises

12 Comments »

12 Responses to “Binary Search With Duplicates”

Pascal Bourguignon said

November 7, 2017 at 11:02 AM


;; This problem is no different than the usual binary search (dichotomy),
;; but with a modified comparison  function.  Now, if the current element
;; is equal to the target value,  but the previous element is also equal,
;; then the current element must be considered greater.

(defun binary-search-first (vector value compare &key
                                                   (start 0) (end (length vector))
                                                   (key (function identity)))
  "
COMPARE: A comparing two elements A and B, and returning an order
         (signed integer), such as:
             A<B <=> result<0
             A=B <=> result=0
             A>B <=> result>0
START:   The minimum index.
END:     The maximum index+1.
RETURN:  (values found index order)
POST:	 (<= start index (1- end))
         +-------------------+----------+-------+----------+
         | Case              |  found   | index |  order   |
         +-------------------+----------+-------+----------+
         | x < a[i]          |   FALSE  | start |  less    |
         | a[i] < x < a[i+1] |   FALSE  |   i   |  greater |
         | x = a[i]          |   TRUE   |   i   |  equal   |
         | a[max] < x        |   FALSE  | end-1 |  greater |
         +-------------------+----------+-------+----------+
"
  (com.informatimago.common-lisp.cesarum.utility:dichotomy
   (lambda (current)
     (if (= current start)
         (funcall compare value (funcall key (aref vector current)))

         (let ((order (funcall compare value (funcall key (aref vector current)))))
           (if (and (zerop order)
                    (zerop (funcall compare value (funcall key (aref vector (1- current))))))
               -1
               order))))
   start end))

(binary-search-first #(1 2 2 3 4 4 4 4 6 6 6 6 6 6 7) 4 (lambda (a b)
                                                          (cond ((< a b) -1)
                                                                ((> a b) +1)
                                                                (t        0))))
;; --> t
;;     4
;;     0

John Cowan said
November 7, 2017 at 1:20 PM
I would just call the standard binary-search function and then linearly search backwards for the first non-matching value. Granted, if all the values are the same this is O(n), but that isn’t very likely.
programmingpraxis said
November 7, 2017 at 1:40 PM
@JohnCowan: In the early days of personal computing, I used a shareware database manager that used a standard binary-search function and then scanned backwards, as you suggest. I asked for the first M in a binary F/M field (female/male). You can guess what happened. I sent an email to the developer telling him how to find the first M in logarithmic rather than linear time. He thanked me, and said he had never heard of that before.

matthew said

November 7, 2017 at 6:33 PM

It’s probably better to skip the equality test in the loop and do a “deferred” check at the end:

def bsearch(a,n):
    i,j = 0,len(a)
    # k < i => a[k] < n
    # k >= j => a[k] >= n
    while i != j:
        k = i + (j-i)/2
        if a[k] < n: i = k+1
        else: j = k
    return i

def find(a,n):
    k = bsearch(a,n)
    if k < len(a) and a[k] == n: return k

a = [1,2,2,4,4,4,4]
print [(i, bsearch(a,i), find(a,i)) for i in range(6)]
# [(0, 0, None), (1, 0, 0), (2, 1, 1), (3, 3, None), (4, 3, 3), (5, 7, None)]

bsearch returns the index of the first item greater or equal to the given value (or one past the end of the array if there is no such element). find then checks the returned position and returns it if indeed the value is at that position.

Daniel said
November 7, 2017 at 7:29 PM
@JohnCowan, your response is consistent with your response to an earlier binary search exercise.

Binary Search

Here’s a related blog post:
https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
matthew said
November 7, 2017 at 8:12 PM
I probably should have written:
```
        k = i + (j-i)//2
```
though I think it works as it stands.
matthew said
November 7, 2017 at 8:15 PM
Actually, it doesn’t, integer division is the way to go.
matthew said
November 7, 2017 at 8:26 PM
In fact, in Python 2, int divided by int gives int and the code is OK, in Python 3, it gives a float, but indexing an array with a float gives a runtime error.

Daniel said

November 7, 2017 at 8:34 PM

Here’s an O(log n) solution in C99.

#include <stdio.h>

int search_dups(int* array, int n, int element) {
  int lo = 0;
  int hi = n;
  int output = -1;
  while (hi > lo) {
    int mid = lo + (hi-lo) / 2;
    int val = array[mid];
    if (val < element) {
      lo = mid + 1;
    } else if (val > element) {
      hi = mid;
    } else {
      output = mid;
      hi = mid;
    }
  }
  return output;
}

int main(void) {
  int array[] = {1,2,2,3,4,4,4,4,6,6,6,6,6,6,7};
  int n = sizeof(array) / sizeof(int);
  printf("element: index\n");
  for (int i = 0; i < 10; ++i) {
    int idx = search_dups(array, n, i);
    printf("%d: %2d\n", i, idx);
  }
}

Output:

element: index
0: -1
1:  0
2:  1
3:  3
4:  4
5: -1
6:  8
7: 14
8: -1
9: -1

Paul said
November 8, 2017 at 9:01 AM
In Python this can be easily solved with bisect_left from the bisect module. See the function named “index” in the documentation.
Ternary Search | Programming Praxis said
November 10, 2017 at 10:01 AM
[…] looked at binary search in the previous exercise. Today we look at ternary search. Instead of one mid at the middle of the array, ternary search has […]
Floor And Ceiling In An Array | Programming Praxis said
November 21, 2017 at 10:01 AM
[…] looked at variants of binary search in two recent exercises. Today we look at a third […]

S	M	T	W	T	F	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30

Programming Praxis