Binary Search With Duplicates
November 7, 2017
Most implementations of binary search assume that the target array has no duplicates. But sometimes there are duplicates, and in that case you want to find the first occurrence of an item, not just any one of several occurrences. For instance, in the array [1,2,2,3,4,4,4,4,6,6,6,6,6,6,7] the first occurrence of 4 is at element 4 (counting from 0), the first occurrence of 6 is at element 8, and 5 does not appear in the array.
Your task is to write a binary search that finds the first occurrence of an item in a sorted array that may contain duplicates. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
I would just call the standard binary-search function and then linearly search backwards for the first non-matching value. Granted, if all the values are the same this is O(n), but that isn’t very likely.
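A minimal Python sketch of that approach, assuming a textbook binary search that returns the index of any match (the names `binary_search` and `first_by_backscan` are illustrative, not from the post):

```python
def binary_search(xs, target):
    """Return the index of some occurrence of target in sorted xs, or -1."""
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        elif xs[mid] > target:
            hi = mid - 1
        else:
            return mid
    return -1


def first_by_backscan(xs, target):
    """Find any match, then walk left while the previous item still matches."""
    i = binary_search(xs, target)
    # If target is absent, i is -1 and the loop body never runs.
    while i > 0 and xs[i - 1] == target:
        i -= 1
    return i


xs = [1, 2, 2, 3, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 7]
print(first_by_backscan(xs, 4))  # 4
print(first_by_backscan(xs, 6))  # 8
print(first_by_backscan(xs, 5))  # -1
```

The backward walk is what makes the worst case linear: on an array of n identical values the search lands in the middle and then steps left about n/2 times.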
@JohnCowan: In the early days of personal computing, I used a shareware database manager that used a standard binary-search function and then scanned backwards, as you suggest. I asked for the first M in a binary F/M field (female/male). You can guess what happened. I sent an email to the developer telling him how to find the first M in logarithmic rather than linear time. He thanked me, and said he had never heard of that before.
It’s probably better to skip the equality test in the loop and do a “deferred” check at the end:
bsearch returns the index of the first item greater than or equal to the given value (or one past the end of the array if there is no such element). find then checks the returned position and returns it if the value is indeed at that position.
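A minimal Python sketch consistent with that description. This is a reconstruction, not the commenter's original code, and returning -1 for a missing value is an assumption:

```python
def bsearch(xs, target):
    """Index of the first item >= target, or len(xs) if there is none."""
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1   # everything up to mid is too small
        else:
            hi = mid       # mid may be the answer; keep it in range
    return lo


def find(xs, target):
    """Deferred equality check: test the candidate position once, at the end."""
    i = bsearch(xs, target)
    if i < len(xs) and xs[i] == target:
        return i
    return -1  # assumption: -1 signals "not found"


xs = [1, 2, 2, 3, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 7]
print(find(xs, 4))  # 4
print(find(xs, 6))  # 8
print(find(xs, 5))  # -1
```

Because the loop never exits early on equality, it always narrows down to the leftmost candidate, so the running time stays logarithmic even when every element is the same.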
@JohnCowan, your response is consistent with your response to an earlier binary search exercise.
https://programmingpraxis.com/2016/04/29/binary-search-2/#comment-59653
Here’s a related blog post:
https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
I probably should have written:
though I think it works as it stands.
Actually, it doesn’t; integer division is the way to go.
In fact, in Python 2, dividing an int by an int with / gives an int, so the code is OK. In Python 3, / gives a float, and indexing a list with a float raises a runtime error (a TypeError).
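The difference is easy to check with a few lines of Python 3. This is only a sketch, and the variable names are illustrative:

```python
lo, hi = 0, 14
xs = list(range(15))

mid_floor = (lo + hi) // 2  # floor division: an int in both Python 2 and 3
mid_true = (lo + hi) / 2    # true division: a float in Python 3

try:
    xs[mid_true]            # list indices must be integers or slices
    float_index_ok = True
except TypeError:
    float_index_ok = False

print(mid_floor, mid_true, float_index_ok)
```

Under Python 3, mid_floor is 7, mid_true is 7.0, and float_index_ok ends up False, which is why the midpoint should be computed with `//`.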
Here’s an O(log n) solution in C99.
Output:
In Python this can be easily solved with bisect_left from the bisect module. See the function named “index” in the documentation.
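A short sketch following the `index` pattern from the bisect documentation (the error message is an addition here; the documented version raises a bare ValueError):

```python
from bisect import bisect_left


def index(a, x):
    """Locate the leftmost value exactly equal to x in sorted list a."""
    i = bisect_left(a, x)  # first position where x could be inserted
    if i != len(a) and a[i] == x:
        return i
    raise ValueError(f"{x!r} is not in the list")


xs = [1, 2, 2, 3, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 7]
print(index(xs, 4))  # 4
print(index(xs, 6))  # 8
```

Calling `index(xs, 5)` raises ValueError, since 5 does not appear in the array.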