Binary Gap
January 12, 2018
A binary gap is a sequence of 0-bits between 1-bits in the binary representation of a number. For instance, 2010 = 101002 has a binary gap of length 1 between its first and third bits; the two 0-bits that end the number are not part of a binary gap because there is no trailing 1-bit. Thus, the length of the maximal binary gap in 20 is 1. The length of the maximal binary gap in 52910 = 10000100012 is 4.
Your task is to write a program that finds the length of the maximal binary gap in a number. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
In Python, using a regular expression with a lookahead for the trailing 1.
That’s more like it, a nice bit twiddling challenge. It seems to be easier to find blocks of ones than blocks of zeroes and once we do, there is the popcount instruction on more modern x86 processors that tells us how long they are (accessible through the __builtin_popcount() intrinsic on gcc). So here’s some C++ code that uses bit-level operations to sequentially find the rightmost sequential block of ones. To solve the original problem we flip the bits at the beginning and use a simple test at the end to filter out blocks that start or end at the start or end of the bit sequence.
Here’s that code in Matthew Godbolt’s excellent online compiler: https://godbolt.org/g/KiXcsm
programmingpraxis, please consider what happens if the function “gap” is called with a value of zero.
Here’s a solution in C.
It removes the trailing 0 bits, then iteratively removes the trailing 1 bits and counts/removes the trailing 0 bits.
gcc’s built-in count-trailing-zeros function, __builtin_ctz is used for counting trailing 0 bits.
Output:
Here’s a version in x86 assembly language embedded in a C++ wrapper. It makes use of the bsr (bit scan reverse) and bsf (bit scan forward) instructions to ignore any leading and trailing sequences of 0s. (They find the bit position of the most significant and least significant 1 bits, respectively.)
Amusingly, an earlier version had a more complex (mis)use of the asm directive, which caused both gcc and clang++ to panic on Mac OS.
@Globules: nice idea and inline assembler is always fun. You could also do much the same thing with gcc’s __builtin_clz and __builtin_ctz functions (‘count leading zeros’ and ‘count trailing zeros’) which should also work with clang & possibly other compilers. Adapting my code above (and 0-terminating the binary string properly – I liked your use of bitset too, BTW, didn’t know you could do that to print binary strings):
Using the builtins also mean that our code works on non-x86 platforms, eg. for ARM64, Mr. Godbolt’s online compiler (https://godbolt.org/g/D12gZo) gives:
Some sample output:
Just noticed @Daniel has a nice ctz solution up above as well.
@Globules and others: The problem specified that the input is a number, not an arbitrary bit string, so there will never be a leading zero. The problem doesn’t state, though it should, that the number is always positive, although I think that is implied in the solution.
@Matthew I thought about using built-ins, but decided this would be a nice excuse to learn a little about x86 assembly language. Still, I could have used only the bsr and bsf instructions, assigning their results to variables and doing the rest in C++, but I figured “in for a penny, in for a pound”. (Or, perhaps “penny wise, pound foolish” is more appropriate… :-)
@programmingpraxis Upon initial reading I interpreted “binary representation of a number” as the machine’s binary representation of a number. Once you’ve headed down that path you have to care about leading zeros/bits. Of course, as you point out, a “binary representation of a number” could simply be a list of its digits in binary. In the spirit of the latter interpretation, the following is a Haskell version. (For readability I limit the test values to 16 bits, but the code works for arbitrary size numbers.)
@Globules: fair enough. Assembler is fun. I was wondering if you needed that comparison on line 57/58 – we already know that n is non-zero at this point.
@Praxis: Not quite sure what your point is, presence of leading zeros is surely a property of the representation, not the number itself. Anyway, isn’t the underlying problem really about bit sequences so including (representations of) negative numbers seems to make sense (and since we know that the universe is two’s complement, we might as well use that).
@Matthew I think the comparison is needed, at least the way the code is currently structured. Given an argument having only a single sequence of 1 bits (e.g. …0110…) then after the sar instruction the rax register will be all 1s and the not will set it to all 0s. But, according to the docs I was reading the not instruction doesn’t set any flags, so we have to do an explicit comparison to know whether we have all 0s.
@Globules: You are quite right, my code above is wrong. It originally used a ‘while’ rather than a ‘do’ which would have been correct. I changed it because it made the code slightly shorter. Should have realized the compiler probably would have removed the extra comparison itself if it really had been redundant. Corrected version here: https://godbolt.org/g/FCBLQi. Looks like it’s doing the ‘not’ as an xor with -1, which presumably does set the flags.
@Matthew The xor with -1 is a nice trick; I’ve updated my code at home with it.