Binary Gap
January 12, 2018
Here is our solution:
(define (gap n)
(while (even? n) (set! n (/ n 2)))
(let loop ((n n) (curr 0) (maxi 0))
(cond ((zero? n) maxi)
((even? n) (loop (/ n 2) (+ curr 1) maxi))
(else (loop (quotient n 2) 0 (max curr maxi))))))
The while discards any 0-bits at the end of the number (there are never any 0-bits at the beginning of a number), then the loop counts 0-bits and resets at every 1-bit. Here’s an example:
> (map gap '(9 15 20 529)) (2 0 1 4)
You can run the program at https://ideone.com/DhZwaR.
In Python, using a regular expression with a lookahead for the trailing 1.
import re c = re.compile("10+(?=1)") def bingap(n): b = bin(n)[2:] match = c.findall(b) return max(map(len, match)) - 1 if match else 0That’s more like it, a nice bit twiddling challenge. It seems to be easier to find blocks of ones than blocks of zeroes and once we do, there is the popcount instruction on more modern x86 processors that tells us how long they are (accessible through the __builtin_popcount() intrinsic on gcc). So here’s some C++ code that uses bit-level operations to sequentially find the rightmost sequential block of ones. To solve the original problem we flip the bits at the beginning and use a simple test at the end to filter out blocks that start or end at the start or end of the bit sequence.
#include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <algorithm> char *binary(uint32_t n, char *s) { for (int i = 0; i < 32; i++) { s[31-i] = '0' + ((n>>i)&1); } return s; } int main(int argc, char* argv[]) { if (argc < 2) abort(); uint32_t n = strtoul(argv[1],0,0); char buff[33]; int max = 0; n = ~n; // invert as we want runs of zeroes while (n != 0) { fprintf(stderr,"n=%s\n", binary(n,buff)); uint32_t t = n|(n-1); // Extend ones uint32_t m = t+1; // Mask uint32_t b = n&~m; // Block of ones n = n-b; fprintf(stderr,"t=%s\n", binary(t,buff)); fprintf(stderr,"m=%s\n", binary(m,buff)); fprintf(stderr,"b=%s\n", binary(b,buff)); fprintf(stderr,"\n"); // Ignore blocks at start or end. Count bits with popcount if (!(b&(1|(1L<<31)))) max = std::max(max,__builtin_popcountl(b)); } fprintf(stdout,"%d\n", max); }Here’s that code in Matthew Godbolt’s excellent online compiler: https://godbolt.org/g/KiXcsm
programmingpraxis, please consider what happens if the function “gap” is called with a value of zero.
Here’s a solution in C.
It removes the trailing 0 bits, then iteratively removes the trailing 1 bits and counts/removes the trailing 0 bits.
gcc’s built-in count-trailing-zeros function, __builtin_ctz is used for counting trailing 0 bits.
#include <stdio.h> int calc_binary_gap(unsigned long x) { if (!x) return 0; // Remove trailing 0-bits x >>= __builtin_ctzl(x); int binary_gap = 0; while (1) { // Remove trailing 1-bits x >>= __builtin_ctzl(~x); if (!x) return binary_gap; // Count trailing 0-bits int current_gap = __builtin_ctzl(x); if (current_gap > binary_gap) binary_gap = current_gap; // Remove trailing 0-bits x >>= current_gap; } } int main(void) { int binary_gap = calc_binary_gap(529); printf("%d\n", binary_gap); }Output:
def binary_gap(n, gap=0, maxx=0, closed=0): while n: if n & 1: if closed: maxx = max(maxx, gap) gap = 0 else: closed, gap = 1, 0 else: gap += 1 n >>= 1 return maxx print(binary_gap(529)) # => 4Here’s a version in x86 assembly language embedded in a C++ wrapper. It makes use of the bsr (bit scan reverse) and bsf (bit scan forward) instructions to ignore any leading and trailing sequences of 0s. (They find the bit position of the most significant and least significant 1 bits, respectively.)
Amusingly, an earlier version had a more complex (mis)use of the asm directive, which caused both gcc and clang++ to panic on Mac OS.
/* * Compile: * * g++-7 -Wall -Wextra -pedantic -O3 -masm=att -o bingap_asm bingap_asm.cpp * clang++-3.7 -Wall -Wextra -pedantic -O3 -masm=att -o bingap_asm bingap_asm.cpp */ #include <bitset> #include <iostream> /* * Return the largest binary gap in the argument. * * A binary gap is a contiguous sequence of one or more 0 bits bracketed between * two 1 bits. * * We implement the following algorithm: * * If the argument is 0 we return immediately with a result of 0. * * Otherwise, shift the value left so that its leftmost 1 bit becomes the sign * bit, then shift it right (propagating the sign bit) so that the rightmost 1 * bit is in position 0. This eliminates any leading or trailing sequences of * 0s. * * Negate the result, then count the number of times we can do: * * bits &= bits << 1 * * until bits becomes 0. Return this count, which is the length of the longest * binary gap. * * Demonstrating with an 8-bit value (instead of 64) we have: * * 00110010 input value * 11001000 shifted left * 11111001 shifted right * 00000110 negated * 00000100 after first "shft &= shft << 1" (i = 1) * 00000000 after second (i = 2) */ static uint32_t bingap(uint64_t x) { uint32_t i; __asm__ ("xor %[i], %[i] \n\t" "cmpq $0, %[x] \n\t" "jz done \n\t" "mov %[x], %%rax \n\t" "bsr %%rax, %%rcx \n\t" "neg %%rcx \n\t" "add $63, %%rcx \n\t" "shl %%cl, %%rax \n\t" "bsf %%rax, %%rcx \n\t" "sar %%cl, %%rax \n\t" "not %%rax \n\t" "cmp $0, %%rax \n\t" "jz done \n\t" "loop: \n\t" "inc %[i] \n\t" "mov %%rax, %%rdx \n\t" "shl $1, %%rdx \n\t" "and %%rdx, %%rax \n\t" "jnz loop \n\t" "done: \n\t" : [i] "=r" (i) : [x] "rm" (x) : "%rax", "%rcx", "%cl", "%rdx"); return i; } // Parse a binary string. static uint64_t b2(const std::string& s) { return std::stoull(s, nullptr, 2); } // Convert to a binary string. static std::string b2(uint64_t x) { std::bitset<64>b(x); return b.to_string(); } int main(int ac, char* av[]) { for (int i = 1; i < ac; i++) { uint64_t x(b2(av[i])); uint32_t r(bingap(x)); std::cout << b2(x) << " => " << r << std::endl; } exit(0); }@Globules: nice idea and inline assembler is always fun. You could also do much the same thing with gcc’s __builtin_clz and __builtin_ctz functions (‘count leading zeros’ and ‘count trailing zeros’) which should also work with clang & possibly other compilers. Adapting my code above (and 0-terminating the binary string properly – I liked your use of bitset too, BTW, didn’t know you could do that to print binary strings):
#include <stdint.h> #include <stdlib.h> #include <stdio.h> char *binary(uint32_t n, char *s) { for (int i = 0; i < 32; i++) { s[31-i] = '0' + ((n>>i)&1); } s[32] = 0; return s; } int bgap(uint32_t n) { char buff[33]; if (n == 0) return 0; fprintf(stderr,"%s\n", binary(n,buff)); n <<= __builtin_clz(n); fprintf(stderr,"%s\n", binary(n,buff)); n = ((int32_t)n) >> __builtin_ctz(n); fprintf(stderr,"%s\n", binary(n,buff)); n = ~n; int count = 0; do { fprintf(stderr,"%s\n", binary(n,buff)); n &= (n<<1); count++; } while (n != 0); return count; } int main(int argc, char* argv[]) { if (argc < 2) abort(); uint32_t n = strtoul(argv[1],0,0); int max = bgap(n); fprintf(stdout,"%d\n", max); }Using the builtins also mean that our code works on non-x86 platforms, eg. for ARM64, Mr. Godbolt’s online compiler (https://godbolt.org/g/D12gZo) gives:
bgap(unsigned int): mov w2, 0 cbz w0, .L1 clz w1, w0 mov w2, 0 lsl w1, w0, w1 rbit w0, w1 clz w0, w0 asr w1, w1, w0 mvn w1, w1 .L3: ands w1, w1, w1, lsl 1 add w2, w2, 1 bne .L3 .L1: mov w0, w2 retSome sample output:
Just noticed @Daniel has a nice ctz solution up above as well.
@Globules and others: The problem specified that the input is a number, not an arbitrary bit string, so there will never be a leading zero. The problem doesn’t state, though it should, that the number is always positive, although I think that is implied in the solution.
@Matthew I thought about using built-ins, but decided this would be a nice excuse to learn a little about x86 assembly language. Still, I could have used only the bsr and bsf instructions, assigning their results to variables and doing the rest in C++, but I figured “in for a penny, in for a pound”. (Or, perhaps “penny wise, pound foolish” is more appropriate… :-)
@programmingpraxis Upon initial reading I interpreted “binary representation of a number” as the machine’s binary representation of a number. Once you’ve headed down that path you have to care about leading zeros/bits. Of course, as you point out, a “binary representation of a number” could simply be a list of its digits in binary. In the spirit of the latter interpretation, the following is a Haskell version. (For readability I limit the test values to 16 bits, but the code works for arbitrary size numbers.)
import Data.Bool (bool) import Data.List (group, unfoldr) import Data.Tuple (swap) import System.Environment import Text.Printf -- Return the binary gap of a number. -- -- We convert the number to its base-2 representation, drop any leading (least -- significant 0s), group together sequences of the same digit, keep only those -- groups consisting of 0s, map each group to its length, prepend 0 to the -- resulting list (in case it was empty), then return the maximum value. bingap :: Integral a => a -> Int bingap = maximum . (0:) . map length . filter (\(d:_) -> d == 0) . group . dropWhile (== 0) . toBase 2 -- Convert a number to its base-b representation. The order of the digits will -- be from least to most significant. Zero is the empty list. toBase :: Integral a => a -> a -> [a] toBase b = unfoldr (\n -> bool Nothing (Just $ swap $ quotRem n b) (n /= 0)) -- -- Utility Functions -- -- Interpret a list as the base-b representation of a number. The order of the -- digits is from least to most significant. The empty list is zero. fromBase :: Integral a => a -> [a] -> a fromBase b = foldr (\d n -> n * b + d) 0 -- Interpret a string as a base-2 number, converting '0' to 0 and any other -- character to 1. parseBase2 :: Integral a => String -> a parseBase2 = fromBase 2 . reverse . map (bool 1 0 . (== '0')) test :: Integer -> String test n = printf "%016b => %d" n (bingap n) main :: IO () main = getArgs >>= mapM_ (putStrLn . test . parseBase2)@Globules: fair enough. Assembler is fun. I was wondering if you needed that comparison on line 57/58 – we already know that n is non-zero at this point.
@Praxis: Not quite sure what your point is, presence of leading zeros is surely a property of the representation, not the number itself. Anyway, isn’t the underlying problem really about bit sequences so including (representations of) negative numbers seems to make sense (and since we know that the universe is two’s complement, we might as well use that).
@Matthew I think the comparison is needed, at least the way the code is currently structured. Given an argument having only a single sequence of 1 bits (e.g. …0110…) then after the sar instruction the rax register will be all 1s and the not will set it to all 0s. But, according to the docs I was reading the not instruction doesn’t set any flags, so we have to do an explicit comparison to know whether we have all 0s.
@Globules: You are quite right, my code above is wrong. It originally used a ‘while’ rather than a ‘do’ which would have been correct. I changed it because it made the code slightly shorter. Should have realized the compiler probably would have removed the extra comparison itself if it really had been redundant. Corrected version here: https://godbolt.org/g/FCBLQi. Looks like it’s doing the ‘not’ as an xor with -1, which presumably does set the flags.
@Matthew The xor with -1 is a nice trick; I’ve updated my code at home with it.