Binary Gap

January 12, 2018

A binary gap is a sequence of 0-bits between 1-bits in the binary representation of a number. For instance, 2010 = 101002 has a binary gap of length 1 between its first and third bits; the two 0-bits that end the number are not part of a binary gap because there is no trailing 1-bit. Thus, the length of the maximal binary gap in 20 is 1. The length of the maximal binary gap in 52910 = 10000100012 is 4.

Your task is to write a program that finds the length of the maximal binary gap in a number. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Advertisement

Pages: 1 2

17 Responses to “Binary Gap”

  1. Paul said

    In Python, using a regular expression with a lookahead for the trailing 1.

    import re
    
    c = re.compile("10+(?=1)")
    
    def bingap(n):
        b = bin(n)[2:]
        match = c.findall(b)
        return max(map(len, match)) - 1 if match else 0
    
  2. matthew said

    That’s more like it, a nice bit twiddling challenge. It seems to be easier to find blocks of ones than blocks of zeroes and once we do, there is the popcount instruction on more modern x86 processors that tells us how long they are (accessible through the __builtin_popcount() intrinsic on gcc). So here’s some C++ code that uses bit-level operations to sequentially find the rightmost sequential block of ones. To solve the original problem we flip the bits at the beginning and use a simple test at the end to filter out blocks that start or end at the start or end of the bit sequence.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <algorithm>
    
    char *binary(uint32_t n, char *s) {
      for (int i = 0; i < 32; i++) {
        s[31-i] = '0' + ((n>>i)&1);
      }
      return s;
    }
    
    int main(int argc, char* argv[]) {
      if (argc < 2) abort();
      uint32_t n = strtoul(argv[1],0,0);
      char buff[33];
      int max = 0;
      n = ~n; // invert as we want runs of zeroes
      while (n != 0) {
        fprintf(stderr,"n=%s\n", binary(n,buff));
        uint32_t t = n|(n-1);     // Extend ones
        uint32_t m = t+1;         // Mask
        uint32_t b = n&~m;        // Block of ones
        n = n-b;
        fprintf(stderr,"t=%s\n", binary(t,buff));
        fprintf(stderr,"m=%s\n", binary(m,buff));
        fprintf(stderr,"b=%s\n", binary(b,buff));
        fprintf(stderr,"\n");
        // Ignore blocks at start or end. Count bits with popcount
        if (!(b&(1|(1L<<31)))) max = std::max(max,__builtin_popcountl(b));
      }
      fprintf(stdout,"%d\n", max);
    }
    
    $ g++ -Wall bgap.cpp -o bgap
    $ ./bgap 529
    n=11111111111111111111110111101110
    t=11111111111111111111110111101111
    m=11111111111111111111110111110000
    b=00000000000000000000000000001110
    
    n=11111111111111111111110111100000
    t=11111111111111111111110111111111
    m=11111111111111111111111000000000
    b=00000000000000000000000111100000
    
    n=11111111111111111111110000000000
    t=11111111111111111111111111111111
    m=00000000000000000000000000000000
    b=11111111111111111111110000000000
    
    4
    
  3. matthew said

    Here’s that code in Matthew Godbolt’s excellent online compiler: https://godbolt.org/g/KiXcsm

  4. nobody said

    programmingpraxis, please consider what happens if the function “gap” is called with a value of zero.

  5. Daniel said

    Here’s a solution in C.

    It removes the trailing 0 bits, then iteratively removes the trailing 1 bits and counts/removes the trailing 0 bits.

    gcc’s built-in count-trailing-zeros function, __builtin_ctz is used for counting trailing 0 bits.

    #include <stdio.h>
    
    int calc_binary_gap(unsigned long x) {
      if (!x) return 0;
      // Remove trailing 0-bits
      x >>= __builtin_ctzl(x);      
      int binary_gap = 0;
      while (1) {
        // Remove trailing 1-bits
        x >>= __builtin_ctzl(~x);
        if (!x) return binary_gap;
        // Count trailing 0-bits
        int current_gap = __builtin_ctzl(x);
        if (current_gap > binary_gap) binary_gap = current_gap;
        // Remove trailing 0-bits
        x >>= current_gap;
      }
    }
    
    int main(void) {
      int binary_gap = calc_binary_gap(529);
      printf("%d\n", binary_gap);
    }
    

    Output:

    4
    
  6. def binary_gap(n, gap=0, maxx=0, closed=0):
    	while n:
    		if n & 1:
    			if closed:
    				maxx = max(maxx, gap)
    				gap = 0
    			else: closed, gap = 1, 0
    		else: gap += 1
    		n >>= 1
    	return maxx
    
    print(binary_gap(529))
        # => 4
    
  7. Globules said

    Here’s a version in x86 assembly language embedded in a C++ wrapper. It makes use of the bsr (bit scan reverse) and bsf (bit scan forward) instructions to ignore any leading and trailing sequences of 0s. (They find the bit position of the most significant and least significant 1 bits, respectively.)

    Amusingly, an earlier version had a more complex (mis)use of the asm directive, which caused both gcc and clang++ to panic on Mac OS.

    /*
     * Compile:
     *
     *   g++-7       -Wall -Wextra -pedantic -O3 -masm=att -o bingap_asm bingap_asm.cpp
     *   clang++-3.7 -Wall -Wextra -pedantic -O3 -masm=att -o bingap_asm bingap_asm.cpp
     */
    
    #include <bitset>
    #include <iostream>
    
    /*
     * Return the largest binary gap in the argument.
     *
     * A binary gap is a contiguous sequence of one or more 0 bits bracketed between
     * two 1 bits.
     * 
     * We implement the following algorithm:
     * 
     * If the argument is 0 we return immediately with a result of 0.
     *
     * Otherwise, shift the value left so that its leftmost 1 bit becomes the sign
     * bit, then shift it right (propagating the sign bit) so that the rightmost 1
     * bit is in position 0.  This eliminates any leading or trailing sequences of
     * 0s.
     *
     * Negate the result, then count the number of times we can do:
     *
     *   bits &= bits << 1
     *
     * until bits becomes 0.  Return this count, which is the length of the longest
     * binary gap.
     *
     * Demonstrating with an 8-bit value (instead of 64) we have:
     * 
     *   00110010  input value
     *   11001000  shifted left
     *   11111001  shifted right
     *   00000110  negated
     *   00000100  after first "shft &= shft << 1"  (i = 1)
     *   00000000  after second                     (i = 2)
     */
    static uint32_t bingap(uint64_t x)
    {
      uint32_t i;
      
      __asm__ ("xor    %[i],   %[i]     \n\t"
               "cmpq   $0,     %[x]     \n\t"
               "jz     done             \n\t"
               "mov    %[x],   %%rax    \n\t"
               "bsr    %%rax,  %%rcx    \n\t"
               "neg    %%rcx            \n\t"
               "add    $63,    %%rcx    \n\t"
               "shl    %%cl,   %%rax    \n\t"
               "bsf    %%rax,  %%rcx    \n\t"
               "sar    %%cl,   %%rax    \n\t"
               "not    %%rax            \n\t"
               "cmp    $0,     %%rax    \n\t"
               "jz     done             \n\t"
               "loop:                   \n\t"
               "inc    %[i]             \n\t"
               "mov    %%rax,  %%rdx    \n\t"
               "shl    $1,     %%rdx    \n\t"
               "and    %%rdx,  %%rax    \n\t"
               "jnz    loop             \n\t"
               "done:                   \n\t"
               : [i]    "=r" (i)
               : [x]    "rm" (x)
               : "%rax", "%rcx", "%cl", "%rdx");
    
      return i;
    }
    
    // Parse a binary string.
    static uint64_t b2(const std::string& s)
    {
      return std::stoull(s, nullptr, 2);
    }
    
    // Convert to a binary string.
    static std::string b2(uint64_t x)
    {
      std::bitset<64>b(x);
      return b.to_string();
    }
    
    int main(int ac, char* av[])
    {
      for (int i = 1; i < ac; i++) {
        uint64_t x(b2(av[i]));
        uint32_t r(bingap(x));
    
        std::cout << b2(x) << " => " << r << std::endl;
      }
    
      exit(0);
    }
    
    $ ./bingap_asm 100 101 1000110011010 1000000000000000000000000000000000000000001000000000000000000001
    0000000000000000000000000000000000000000000000000000000000000100 => 0
    0000000000000000000000000000000000000000000000000000000000000101 => 1
    0000000000000000000000000000000000000000000000000001000110011010 => 3
    1000000000000000000000000000000000000000001000000000000000000001 => 41
    
  8. matthew said

    @Globules: nice idea and inline assembler is always fun. You could also do much the same thing with gcc’s __builtin_clz and __builtin_ctz functions (‘count leading zeros’ and ‘count trailing zeros’) which should also work with clang & possibly other compilers. Adapting my code above (and 0-terminating the binary string properly – I liked your use of bitset too, BTW, didn’t know you could do that to print binary strings):

    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>
     
    char *binary(uint32_t n, char *s) {
      for (int i = 0; i < 32; i++) {
        s[31-i] = '0' + ((n>>i)&1);
      }
      s[32] = 0;
      return s;
    }
    
    int bgap(uint32_t n) {
      char buff[33];
      if (n == 0) return 0;
      fprintf(stderr,"%s\n", binary(n,buff));
      n <<= __builtin_clz(n);
      fprintf(stderr,"%s\n", binary(n,buff));
      n = ((int32_t)n) >> __builtin_ctz(n);
      fprintf(stderr,"%s\n", binary(n,buff));
      n = ~n;
      int count = 0;
      do {
         fprintf(stderr,"%s\n", binary(n,buff));
        n &= (n<<1);
        count++;
      } while (n != 0);
      return count;
    }
    
    int main(int argc, char* argv[]) {
      if (argc < 2) abort();
      uint32_t n = strtoul(argv[1],0,0);
      int max = bgap(n);
      fprintf(stdout,"%d\n", max);
    }
    

    Using the builtins also mean that our code works on non-x86 platforms, eg. for ARM64, Mr. Godbolt’s online compiler (https://godbolt.org/g/D12gZo) gives:

    bgap(unsigned int):
            mov     w2, 0
            cbz     w0, .L1
            clz     w1, w0
            mov     w2, 0
            lsl     w1, w0, w1
            rbit    w0, w1
            clz     w0, w0
            asr     w1, w1, w0
            mvn     w1, w1
    .L3:
            ands    w1, w1, w1, lsl 1
            add     w2, w2, 1
            bne     .L3
    .L1:
            mov     w0, w2
            ret
    
  9. matthew said

    Some sample output:

    $ g++ -Wall -O2  bgap.cpp -o bgap
    $ ./a.out 529
    00000000000000000000001000010001
    10000100010000000000000000000000
    11111111111111111111111000010001
    00000000000000000000000111101110
    00000000000000000000000111001100
    00000000000000000000000110001000
    00000000000000000000000100000000
    4
    
  10. matthew said

    Just noticed @Daniel has a nice ctz solution up above as well.

  11. programmingpraxis said

    @Globules and others: The problem specified that the input is a number, not an arbitrary bit string, so there will never be a leading zero. The problem doesn’t state, though it should, that the number is always positive, although I think that is implied in the solution.

  12. Globules said

    @Matthew I thought about using built-ins, but decided this would be a nice excuse to learn a little about x86 assembly language. Still, I could have used only the bsr and bsf instructions, assigning their results to variables and doing the rest in C++, but I figured “in for a penny, in for a pound”. (Or, perhaps “penny wise, pound foolish” is more appropriate… :-)

  13. Globules said

    @programmingpraxis Upon initial reading I interpreted “binary representation of a number” as the machine’s binary representation of a number. Once you’ve headed down that path you have to care about leading zeros/bits. Of course, as you point out, a “binary representation of a number” could simply be a list of its digits in binary. In the spirit of the latter interpretation, the following is a Haskell version. (For readability I limit the test values to 16 bits, but the code works for arbitrary size numbers.)

    import Data.Bool (bool)
    import Data.List (group, unfoldr)
    import Data.Tuple (swap)
    import System.Environment
    import Text.Printf
    
    -- Return the binary gap of a number.
    --
    -- We convert the number to its base-2 representation, drop any leading (least
    -- significant 0s), group together sequences of the same digit, keep only those
    -- groups consisting of 0s, map each group to its length, prepend 0 to the
    -- resulting list (in case it was empty), then return the maximum value.
    bingap :: Integral a => a -> Int
    bingap = maximum .
             (0:) .
             map length .
             filter (\(d:_) -> d == 0) .
             group .
             dropWhile (== 0) .
             toBase 2
    
    -- Convert a number to its base-b representation.  The order of the digits will
    -- be from least to most significant.  Zero is the empty list.
    toBase :: Integral a => a -> a -> [a]
    toBase b = unfoldr (\n -> bool Nothing (Just $ swap $ quotRem n b) (n /= 0))
    
    -- 
    --                       Utility Functions
    -- 
    
    -- Interpret a list as the base-b representation of a number.  The order of the
    -- digits is from least to most significant.  The empty list is zero.
    fromBase :: Integral a => a -> [a] -> a
    fromBase b = foldr (\d n -> n * b + d) 0
    
    -- Interpret a string as a base-2 number, converting '0' to 0 and any other
    -- character to 1.
    parseBase2 :: Integral a => String -> a
    parseBase2 = fromBase 2 . reverse . map (bool 1 0 . (== '0'))
    
    test :: Integer -> String
    test n = printf "%016b => %d" n (bingap n)
    
    main :: IO ()
    main = getArgs >>= mapM_ (putStrLn . test . parseBase2)
    
    $ ./bingap 010 0101 110001010010000
    0000000000000010 => 0
    0000000000000101 => 1
    0110001010010000 => 3
    
  14. matthew said

    @Globules: fair enough. Assembler is fun. I was wondering if you needed that comparison on line 57/58 – we already know that n is non-zero at this point.
    @Praxis: Not quite sure what your point is, presence of leading zeros is surely a property of the representation, not the number itself. Anyway, isn’t the underlying problem really about bit sequences so including (representations of) negative numbers seems to make sense (and since we know that the universe is two’s complement, we might as well use that).

  15. Globules said

    @Matthew I think the comparison is needed, at least the way the code is currently structured. Given an argument having only a single sequence of 1 bits (e.g. …0110…) then after the sar instruction the rax register will be all 1s and the not will set it to all 0s. But, according to the docs I was reading the not instruction doesn’t set any flags, so we have to do an explicit comparison to know whether we have all 0s.

  16. matthew said

    @Globules: You are quite right, my code above is wrong. It originally used a ‘while’ rather than a ‘do’ which would have been correct. I changed it because it made the code slightly shorter. Should have realized the compiler probably would have removed the extra comparison itself if it really had been redundant. Corrected version here: https://godbolt.org/g/FCBLQi. Looks like it’s doing the ‘not’ as an xor with -1, which presumably does set the flags.

  17. Globules said

    @Matthew The xor with -1 is a nice trick; I’ve updated my code at home with it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: