Sexy Primes
August 23, 2019
Today’s exercise was inspired by this post at Stack Overflow asking for the count of consecutive prime pairs with a difference of 6 between one and two billion; I modified the task to count sexy primes. The difference is that two primes can be sexy but not consecutive; for instance, 5 and 11 form a sexy pair, but they are not consecutive because 7 is prime. We’ll write several solutions for this task, using Python because that poster at Stack Overflow asked for Python.
Our first solution identifies sexy primes p by using a primality test to check both p and p + 6 for primality:
def isSexy(n): return isPrime(n) and isPrime(n+6)
Then we check each odd number from one billion to two billion:
counter = 0 for n in xrange(1000000001, 2000000000, 2): if isSexy(n): counter += 1
print counter
That takes an estimated two hours on my machine, determined by running it from 1 billion to 1.1 billion and multiplying by 10. We need something better.
Our second solution uses our standard prime generator to generate the primes from one billion to two billion, checking for each prime p if p – 6 is on the list:
counter = 0 ps = primeGen(1000000000) p2 = next(ps); p1 = next(ps); p = next(ps) while p < 2000000000: if p - p2 == 6 or p - p1 == 6: counter += 1 p2 = p1; p1 = p; p = next(ps)
print counter
That took about 8.5 minutes and produced the correct result.
Our third solution uses a segmented sieve to make a list of primes from one billion to two billion, then scans the list counting the sexy primes:
counter = 0 ps = primes(1000000000, 2000000000) if ps[1] - ps[0] == 6: counter += 1
for i in xrange(2,len(ps)): if ps[i] - ps[i-2] == 6 or ps[i] - ps[i-1] == 6: counter += 1
print counter
That took about four minutes and produced the correct result. Almost all of the time went to sieving. I suspect that a lot of that time was spent actually storing the primes; there are over 47,374,753 primes between one billion and two billion, which requires a lot of space, and a lot of time to manipulate all that space. Among other things, this shows that our prime generator is not outlandishly slow.
Our fourth and final solution requires bespoke code rather than using the libraries as did our first three solutions. The idea is to sieve each of the polynomials 6n + 1 and 6n − 1 separately, scan for adjacent pairs, and combine the counts.
This is a little bit tricky, so let’s look at an example: sieve 6n + 1 on the range 100 to 200 using the primes 5, 7, 11 and 13, which are the sieving primes less than 200 (excluding 2 and 3, which divide 6). The sieve has 17 elements 103, 109, 115, 121, 127, 133, 139, 145, 151, 157, 163, 169, 175, 181, 187, 193, 199. The least multiple of 5 in the list is 115, so we strike 115, 145 and 175 (every 5th item) from the sieve. The least multiple of 7 in the list is 133, so we strike 133 and 175 (every 7th item) from the sieve. The least multiple of 11 in the list is 121, so we strike 121 and 187 (every 11th item) from the list. And the least multiple of 13 in the list is 169, so we strike it from the list (it’s in the middle of a 17-item list, and has no other multiples in the list). The primes that remain in the list are 103, 109, 127, 139, 151, 157, 163, 181, 193, and 199; of those, 103, 151, 157 and 193 are sexy.
The trick is finding the offset in the sieve of the first multiple of the prime. The formula is (lo + p) / -6 (mod p), where lo is the first element in the sieve (103 in the example above) and p is the prime; -6 comes from the gap between successive elements of the sieve. In modular arithmetic, division is undefined, so we can’t just divide by -6; instead, we find the modular inverse. And since modular inverse is undefined for negative numbers, we first convert -6 to its equivalent mod p. Thus, for our four sieving primes, the offsets into the sieve are:
((103 + 5) * inverse(-6 % 5, 5)) % 5 = 2 ==> points to 115 ((103 + 7) * inverse(-6 % 7, 7)) % 7 = 5 ==> points to 133 ((103 + 11) * inverse(-6 % 11, 11)) % 11 = 3 ==> points to 121 ((103 + 13) * inverse(-6 % 13, 13)) % 13 = 11 ==> points to 169
Sieving sieve 6n – 1 works the same way, except that lo points to 101 instead of 103; the sieve contains 101, 107, 113, 119, 125, 131, 137, 143, 149, 155, 161, 167, 173, 179, 185, 191, 197:
((101 + 5) * inverse(-6 % 5, 5)) % 5 = 4 ==> points to 125 ((101 + 7) * inverse(-6 % 7, 7)) % 7 = 3 ==> points to 119 ((101 + 11) * inverse(-6 % 11, 11)) % 11 = 7 ==> points to 143 ((101 + 13) * inverse(-6 % 13, 13)) % 13 = 7 ==> points to 143
After sieving, the numbers that remain in the sieve are 101, 107, 113, 131, 137, 149, 167, 173, 179, 191, 197, of which 101, 107, 131, 167, 173 and 191 are sexy, so there are 10 sexy primes between 100 and 200.
Here’s the code:
start = time() counter = 0 ps = primes(isqrt(2000000000))[2:] size = (2000000000-1000000000)/6+1
# sieve on 6n-1 lo = 1000000000/6*6+5 sieve = [True] * size for p in ps: q = ((lo+p)*inverse(-6%p,p))%p for i in xrange(q,size,p): sieve[i] = False
for i in xrange(1,size): if sieve[i] and sieve[i-1]: counter += 1# sieve on 6n+1 lo += 2 sieve = [True] * size for i in xrange(0,size): sieve[i] = Truefor p in ps: q = ((lo+p)*inverse(-6%p,p))%p for i in xrange(q,size,p): sieve[i] = Falsefor i in xrange(1,size):
if sieve[i] and sieve[i-1]:
counter += 1print counter print time() - start print "Done"That took 3:21 and produced the correct result. There are 5924680 sexy primes between one billion and two billion.
In Python. The lazy prime generator is based on a segmented sieve and a little too much code to copy here. Thee deque keeps the tested prime in D[0] and has further 3 next primes to test against.
This takes some 90 secs on my (old) PC and reproduces the answer of ProgrammingPraxis.
Rust version, takes about 45 secs on my PC:
Output:
Here’s a sieve-based C solution.
Output:
I forgot to call free :-(
(my preference is to explicitly clean up, even if the memory is reclaimed by the OS)
Here’s the C++ version of my code above. It uses vector instead of a char array, to reduce the program’s memory usage from 2GB to 250MB, since it can use a single bit for each boolean instead of a byte. Additionally, the execution time on my laptop decreases from 22 seconds for the C version to 17 seconds for the C++ version (both compiled with -O3).
Output:
Here’s a solution in C that may not quite be in the spirit of the
question since it uses the excellent primesieve library to do all the
heavy lifting but, hey, it works and it’s fast, and it isn’t my
homework. Runs in about half a second on my very modest
computer. Bam!
<
pre class="src src-sh">time ./sexy-primes
Output:
Daniel inspired me to re-write the Rust version with a bit vector (home-rolled) instead of a bool vector. His C++ version and this Rust version both run in 25 secs on my home PC. The Rust bool version above runs 3 secs slower (28 secs) on my home PC (timing of 45 secs above was on my work PC).
Rust output:
Daniel’s C++ version:
Here’s a basic segmented sieve solution, using a vector (ie. a bitmap representation) and working in segments of 128k entries. Takes a little under 3 seconds on my laptop, versus 28 seconds for Daniel’s C++ solution above. Not competitive with Chaw’s impressively fast primesieve solution yet though, I must have a look at what that is doing:
This Rust one runs in under 0.6 seconds on my Google Pixelbook with Crostini Linux. It’s faster than the previous Rust one above because:
1) it only looks at odd numbers
2) it calculates the primes in segments which is faster because the segments fit in the CPU cache
3) removed bit vectors as ultimately they were slower
4) it uses rayon to parallelize computing the segment sieves and find the sexy primes
Playground link: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=63d60dbd91efa0106ee96bba0c6983ed