Problem 291: Panaitopol Primes

(see projecteuler.net/problem=291)

A prime number p is called a Panaitopol prime if p=dfrac{x^4-y^4}{x^3+y^3} for some positive integers x and y.

Find how many Panaitopol primes are less than 5 * 10^15.

Very inefficient solution

My code needs more than 60 seconds to find the correct result. (scroll down to the benchmark section)
Apparantly a much smarter algorithm exists - or my implementation is just inefficient.

My Algorithm

It's easy to make a brute force search for all Panaitopol prime below 1000.
After about a minute my program displayed 5,13,41,61,113,181,313,421,613,761 (see isPanaitopolPrime and #define BRUTE_FORCE).
I entered this sequence into my favorite internet search engine and found immediately oeis.org/A027862

They claimed that each Panaitopol prime can be represented as n^2 + (n+1)^2.
All I did is copying my Miller-Rabin primality check from my toolbox and testing each number
current = n*n + (n+1)*(n+1) whether it's prime.

It takes about 2 minutes on my machine which exceeds the inofficial time limit of Project Euler.
I saw a few different approaches which are much faster but I don't fully understand the mathematics
(and honestly I can't fully follow the reasoning why each Panaitopol prime is part of n^2 + (n+1)^2).

Note

mulmod was tuned for the 64-bit GCC compiler (on average about 4000 CPU cycles per test → 1 million tests per second / single-threaded).
Visual C++ is much slower.

In September 2017 I added OpenMP support. My 8-core computer finds the correct result in less than 20 seconds now.
However, the single-core performance remains unchanged and I still label my solution as "inefficient".

Interactive test

You can submit your own input to my program and it will be instantly processed at my server:

Input data (separated by spaces or newlines):
Note: Enter the upper search limit.

This is equivalent to
echo 1000 | ./291

Output:

(please click 'Go !')

Note: the original problem's input 5000000000000000 cannot be entered
because just copying results is a soft skill reserved for idiots.

(this interactive test is still under development, computations will be aborted after one second)

My code

… was written in C++11 and can be compiled with G++, Clang++, Visual C++. You can download it, too.

#include <iostream>
#include <iomanip>
#include <cmath>
 
// ---------- my initial brute-force code ----------
 
// brute-force search
// only good up for panaitopol primes below 1000
bool isPanaitopolPrime(unsigned int p)
{
// simple prime check
if (p % 2 == 0)
return p == 2;
 
for (unsigned int i = 3; i*i <= p; i += 2)
if (p % i == 0)
return false;
 
for (unsigned long long x = 1; x < 20*p; x++)
for (unsigned long long y = 1; y < x; y++)
{
auto num = x*x*x*x - y*y*y*y;
auto den = x*x*x + y*y*y;
 
if (num <= den)
continue;
if (num % den != 0)
continue;
 
if (num / den != p)
continue;
 
std::cout << "prime=" << p << " x=" << x << " y=" << y << std::endl;
return true;
}
 
return false;
}
 
// ---------- Miller-Rabin prime test ----------
 
// return (a*b) % modulo
unsigned long long mulmod(unsigned long long a, unsigned long long b, unsigned long long modulo)
{
// (a * b) % modulo = (a % modulo) * (b % modulo) % modulo
a %= modulo;
b %= modulo;
 
// fast path
if (a <= 0xFFFFFFF && b <= 0xFFFFFFF)
return (a * b) % modulo;
 
#ifdef __GNUC__
#ifdef __x86_64__
// use GCC inline assembler
unsigned long long asmResult;
__asm__
(
"mulq %2\n" // result in rdx:rax
"divq %3" // quotient in rax, remainder in rdx
: "=&d" (asmResult), "+%a" (a)
: "rm" (b), "rm" (modulo)
: "cc" // clear conditions
);
return asmResult;
 
#else
 
// based on GCC's 128 bit implementation
return ((unsigned __int128)a * b) % modulo;
#endif
#endif
 
// we might encounter overflows (slow path)
// the number of loops depends on b, therefore try to minimize b
if (b > a)
std::swap(a, b);
 
// bitwise multiplication
unsigned long long result = 0;
while (a > 0 && b > 0)
{
// b is odd ? a*b = a + a*(b-1)
if (b & 1)
{
result += a;
result %= modulo;
// skip b-- because the bit-shift at the end will remove the lowest bit anyway
}
 
// b is even ? a*b = (2*a)*(b/2)
a <<= 1;
a %= modulo;
 
// next bit
b >>= 1;
}
 
return result;
}
 
// return (base^exponent) % modulo
unsigned long long powmod(unsigned long long base, unsigned long long exponent, unsigned long long modulo)
{
unsigned long long result = 1;
while (exponent > 0)
{
// fast exponentation:
// odd exponent ? a^b = a*a^(b-1)
if (exponent & 1)
result = mulmod(result, base, modulo);
 
// even exponent ? a^b = (a*a)^(b/2)
base = mulmod(base, base, modulo);
exponent >>= 1;
}
return result;
}
 
// Miller-Rabin-test
bool isPrime(unsigned long long p)
{
// IMPORTANT: requires mulmod(a, b, modulo) and powmod(base, exponent, modulo)
 
// some code from https://ronzii.wordpress.com/2012/03/04/miller-rabin-primality-test/
// with optimizations from http://ceur-ws.org/Vol-1326/020-Forisek.pdf
// good bases can be found at http://miller-rabin.appspot.com/
 
// trivial cases
const unsigned int bitmaskPrimes2to31 = (1 << 2) | (1 << 3) | (1 << 5) | (1 << 7) |
(1 << 11) | (1 << 13) | (1 << 17) | (1 << 19) |
(1 << 23) | (1 << 29); // = 0x208A28Ac
if (p < 31)
return (bitmaskPrimes2to31 & (1 << p)) != 0;
 
if (p % 2 == 0 || p % 3 == 0 || p % 5 == 0 || p % 7 == 0 || // divisible by a small prime
p % 11 == 0 || p % 13 == 0 || p % 17 == 0)
return false;
 
if (p < 17*19) // we filtered all composite numbers < 17*19, all others below 17*19 must be prime
return true;
 
// test p against those numbers ("witnesses")
// good bases can be found at http://miller-rabin.appspot.com/
const unsigned int STOP = 0;
const unsigned int TestAgainst1[] = { 377687, STOP };
const unsigned int TestAgainst2[] = { 31, 73, STOP };
const unsigned int TestAgainst3[] = { 2, 7, 61, STOP };
// first three sequences are good up to 2^32
const unsigned int TestAgainst4[] = { 2, 13, 23, 1662803, STOP };
const unsigned int TestAgainst7[] = { 2, 325, 9375, 28178, 450775, 9780504, 1795265022, STOP };
 
// good up to 2^64
const unsigned int* testAgainst = TestAgainst7;
// use less tests if feasible
if (p < 5329)
testAgainst = TestAgainst1;
else if (p < 9080191)
testAgainst = TestAgainst2;
else if (p < 4759123141ULL)
testAgainst = TestAgainst3;
else if (p < 1122004669633ULL)
testAgainst = TestAgainst4;
 
// find p - 1 = d * 2^j
auto d = p - 1;
d >>= 1;
unsigned int shift = 0;
while ((d & 1) == 0)
{
shift++;
d >>= 1;
}
 
// test p against all bases
do
{
auto x = powmod(*testAgainst++, d, p);
// is test^d % p == 1 or -1 ?
if (x == 1 || x == p - 1)
continue;
 
// now either prime or a strong pseudo-prime
// check test^(d*2^r) for 0 <= r < shift
bool maybePrime = false;
for (unsigned int r = 0; r < shift; r++)
{
// x = x^2 % p
// (initial x was test^d)
x = mulmod(x, x, p);
// x % p == 1 => not prime
if (x == 1)
return false;
 
// x % p == -1 => prime or an even stronger pseudo-prime
if (x == p - 1)
{
// next iteration
maybePrime = true;
break;
}
}
 
// not prime
if (!maybePrime)
return false;
} while (*testAgainst != STOP);
 
// prime
return true;
}
 
int main()
{
// result
unsigned long long count = 0;
 
//#define BRUTE_FORCE
#ifdef BRUTE_FORCE
// super-slow brute-force approach to find the first numbers
for (unsigned int i = 1; i < 1000; i++)
if (isPanaitopolPrime(i))
count++;
#else
 
// 5*10^^15
auto limit = 5000000000000000ULL;
std::cin >> limit;
 
// prime can be represented as n^2 + (n+1)^2
// which is roughly 2 * (n+1)^2
unsigned long long last = sqrt(limit / 2);
//#define PARALLEL
#ifdef PARALLEL
#pragma omp parallel for reduction(+:count) schedule(dynamic)
#endif
for (unsigned long long n = 1; n < last; n++) // abort condition is inside the loop
{
// find next candidate
auto current = n*n + (n+1)*(n+1);
if (current <= limit && isPrime(current))
count++;
}
#endif
 
// display result
std::cout << count << std::endl;
return 0;
}

This solution contains 39 empty lines, 53 comments and 14 preprocessor commands.

Benchmark

The correct solution to the original Project Euler problem was found in  121 seconds  (exceeding the limit of 60 seconds).
The code can be accelerated with OpenMP but the timings refer to the single-threaded version on an Intel® Core™ i7-2600K CPU @ 3.40GHz.
(compiled for x86_64 / Linux, GCC flags: -O3 -march=native -fno-exceptions -fno-rtti -std=gnu++11 -DORIGINAL)

See here for a comparison of all solutions.

Note: interactive tests run on a weaker (=slower) computer. Some interactive tests are compiled without -DORIGINAL.

Changelog

July 7, 2017 submitted solution
July 7, 2017 added comments
September 29, 2017 OpenMP support

Difficulty

45% Project Euler ranks this problem at 45% (out of 100%).

Heatmap

Please click on a problem's number to open my solution to that problem:

green   solutions solve the original Project Euler problem and have a perfect score of 100% at Hackerrank, too
yellow solutions score less than 100% at Hackerrank (but still solve the original problem easily)
gray problems are already solved but I haven't published my solution yet
blue solutions are relevant for Project Euler only: there wasn't a Hackerrank version of it (at the time I solved it) or it differed too much
orange problems are solved but exceed the time limit of one minute or the memory limit of 256 MByte
red problems are not solved yet but I wrote a simulation to approximate the result or verified at least the given example - usually I sketched a few ideas, too
black problems are solved but access to the solution is blocked for a few days until the next problem is published
[new] the flashing problem is the one I solved most recently

I stopped working on Project Euler problems around the time they released 617.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
126 127 128 129 130 131 132 133 134 135 136 137 138