Problem 324: Building a tower

(see projecteuler.net/problem=324)

Let f(n) represent the number of ways one can fill a 3 * 3 * n tower with blocks of 2 * 1 * 1.
You're allowed to rotate the blocks in any way you like; however, rotations, reflections etc of the tower itself are counted as distinct.

For example (with q = 100000007):
f(2) = 229,
f(4) = 117805,
f(10) mod q = 96149360,
f(10^3) mod q = 24806056,
f(10^6) mod q = 30808124.

Find f(10^10000) mod 100000007.

My Algorithm

The tower consists of n levels, each consisting of 3x3 units (I call them layer throughout my code).
Each layer is completely filled with blocks, each made of 2 units.

When I look at a single layer in bird eye's view then these blocks can have four orientiations:

Both units of a horizontal or vertical block are located in the same layer:
HH.
..V
..V

"Up" blocks have one unit in the current layer and one unit in the layer above.
"Down" blocks have one unit in the current layer and one unit in the layer below.

Obviously, the bottom layer of the tower cannot contain any "down" blocks.
And the top layer of the tower cannot contain any "up" blocks.

My function createLayers recursively generates all possible layers.
It fills are 9-character string with all combinations of -,|,U,D.
When inserting - (a horizontal block) or | (a vertical blocks) then it inserts two characters at once.
The whole procedure is blazingly fast and the container layers stores all 3940 distinct layers.

Not every layer can be placed on top of each other because "up" and "down" blocks must match.
Therefore I added the concept of "borders":
a "border" is a bitmask which is 1 for each unit which belongs to a block that is shared between two layers ("up"/"down" block).
Each layer has two borders: its top and bottom borders. These borders have 9 bits because each layer has 3x3=9 units.
Only if the n-th unit is "down" then the n-th bit of the bottom border is set.
Only if the n-th unit is "up" then the n-th bit of the top border is set.
That means that there are just 2^9 = 512 distinct borders.
It's also easy to see that the bottom layer of the tower must be zero (no blocks "extending" into the ground).
Moreover, the top layer of the tower is zero, too (flat roof).

To verify my idea I wrote a simple bruteForce algorithm which is based on the concept of divide'n'conquer:
That algorithm verified the number of the test cases f(2), f(4) and f(10).
But execution time was slow even with excessive use of memoization.

A few days ago I solved problem 458 with a state machine / Markov chain and a matrix:
There's a tiny problem: raising a 512x512 matrix to the 10^10000-th power is extremely slow.
Fast exponentiation helps, but those dimensions are still way too large.
I ran the code before going to bed and the correct result popped up a few minutes after I woke up in the morning.

I tried to reduce the number of borders (thus shrinking the matrix) but exploiting rotations but somehow messed up and couldn't reproduce the already found correct result.
No idea where the bug was hidden ... but before I started fixing it I came up another idea:
So I wrote the removeUnreachable function: starting at (0,0) it checks which cells of the matrix contribute to (0,0) directly and indirectly.
It's a simple path-finding algorithm: is there a route between (0,0) and (x,y) where each intermediate step at (a,b) is not zero ?
It turns out only 252 out 512 borders can be found in the tower at all, that's a reduction of about 50%.
Since matrix multiplication is a O(n^3) algorithm, I cut down execution time by factor 8.

My program still needed more than a minute so I improved the Matrix class and found a few speed-ups:
→ down to 6 minutes

More or less by chance I printed the final matrix and still saw many numbers M(x,y) = 0.
Fascinated by this observation, I ran the removeUnreachable function on the final matrix and it reduced the matrix to only 126x126.
Then I aggressively tried to run removeUnreachable as often as possible and found:
Solving f(10^10000) - based on the reduced 126x126 matrix - finally takes less than a minute and is about as fast as the brute-code algorithm for f(10).

Alternative Approaches

The first values of the sequence f can be found in OEIS A028452 along with the generating function which is full of "magic constants".
The generating function could probably be evaluated in milliseconds - in contrast to my solution which almost exceeds the time limit of 1 minute.

The matrix can be further reduced to 23x23 if you write bug-free code properly handling rotations (as well as flipping).
I don't quite understand the reasoning, but even a 19x19 matrix is discussed in the Project Euler forum.

Note

Major parts of the Matrix class come from my solution of problem 458 (and were heavily optimized for the current problem).
Aside from that this problem has the largest chunk of code originally written for a specific problem.

Interactive test

This feature is not available for the current problem.

My code

… was written in C++11 and can be compiled with G++, Clang++, Visual C++. You can download it, too.

#include <iostream>
#include <map>
#include <set>
#include <vector>
#include <tuple>
 
// everything modulo 10^9+7 (which is a prime)
const unsigned int Modulo = 100000007;
 
// different types of cells in a layer
const char Up = 'U';
const char Down = 'D';
const char Horizontal = '-';
const char Vertical = '|';
const char Empty = ' ';
 
// a single 3x3 level of the tower
typedef std::array<char, 9> Layer;
// store all different "designs" of completely filled layers, each is assigned a unique ID
std::set<Layer> layers;
 
// 2^9 different intersections between two layers
const unsigned int NumBorders = 1 << 9;
// remember how often each intersection can be observed
unsigned int borders[NumBorders][NumBorders] = { 0 };
 
// recursively create all different layers, add them to the "layers" container
void createLayers(Layer current)
{
// 3x3 => 9 cells per layer
// each cell is either 'U', 'D', 'H' or 'V'
// cells which are not processed yet are ' '
 
// find first empty cell
bool full = true;
unsigned int pos = 0;
for (; pos < 9; pos++)
if (current[pos] == Empty)
{
full = false;
break;
}
 
// no empty cells ?
if (full)
{
// std::set avoids duplicates
layers.insert(current);
return;
}
 
// attempt to insert a block crossing two layers
current[pos] = Up;
createLayers(current);
//current[pos] = Empty;
current[pos] = Down;
createLayers(current);
//current[pos] = Empty;
 
// if the right neighbor is empty, too, then add a horizontal layer
if (pos % 3 != 2 && current[pos+1] == Empty)
{
current[pos] = Horizontal;
current[pos+1] = Horizontal;
createLayers(current);
current[pos+1] = Empty;
//current[pos] = Empty;
}
 
// and finally add a vertical layer
if (pos < 6 && current[pos+3] == Empty)
{
current[pos] = Vertical;
current[pos+3] = Vertical;
createLayers(current);
//current[pos+3] = Empty;
//current[pos] = Empty;
}
}
 
// extract upper and lower border of a layer
void addBorders(Layer layer)
{
// bitmasks for the top and bottom border of this layer, 1 means "crossing", 0 means "nope, not crossing"
unsigned int top = 0;
unsigned int bottom = 0;
for (unsigned int i = 0; i < 9; i++)
{
// extending into the layer above ?
if (layer[i] == Up)
top |= 1 << i;
// extending into the layer below ?
if (layer[i] == Down)
bottom |= 1 << i;
}
 
borders[bottom][top]++;
}
 
// count number of combinations of a (partial) tower where the bottom and top layer are known as well as the height
unsigned long long bruteForce(unsigned int maskBottom, unsigned int maskTop, unsigned int height)
{
// degenerated case
if (height == 0)
return 0;
// count how many layers have that special bottom and top mask
if (height == 1)
return borders[maskBottom][maskTop];
 
// memoize
typedef std::tuple<unsigned int, unsigned int, unsigned int> Id;
static std::map<Id, unsigned long long> cache;
Id id(maskBottom, maskTop, height);
auto lookup = cache.find(id);
if (lookup != cache.end())
return lookup->second;
 
// divide'n'conquer:
// split the (partial) tower in half
//auto half = height / 2;
// it's slightly more efficient if it's not split 50:50 but one part is a power of two (access memoized data more frequently)
unsigned int powerOfTwo = 1;
while (powerOfTwo * 2 < height)
powerOfTwo *= 2;
 
auto heightTop = powerOfTwo; // instead of "half"
auto heightBottom = height - heightTop;
 
unsigned long long result = 0;
const unsigned int NumMasks = 1 << 9;
for (unsigned int middle = 0; middle < NumMasks; middle++)
{
// each "half"-tower below the middle can be combined with each "half"-tower above the middle
result += bruteForce(maskBottom, middle, heightBottom) *
bruteForce(middle, maskTop, heightTop);
result %= Modulo;
}
 
cache[id] = result;
return result;
}
 
// quadratic 2D matrix with powmod, based on my solution of problem 458
template <typename Number>
class Matrix
{
// store all elements as a flat 1D array
std::vector<Number> data;
// number of rows / columns
unsigned int size_;
 
public:
// set all elements to zero
Matrix(unsigned int size__ = 1) // these underscores are sooooo ugly ...
: data(size__ * size__, 0),
size_(size__)
{ }
 
// height / width of the quadratic matrix
unsigned int size() const
{
return size_;
}
 
// access a field (read/write), indices are zero-based
Number& operator()(unsigned int column, unsigned int row)
{
return data[column * size_ + row];
}
// access a field (read-only), indices are zero-based
Number get(unsigned int column, unsigned int row) const
{
return data[column * size_ + row];
}
 
// multiply two matrices
Matrix operator*(const Matrix& other) const
{
Matrix result(size_); // initially all fields are zero
for (unsigned int i = 0; i < size_; i++)
for (unsigned int j = 0; j < size_; j++)
{
if (other.get(i,j) == 0) // optional, just a performance tweak
continue;
 
for (unsigned int k = 0; k < size_; k++)
result(i,k) += get(j,k) * other.get(i,j);
}
return result;
}
 
// multiply two symmetric matrices, with modulo
Matrix multiplySymmetric(const Matrix& other, unsigned int modulo) const
{
Matrix result(size_); // initially all fields are zero
 
// compute one half (without modulo)
for (unsigned int i = 0; i < size_; i++)
for (unsigned int j = 0; j < size_; j++)
for (unsigned int k = i; k < size_; k++) // start at i instead of zero
result(i,k) += get(j,k) * other.get(i,j);
 
// copy to the other half of the matrix and compute modulo
for (unsigned int i = 0; i < size_; i++)
{
// along the diagonal
result(i,i) = result.get(i,i) % modulo;
// and everything else
for (unsigned int j = i + 1; j < size_; j++)
result(j,i) = result(i,j) = result(i,j) % modulo;
}
return result;
}
 
// fast exponentiation with modulo
Matrix powmod(unsigned long long exponent, unsigned int modulo) const
{
// more or less the same concept as powmod from my toolbox (which works on integers instead of matrices)
 
// optional performance tweak
if (exponent == 1)
return *this;
 
// start with the identity matrix
Matrix result(size_);
for (unsigned int i = 0; i < size_; i++)
result(i,i) = 1;
bool isIdentity = true;
 
Matrix base = *this;
 
while (exponent > 0)
{
// fast exponentation:
// odd exponent ? a^b = a*a^(b-1)
if (exponent & 1)
{
// optional optimization: avoid multiplying by the identity matrix
if (isIdentity)
{
result = base;
isIdentity = false;
}
else
{
//result = result * base
result = result.multiplySymmetric(base, modulo);
}
}
 
// even exponent ? a^b = (a*a)^(b/2)
if (exponent > 1)
{
//base = base * base;
base = base.multiplySymmetric(base, modulo);
}
 
exponent >>= 1;
}
return result;
}
};
 
// find all states that can be reached from the initial state 0 and shrink matrix accordingly
Matrix<unsigned long long> removeUnreachable(const Matrix<unsigned long long>& matrix)
{
// collect all reachable states
std::set<unsigned int> reachable;
std::set<unsigned int> todo = { 0 }; // 0 => initial state
while (!todo.empty())
{
// pick next state
auto current = *(todo.begin());
todo.erase(current);
reachable.insert(current);
 
// put all new states on the todo list
for (unsigned int i = 0; i < matrix.size(); i++)
if (matrix.get(current,i) > 0 &&
reachable.count(i) == 0) // avoid re-visited already processed states
todo.insert(i);
}
 
// size of the new matrix
auto matrixSize = reachable.size();
if (matrixSize == matrix.size())
return matrix; // no optimizations found ?
 
// create a smaller matrix by skipping unreachable states
Matrix<unsigned long long> smaller(matrixSize);
 
// x and y will be consecutive while I skip a few state of i and j
unsigned int x = 0;
for (unsigned int i = 0; i < NumBorders; i++)
{
// ignore that column ?
if (reachable.count(i) == 0)
continue;
 
unsigned int y = 0;
for (unsigned int j = 0; j < NumBorders; j++)
{
// ignore that row ?
if (reachable.count(j) == 0)
continue;
 
// copy cell
smaller(x,y) = matrix.get(i,j);
y++;
}
x++;
}
 
// done !
return smaller;
}
 
int main()
{
// this parameter has two different meanings:
// - in brute-force mode it represents the height of the tower
// - in "fast" mode the height of the tower is 10^limit
unsigned int limit = 10000;
std::cin >> limit;
 
// start with an empty layer and search for all distinct ways to fill it
Layer nothing;
for (auto& x : nothing)
x = Empty;
createLayers(nothing);
// => 3940 layers
 
// extract the 512 distinct borders from these 3940 layers
for (auto layer : layers)
addBorders(layer);
 
// top and bottom borders must be zero (no blocks "crossing" to the outside)
const unsigned int InitialState = 0;
const unsigned int FinalState = 0;
 
//#define BRUTE_FORCE
#ifdef BRUTE_FORCE
// note: the height is just 10, not 10^10
std::cout << bruteForce(InitialState, FinalState, limit) << std::endl;
// f(6) % q = 64647289
// f(8) % q = 69563725
// f(10) takes about 40 seconds
return 0;
#endif
 
// copy to a matrix
auto matrixSize = NumBorders;
Matrix<unsigned long long> matrix(matrixSize);
for (unsigned int i = 0; i < matrixSize; i++)
for (unsigned int j = 0; j < matrixSize; j++)
matrix(i,j) = borders[i][j];
 
// remove unreachable "garbage" states, only 252 out of 512 states will be left
matrix = removeUnreachable(matrix);
 
// the first chunk (10^1) ...
matrix = matrix.powmod(10, Modulo);
// further reduction of states to 126
matrix = removeUnreachable(matrix);
 
// ... and the remaining 10^9999 chunks
auto remaining = limit - 1;
const auto AtOnce = 19; // process up to 10^18 per iteration
while (remaining > 0)
{
// find largest number 10^x which still fits in a 64 bit integer
unsigned long long power10 = 1;
for (auto i = 1; i < AtOnce && remaining > 0; i++, remaining--)
power10 *= 10;
 
// thats where 99% of the total execution time is spent ...
matrix = matrix.powmod(power10, Modulo);
}
 
// number of transitions from state 0 to state 0 (with 10^10000 layers inbetween)
std::cout << matrix(InitialState, FinalState) << std::endl;
return 0;
}

This solution contains 53 empty lines, 88 comments and 7 preprocessor commands.

Benchmark

The correct solution to the original Project Euler problem was found in 45.7 seconds on an Intel® Core™ i7-2600K CPU @ 3.40GHz.
Peak memory usage was about 6 MByte.

(compiled for x86_64 / Linux, GCC flags: -O3 -march=native -fno-exceptions -fno-rtti -std=gnu++11 -DORIGINAL)

See here for a comparison of all solutions.

Note: interactive tests run on a weaker (=slower) computer. Some interactive tests are compiled without -DORIGINAL.

Changelog

December 20, 2017 submitted solution
December 20, 2017 added comments

Difficulty

50% Project Euler ranks this problem at 50% (out of 100%).

Heatmap

Please click on a problem's number to open my solution to that problem:

green   solutions solve the original Project Euler problem and have a perfect score of 100% at Hackerrank, too
yellow solutions score less than 100% at Hackerrank (but still solve the original problem easily)
gray problems are already solved but I haven't published my solution yet
blue solutions are relevant for Project Euler only: there wasn't a Hackerrank version of it (at the time I solved it) or it differed too much
orange problems are solved but exceed the time limit of one minute or the memory limit of 256 MByte
red problems are not solved yet but I wrote a simulation to approximate the result or verified at least the given example - usually I sketched a few ideas, too
black problems are solved but access to the solution is blocked for a few days until the next problem is published
[new] the flashing problem is the one I solved most recently

I stopped working on Project Euler problems around the time they released 617.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175
176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225
226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250
251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275
276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300
301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325
326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350
351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375
376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400
401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425
426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450
451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475
476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500
501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525
526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550
551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575
576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600
601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625
626 627 628 629 630 631 632 633 634 635 636 637
The 310 solved problems (that's level 12) had an average difficulty of 32.6% at Project Euler and
I scored 13526 points (out of 15700 possible points, top rank was 17 out of ≈60000 in August 2017) at Hackerrank's Project Euler+.

My username at Project Euler is stephanbrumme while it's stbrumme at Hackerrank.

Look at my progress and performance pages to get more details.

more about me can be found on my homepage, especially in my coding blog.
some names mentioned on this site may be trademarks of their respective owners.
thanks to the KaTeX team for their great typesetting library !