<< problem 323  BitwiseOR operations on random integers  Rooms of Doom  problem 327 >> 
Problem 324: Building a tower
(see projecteuler.net/problem=324)
Let f(n) represent the number of ways one can fill a 3 * 3 * n tower with blocks of 2 * 1 * 1.
You're allowed to rotate the blocks in any way you like; however, rotations, reflections etc of the tower itself are counted as distinct.
For example (with q = 100000007):
f(2) = 229,
f(4) = 117805,
f(10) mod q = 96149360,
f(10^3) mod q = 24806056,
f(10^6) mod q = 30808124.
Find f(10^10000) mod 100000007.
My Algorithm
The tower consists of n levels, each consisting of 3x3 units (I call them layer
throughout my code).
Each layer is completely filled with blocks, each made of 2 units.
When I look at a single layer in bird eye's view then these blocks can have four orientiations:
 horizontal
 vertical
 up
 down
HH.
..V
..V
"Up" blocks have one unit in the current layer and one unit in the layer above.
"Down" blocks have one unit in the current layer and one unit in the layer below.
Obviously, the bottom layer of the tower cannot contain any "down" blocks.
And the top layer of the tower cannot contain any "up" blocks.
My function
createLayers
recursively generates all possible layers.It fills are 9character string with all combinations of
,,U,D
.When inserting

(a horizontal block) or 
(a vertical blocks) then it inserts two characters at once.The whole procedure is blazingly fast and the container
layers
stores all 3940 distinct layers.Not every layer can be placed on top of each other because "up" and "down" blocks must match.
Therefore I added the concept of "borders":
a "border" is a bitmask which is 1 for each unit which belongs to a block that is shared between two layers ("up"/"down" block).
Each layer has two borders: its top and bottom borders. These borders have 9 bits because each layer has 3x3=9 units.
Only if the nth unit is "down" then the nth bit of the bottom border is set.
Only if the nth unit is "up" then the nth bit of the top border is set.
That means that there are just 2^9 = 512 distinct borders.
It's also easy to see that the bottom layer of the tower must be zero (no blocks "extending" into the ground).
Moreover, the top layer of the tower is zero, too (flat roof).
To verify my idea I wrote a simple
bruteForce
algorithm which is based on the concept of divide'n'conquer: the top and bottom layer of the tower must be zero, so start with
bruteForce(0,0,height)
 split the tower into an upper and a lower half (it doesn't have to be a 50:50 split !)
 every possible border can be found between the upper and the lower half
 iterate over all 512 borders and recursively call
bruteForce
twice:bruteForce(bottom,middle,height/2)
andbruteForce(middle,top,height/2)
 every upper half is "compatible" to every lower half, so multiply their numbers
 the number of layers for a tower with
height = 1
can be looked up in theborders
container previously generated
But execution time was slow even with excessive use of memoization.
A few days ago I solved problem 458 with a state machine / Markov chain and a matrix:
 the number of transitions from border A to border B are already stored in the
borders
container  that's a 512x512 matrix
 raise M to the 10^10000th power
 since 10^10000 is too large to be done at once, I repeatedly exponentiate the matrix 10^19 times (that number barely fits in 64 bits) until arriving at 10^10000
Fast exponentiation helps, but those dimensions are still way too large.
I ran the code before going to bed and the correct result popped up a few minutes after I woke up in the morning.
I tried to reduce the number of borders (thus shrinking the matrix) but exploiting rotations but somehow messed up and couldn't reproduce the already found correct result.
No idea where the bug was hidden ... but before I started fixing it I came up another idea:
 many numbers in the 512x512 matrix are zero
 after matrix exponentiation, the final result will be at (0,0) because the top and bottom borders must be zero
 what if some parts of the matrix don't affect the value at (0,0) at all ?
removeUnreachable
function: starting at (0,0) it checks which cells of the matrix contribute to (0,0) directly and indirectly.It's a simple pathfinding algorithm: is there a route between (0,0) and (x,y) where each intermediate step at (a,b) is not zero ?
It turns out only 252 out 512 borders can be found in the tower at all, that's a reduction of about 50%.
Since matrix multiplication is a O(n^3) algorithm, I cut down execution time by factor 8.
My program still needed more than a minute so I improved the
Matrix
class and found a few speedups: the matrix is symmetric, that means M(x,y) = M(y,x) (M = M^T for those who works more often with matrices)
 therefore I only need to compute M(x,y) for x <= y and then just copy that value to M(y,x)
More or less by chance I printed the final matrix and still saw many numbers M(x,y) = 0.
Fascinated by this observation, I ran the
removeUnreachable
function on the final matrix and it reduced the matrix to only 126x126.Then I aggressively tried to run
removeUnreachable
as often as possible and found: after just one matrix exponentiation the matrix can be shrunk to 126x126, that means M^2 can be reduced
 no further reductions where found for M^3, M^4 or other exponents
Alternative Approaches
The first values of the sequence f can be found in OEIS A028452 along with the generating function which is full of "magic constants".
The generating function could probably be evaluated in milliseconds  in contrast to my solution which almost exceeds the time limit of 1 minute.
The matrix can be further reduced to 23x23 if you write bugfree code properly handling rotations (as well as flipping).
I don't quite understand the reasoning, but even a 19x19 matrix is discussed in the Project Euler forum.
Note
Major parts of the Matrix
class come from my solution of problem 458 (and were heavily optimized for the current problem).
Aside from that this problem has the largest chunk of code originally written for a specific problem.
Interactive test
This feature is not available for the current problem.
My code
… was written in C++11 and can be compiled with G++, Clang++, Visual C++. You can download it, too.
#include <iostream>
#include <map>
#include <set>
#include <vector>
#include <tuple>
// everything modulo 10^9+7 (which is a prime)
const unsigned int Modulo = 100000007;
// different types of cells in a layer
const char Up = 'U';
const char Down = 'D';
const char Horizontal = '';
const char Vertical = '';
const char Empty = ' ';
// a single 3x3 level of the tower
typedef std::array<char, 9> Layer;
// store all different "designs" of completely filled layers, each is assigned a unique ID
std::set<Layer> layers;
// 2^9 different intersections between two layers
const unsigned int NumBorders = 1 << 9;
// remember how often each intersection can be observed
unsigned int borders[NumBorders][NumBorders] = { 0 };
// recursively create all different layers, add them to the "layers" container
void createLayers(Layer current)
{
// 3x3 => 9 cells per layer
// each cell is either 'U', 'D', 'H' or 'V'
// cells which are not processed yet are ' '
// find first empty cell
bool full = true;
unsigned int pos = 0;
for (; pos < 9; pos++)
if (current[pos] == Empty)
{
full = false;
break;
}
// no empty cells ?
if (full)
{
// std::set avoids duplicates
layers.insert(current);
return;
}
// attempt to insert a block crossing two layers
current[pos] = Up;
createLayers(current);
//current[pos] = Empty;
current[pos] = Down;
createLayers(current);
//current[pos] = Empty;
// if the right neighbor is empty, too, then add a horizontal layer
if (pos % 3 != 2 && current[pos+1] == Empty)
{
current[pos] = Horizontal;
current[pos+1] = Horizontal;
createLayers(current);
current[pos+1] = Empty;
//current[pos] = Empty;
}
// and finally add a vertical layer
if (pos < 6 && current[pos+3] == Empty)
{
current[pos] = Vertical;
current[pos+3] = Vertical;
createLayers(current);
//current[pos+3] = Empty;
//current[pos] = Empty;
}
}
// extract upper and lower border of a layer
void addBorders(Layer layer)
{
// bitmasks for the top and bottom border of this layer, 1 means "crossing", 0 means "nope, not crossing"
unsigned int top = 0;
unsigned int bottom = 0;
for (unsigned int i = 0; i < 9; i++)
{
// extending into the layer above ?
if (layer[i] == Up)
top = 1 << i;
// extending into the layer below ?
if (layer[i] == Down)
bottom = 1 << i;
}
borders[bottom][top]++;
}
// count number of combinations of a (partial) tower where the bottom and top layer are known as well as the height
unsigned long long bruteForce(unsigned int maskBottom, unsigned int maskTop, unsigned int height)
{
// degenerated case
if (height == 0)
return 0;
// count how many layers have that special bottom and top mask
if (height == 1)
return borders[maskBottom][maskTop];
// memoize
typedef std::tuple<unsigned int, unsigned int, unsigned int> Id;
static std::map<Id, unsigned long long> cache;
Id id(maskBottom, maskTop, height);
auto lookup = cache.find(id);
if (lookup != cache.end())
return lookup>second;
// divide'n'conquer:
// split the (partial) tower in half
//auto half = height / 2;
// it's slightly more efficient if it's not split 50:50 but one part is a power of two (access memoized data more frequently)
unsigned int powerOfTwo = 1;
while (powerOfTwo * 2 < height)
powerOfTwo *= 2;
auto heightTop = powerOfTwo; // instead of "half"
auto heightBottom = height  heightTop;
unsigned long long result = 0;
const unsigned int NumMasks = 1 << 9;
for (unsigned int middle = 0; middle < NumMasks; middle++)
{
// each "half"tower below the middle can be combined with each "half"tower above the middle
result += bruteForce(maskBottom, middle, heightBottom) *
bruteForce(middle, maskTop, heightTop);
result %= Modulo;
}
cache[id] = result;
return result;
}
// quadratic 2D matrix with powmod, based on my solution of problem 458
template <typename Number>
class Matrix
{
// store all elements as a flat 1D array
std::vector<Number> data;
// number of rows / columns
unsigned int size_;
public:
// set all elements to zero
Matrix(unsigned int size__ = 1) // these underscores are sooooo ugly ...
: data(size__ * size__, 0),
size_(size__)
{ }
// height / width of the quadratic matrix
unsigned int size() const
{
return size_;
}
// access a field (read/write), indices are zerobased
Number& operator()(unsigned int column, unsigned int row)
{
return data[column * size_ + row];
}
// access a field (readonly), indices are zerobased
Number get(unsigned int column, unsigned int row) const
{
return data[column * size_ + row];
}
// multiply two matrices
Matrix operator*(const Matrix& other) const
{
Matrix result(size_); // initially all fields are zero
for (unsigned int i = 0; i < size_; i++)
for (unsigned int j = 0; j < size_; j++)
{
if (other.get(i,j) == 0) // optional, just a performance tweak
continue;
for (unsigned int k = 0; k < size_; k++)
result(i,k) += get(j,k) * other.get(i,j);
}
return result;
}
// multiply two symmetric matrices, with modulo
Matrix multiplySymmetric(const Matrix& other, unsigned int modulo) const
{
Matrix result(size_); // initially all fields are zero
// compute one half (without modulo)
for (unsigned int i = 0; i < size_; i++)
for (unsigned int j = 0; j < size_; j++)
for (unsigned int k = i; k < size_; k++) // start at i instead of zero
result(i,k) += get(j,k) * other.get(i,j);
// copy to the other half of the matrix and compute modulo
for (unsigned int i = 0; i < size_; i++)
{
// along the diagonal
result(i,i) = result.get(i,i) % modulo;
// and everything else
for (unsigned int j = i + 1; j < size_; j++)
result(j,i) = result(i,j) = result(i,j) % modulo;
}
return result;
}
// fast exponentiation with modulo
Matrix powmod(unsigned long long exponent, unsigned int modulo) const
{
// more or less the same concept as powmod from my toolbox (which works on integers instead of matrices)
// optional performance tweak
if (exponent == 1)
return *this;
// start with the identity matrix
Matrix result(size_);
for (unsigned int i = 0; i < size_; i++)
result(i,i) = 1;
bool isIdentity = true;
Matrix base = *this;
while (exponent > 0)
{
// fast exponentation:
// odd exponent ? a^b = a*a^(b1)
if (exponent & 1)
{
// optional optimization: avoid multiplying by the identity matrix
if (isIdentity)
{
result = base;
isIdentity = false;
}
else
{
//result = result * base
result = result.multiplySymmetric(base, modulo);
}
}
// even exponent ? a^b = (a*a)^(b/2)
if (exponent > 1)
{
//base = base * base;
base = base.multiplySymmetric(base, modulo);
}
exponent >>= 1;
}
return result;
}
};
// find all states that can be reached from the initial state 0 and shrink matrix accordingly
Matrix<unsigned long long> removeUnreachable(const Matrix<unsigned long long>& matrix)
{
// collect all reachable states
std::set<unsigned int> reachable;
std::set<unsigned int> todo = { 0 }; // 0 => initial state
while (!todo.empty())
{
// pick next state
auto current = *(todo.begin());
todo.erase(current);
reachable.insert(current);
// put all new states on the todo list
for (unsigned int i = 0; i < matrix.size(); i++)
if (matrix.get(current,i) > 0 &&
reachable.count(i) == 0) // avoid revisited already processed states
todo.insert(i);
}
// size of the new matrix
auto matrixSize = reachable.size();
if (matrixSize == matrix.size())
return matrix; // no optimizations found ?
// create a smaller matrix by skipping unreachable states
Matrix<unsigned long long> smaller(matrixSize);
// x and y will be consecutive while I skip a few state of i and j
unsigned int x = 0;
for (unsigned int i = 0; i < NumBorders; i++)
{
// ignore that column ?
if (reachable.count(i) == 0)
continue;
unsigned int y = 0;
for (unsigned int j = 0; j < NumBorders; j++)
{
// ignore that row ?
if (reachable.count(j) == 0)
continue;
// copy cell
smaller(x,y) = matrix.get(i,j);
y++;
}
x++;
}
// done !
return smaller;
}
int main()
{
// this parameter has two different meanings:
//  in bruteforce mode it represents the height of the tower
//  in "fast" mode the height of the tower is 10^limit
unsigned int limit = 10000;
std::cin >> limit;
// start with an empty layer and search for all distinct ways to fill it
Layer nothing;
for (auto& x : nothing)
x = Empty;
createLayers(nothing);
// => 3940 layers
// extract the 512 distinct borders from these 3940 layers
for (auto layer : layers)
addBorders(layer);
// top and bottom borders must be zero (no blocks "crossing" to the outside)
const unsigned int InitialState = 0;
const unsigned int FinalState = 0;
//#define BRUTE_FORCE
#ifdef BRUTE_FORCE
// note: the height is just 10, not 10^10
std::cout << bruteForce(InitialState, FinalState, limit) << std::endl;
// f(6) % q = 64647289
// f(8) % q = 69563725
// f(10) takes about 40 seconds
return 0;
#endif
// copy to a matrix
auto matrixSize = NumBorders;
Matrix<unsigned long long> matrix(matrixSize);
for (unsigned int i = 0; i < matrixSize; i++)
for (unsigned int j = 0; j < matrixSize; j++)
matrix(i,j) = borders[i][j];
// remove unreachable "garbage" states, only 252 out of 512 states will be left
matrix = removeUnreachable(matrix);
// the first chunk (10^1) ...
matrix = matrix.powmod(10, Modulo);
// further reduction of states to 126
matrix = removeUnreachable(matrix);
// ... and the remaining 10^9999 chunks
auto remaining = limit  1;
const auto AtOnce = 19; // process up to 10^18 per iteration
while (remaining > 0)
{
// find largest number 10^x which still fits in a 64 bit integer
unsigned long long power10 = 1;
for (auto i = 1; i < AtOnce && remaining > 0; i++, remaining)
power10 *= 10;
// thats where 99% of the total execution time is spent ...
matrix = matrix.powmod(power10, Modulo);
}
// number of transitions from state 0 to state 0 (with 10^10000 layers inbetween)
std::cout << matrix(InitialState, FinalState) << std::endl;
return 0;
}
This solution contains 53 empty lines, 88 comments and 7 preprocessor commands.
Benchmark
The correct solution to the original Project Euler problem was found in 45.7 seconds on an Intel® Core™ i72600K CPU @ 3.40GHz.
Peak memory usage was about 6 MByte.
(compiled for x86_64 / Linux, GCC flags: O3 march=native fnoexceptions fnortti std=gnu++11 DORIGINAL
)
See here for a comparison of all solutions.
Note: interactive tests run on a weaker (=slower) computer. Some interactive tests are compiled without DORIGINAL
.
Changelog
December 20, 2017 submitted solution
December 20, 2017 added comments
Difficulty
Project Euler ranks this problem at 50% (out of 100%).
Links
projecteuler.net/thread=324  the best forum on the subject (note: you have to submit the correct solution first)
Code in various languages:
Python github.com/MengGen/ProjectEuler/blob/master/324.py (written by MengGen Tsai)
Python github.com/roosephu/projecteuler/blob/master/324.py (written by Yuping Luo)
C++ github.com/roosephu/projecteuler/blob/master/324.cpp (written by Yuping Luo)
Mathematica github.com/LaurentMazare/ProjectEuler/blob/master/e324.ml (written by Laurent Mazare)
Those links are just an unordered selection of source code I found with a semiautomatic search script on Google/Bing/GitHub/whatever.
You will probably stumble upon better solutions when searching on your own. Maybe not all linked resources produce the correct result and/or exceed time/memory limits.
Heatmap
Please click on a problem's number to open my solution to that problem:
green  solutions solve the original Project Euler problem and have a perfect score of 100% at Hackerrank, too  
yellow  solutions score less than 100% at Hackerrank (but still solve the original problem easily)  
gray  problems are already solved but I haven't published my solution yet  
blue  solutions are relevant for Project Euler only: there wasn't a Hackerrank version of it (at the time I solved it) or it differed too much  
orange  problems are solved but exceed the time limit of one minute or the memory limit of 256 MByte  
red  problems are not solved yet but I wrote a simulation to approximate the result or verified at least the given example  usually I sketched a few ideas, too  
black  problems are solved but access to the solution is blocked for a few days until the next problem is published  
[new]  the flashing problem is the one I solved most recently 
1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25 
26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50 
51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75 
76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99  100 
101  102  103  104  105  106  107  108  109  110  111  112  113  114  115  116  117  118  119  120  121  122  123  124  125 
126  127  128  129  130  131  132  133  134  135  136  137  138  139  140  141  142  143  144  145  146  147  148  149  150 
151  152  153  154  155  156  157  158  159  160  161  162  163  164  165  166  167  168  169  170  171  172  173  174  175 
176  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191  192  193  194  195  196  197  198  199  200 
201  202  203  204  205  206  207  208  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223  224  225 
226  227  228  229  230  231  232  233  234  235  236  237  238  239  240  241  242  243  244  245  246  247  248  249  250 
251  252  253  254  255  256  257  258  259  260  261  262  263  264  265  266  267  268  269  270  271  272  273  274  275 
276  277  278  279  280  281  282  283  284  285  286  287  288  289  290  291  292  293  294  295  296  297  298  299  300 
301  302  303  304  305  306  307  308  309  310  311  312  313  314  315  316  317  318  319  320  321  322  323  324  325 
326  327  328  329  330  331  332  333  334  335  336  337  338  339  340  341  342  343  344  345  346  347  348  349  350 
351  352  353  354  355  356  357  358  359  360  361  362  363  364  365  366  367  368  369  370  371  372  373  374  375 
376  377  378  379  380  381  382  383  384  385  386  387  388  389  390  391  392  393  394  395  396  397  398  399  400 
401  402  403  404  405  406  407  408  409  410  411  412  413  414  415  416  417  418  419  420  421  422  423  424  425 
426  427  428  429  430  431  432  433  434  435  436  437  438  439  440  441  442  443  444  445  446  447  448  449  450 
451  452  453  454  455  456  457  458  459  460  461  462  463  464  465  466  467  468  469  470  471  472  473  474  475 
476  477  478  479  480  481  482  483  484  485  486  487  488  489  490  491  492  493  494  495  496  497  498  499  500 
501  502  503  504  505  506  507  508  509  510  511  512  513  514  515  516  517  518  519  520  521  522  523  524  525 
526  527  528  529  530  531  532  533  534  535  536  537  538  539  540  541  542  543  544  545  546  547  548  549  550 
551  552  553  554  555  556  557  558  559  560  561  562  563  564  565  566  567  568  569  570  571  572  573  574  575 
576  577  578  579  580  581  582  583  584  585  586  587  588  589  590  591  592  593  594  595  596  597  598  599  600 
601  602  603  604  605  606  607  608  609  610  611  612  613  614  615  616  617  618  619  620  621  622  623  624 
I scored 13526 points (out of 15700 possible points, top rank was 17 out of ≈60000 in August 2017) at Hackerrank's Project Euler+.
My username at Project Euler is stephanbrumme while it's stbrumme at Hackerrank.
Look at my progress and performance pages to get more details.
Copyright
I hope you enjoy my code and learn something  or give me feedback how I can improve my solutions.
All of my solutions can be used for any purpose and I am in no way liable for any damages caused.
You can even remove my name and claim it's yours. But then you shall burn in hell.
The problems and most of the problems' images were created by Project Euler.
Thanks for all their endless effort !!!
<< problem 323  BitwiseOR operations on random integers  Rooms of Doom  problem 327 >> 