// ////////////////////////////////////////////////////////
// # Title
// Building a tower
//
// # URL
// https://projecteuler.net/problem=324
// http://euler.stephan-brumme.com/324/
//
// # Problem
// Let `f(n)` represent the number of ways one can fill a `3 * 3 * n` tower with blocks of `2 * 1 * 1`.
// You're allowed to rotate the blocks in any way you like; however, rotations, reflections etc of the tower itself are counted as distinct.
//
// For example (with `q = 100000007`):
// `f(2) = 229`,
// `f(4) = 117805`,
// `f(10) mod q = 96149360`,
// `f(10^3) mod q = 24806056`,
// `f(10^6) mod q = 30808124`.
//
// Find `f(10^10000) mod 100000007`.
//
// # Solved by
// Stephan Brumme
// December 2017
//
// # Algorithm
// The tower consists of `n` levels, each consisting of 3x3 units (I call them ''layer'' throughout my code).
// Each layer is completely filled with blocks, each made of 2 units.
//
// When I look at a single layer in bird eye's view then these blocks can have four orientiations:
// - horizontal
// - vertical
// - up
// - down
//
// Both units of a horizontal or vertical block are located in the same layer:
// || 1 || 1 || 1 ||
// || H ++ H ++ . ||
// || . ++ . ++ V ||
// || . ++ . ++ V ||
//
// "Up" blocks have one unit in the current layer and one unit in the layer above.
// "Down" blocks have one unit in the current layer and one unit in the layer below.
//
// Obviously, the bottom layer of the tower cannot contain any "down" blocks.
// And the top layer of the tower cannot contain any "up" blocks.
//
// My function ''createLayers'' recursively generates all possible layers.
// It fills are 9-character string with all combinations of ''-,|,U,D''.
// When inserting ''-'' (a horizontal block) or ''|'' (a vertical blocks) then it inserts two characters at once.
// The whole procedure is blazingly fast and the container ''layers'' stores all 3940 distinct layers.
//
// Not every layer can be placed on top of each other because "up" and "down" blocks must match.
// Therefore I added the concept of "borders":
// a "border" is a bitmask which is 1 for each unit which belongs to a block that is shared between two layers ("up"/"down" block).
// Each layer has two borders: its top and bottom borders. These borders have 9 bits because each layer has 3x3=9 units.
// Only if the n-th unit is "down" then the n-th bit of the bottom border is set.
// Only if the n-th unit is "up" then the n-th bit of the top border is set.
// That means that there are just `2^9 = 512` distinct borders.
// It's also easy to see that the bottom layer of the tower must be zero (no blocks "extending" into the ground).
// Moreover, the top layer of the tower is zero, too (flat roof).
//
// To verify my idea I wrote a simple ''bruteForce'' algorithm which is based on the concept of divide'n'conquer:
// - the top and bottom layer of the tower must be zero, so start with ''bruteForce(0,0,height)''
// - split the tower into an upper and a lower half (it doesn't have to be a 50:50 split !)
// - every possible border can be found between the upper and the lower half
// - iterate over all 512 borders and recursively call ''bruteForce'' twice: ''bruteForce(bottom,middle,height/2)'' and ''bruteForce(middle,top,height/2)''
// - every upper half is "compatible" to every lower half, so multiply their numbers
// - the number of layers for a tower with ''height = 1'' can be looked up in the ''borders'' container previously generated
//
// That algorithm verified the number of the test cases `f(2)`, `f(4)` and `f(10)`.
// But execution time was slow even with excessive use of memoization.
//
// A few days ago I solved problem 458 with a state machine / Markov chain and a matrix:
// - the number of transitions from border A to border B are already stored in the ''borders'' container
// - that's a 512x512 matrix
// - raise `M` to the `10^10000`-th power
// - since `10^10000` is too large to be done at once, I repeatedly exponentiate the matrix `10^19` times (that number barely fits in 64 bits) until arriving at `10^10000`
//
// There's a tiny problem: raising a 512x512 matrix to the `10^10000`-th power is extremely slow.
// Fast exponentiation helps, but those dimensions are still way too large.
// I ran the code before going to bed and the correct result popped up a few minutes after I woke up in the morning.
//
// I tried to reduce the number of borders (thus shrinking the matrix) but exploiting rotations but somehow messed up and couldn't reproduce the already found correct result.
// No idea where the bug was hidden ... but before I started fixing it I came up another idea:
// - many numbers in the 512x512 matrix are zero
// - after matrix exponentiation, the final result will be at (0,0) because the top and bottom borders must be zero
// - what if some parts of the matrix don't affect the value at (0,0) at all ?
//
// So I wrote the ''removeUnreachable'' function: starting at (0,0) it checks which cells of the matrix contribute to (0,0) directly and indirectly.
// It's a simple path-finding algorithm: is there a route between (0,0) and (x,y) where each intermediate step at (a,b) is not zero ?
// It turns out only 252 out 512 borders can be found in the tower at all, that's a reduction of about 50%.
// Since matrix multiplication is a `O(n^3)` algorithm, I cut down execution time by factor 8.
//
// My program still needed more than a minute so I improved the ''Matrix'' class and found a few speed-ups:
// - the matrix is symmetric, that means `M(x,y) = M(y,x)` (`M = M^T` for those who works more often with matrices)
// - therefore I only need to compute `M(x,y)` for `x <= y` and then just copy that value to `M(y,x)`
//
// ==> down to 6 minutes
//
// More or less by chance I printed the final matrix and still saw many numbers `M(x,y) = 0`.
// Fascinated by this observation, I ran the ''removeUnreachable'' function on the final matrix and it reduced the matrix to only 126x126.
// Then I aggressively tried to run ''removeUnreachable'' as often as possible and found:
// - after just one matrix exponentiation the matrix can be shrunk to 126x126, that means `M^2` can be reduced
// - no further reductions where found for `M^3`, `M^4` or other exponents
//
// Solving `f(10^10000)` - based on the reduced 126x126 matrix - finally takes less than a minute and is about as fast as the brute-code algorithm for `f(10)`.
//
// # Alternative
// The first values of the sequence `f` can be found in OEIS A028452 along with the generating function which is full of "magic constants".
// The generating function could probably be evaluated in milliseconds - in contrast to my solution which almost exceeds the time limit of 1 minute.
//
// The matrix can be further reduced to 23x23 if you write bug-free code properly handling rotations (as well as flipping).
// I don't quite understand the reasoning, but even a 19x19 matrix is discussed in the Project Euler forum.
//
// # Note
// Major parts of the ''Matrix'' class come from my solution of problem 458 (and were heavily optimized for the current problem).
// Aside from that this problem has [the largest chunk](/performance/#codemetrics) of code originally written for a specific problem.
#include
#include