Sunday, December 20, 2020

Attrition by subtraction

The Game of Attrition works by players successively subtracting their own remaining number of points from their opponent's total until one player is left with a nonpositive point total. Assume player A has $N$ points to begin the game and always goes first against his opponent B.

What is the largest point total that B can start with such that A will still win?

Let's say that B has initial points of $P > N,$ since obviously if $P \leq N,$ then A will win in the first move. After one round of both A and B's moves, A will have $2N-P$ points remaining and B will have $P-N,$ both of which are linear combinations of $N$ and $P.$

Let $A_n$ and $B_n$ be the point totals for A and B, respectively, after $n$ full turns. One full turn of gameplay gives the following recurrence:

\begin{align*} A_{n+1} &= 2 A_n - B_n \\ B_{n+1} &= B_n - A_n \\ A_0 = N, \,\,&\,\, B_0 = P\end{align*}

If we assume, as after the first turn, that both $A_n$ and $B_n$ are linear combinations of $N$ and $P$, say $A_n = a_n N + b_n P$ and $B_n = c_n N + d_n P,$ then the above recurrence becomes \begin{align*} a_{n+1} &= 2 a_n - c_n \\ b_{n+1} &= 2 b_n - d_n \\ c_{n+1} &= c_n - a_n \\ d_{n+1} &= d_n - b_n \\ a_0 = d_0 = 1, \,\,&\,\, b_0 = c_0 = 0 \end{align*} We note that player $A$ will win after $n$ steps if $B_n \leq 0,$ or equivalently, if $$P \leq -\frac{c_n}{d_n} N.$$ The largest possible point total that B can start with such that A will still win is thus $$\sup \left\{ -\frac{c_n}{d_n}N : n \in \mathbb{N} \right\}.$$

Letting $x_n = (a_n, b_n, c_n, d_n)^T,$ and $$M =\left(\begin{array}{cccc} 2 & 0 & -1 & 0 \\ 0 & 2 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{array}\right),$$ we can express the recurrence more compactly in matrix notation as: $$x_n = Mx_{n-1} = \cdots = M^n x_0,$$ where $x_0 = (1, 0, 0, 1)^T.$ The matrix $M$ has eigenvalues $\lambda_\pm = \frac{3 \pm \sqrt{5}}{2},$ each with multiplicity $2,$ associated with eigenvectors $$u_\pm = \sqrt{\frac{2}{5\mp \sqrt{5}}}\left( \begin{array}{c} 1 \\ 0 \\ \frac{1 \mp \sqrt{5}}{2} \\ 0\end{array} \right) \,\,\, \text{and} \,\,\, v_\pm = \sqrt{\frac{2}{5\mp \sqrt{5}}} \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \frac{1 \mp \sqrt{5}}{2}\end{array} \right).$$ So let's define $$L = \left(\begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ -\frac{\sqrt{5}-1}{2} & 0 & \frac{\sqrt{5} + 1}{2} & 0 \\ 0 & -\frac{\sqrt{5}-1}{2} & 0 & \frac{\sqrt{5} + 1}{2}\end{array}\right),$$ $$D = \left(\begin{array}{cccc} \frac{5-\sqrt{5}}{2} & 0 & 0 & 0 \\ 0 & \frac{5 - \sqrt{5}}{2} & 0 & 0 \\ 0 & 0 & \frac{5 + \sqrt{5}}{2} & 0 \\ 0 & 0 & 0 & \frac{5 + \sqrt{5}}{2}\end{array}\right) = L^TL$$ and $$\Lambda = \left(\begin{array}{cccc} \frac{3+\sqrt{5}}{2} & 0 & 0 & 0 \\ 0 & \frac{3+\sqrt{5}}{2} & 0 & 0 \\ 0 & 0 & \frac{3-\sqrt{5}}{2} & 0 \\ 0 & 0 & 0 & \frac{3-\sqrt{5}}{2}\end{array}\right).$$ Then the eigendecomposition of $M$ can be given by $M = Q \Lambda Q^T,$ where $Q = LD^{-1/2}.$ In particular, for any integer $n \geq 1,$ $$M^n = Q\Lambda^n Q^T = (LD^{-1/2})\Lambda^n (LD^{-1/2})^T = L D^{-1} \Lambda^n L^T.$$

For any integer $n \geq 1,$ $$D^{-1} \Lambda^n L^T x_0 = \left( \begin{array}{c} \frac{2}{5-\sqrt{5}} \left( \frac{3+\sqrt{5}}{2} \right)^n \\ -\frac{\sqrt{5}}{5} \left( \frac{3+\sqrt{5}}{2} \right)^n \\ \frac{2}{5+\sqrt{5}} \left( \frac{3-\sqrt{5}}{2} \right)^n \\ \frac{\sqrt{5}}{5}\left( \frac{3-\sqrt{5}}{2} \right)^n\end{array} \right).$$ So the recurrence relationship can be written as $x_n = M^n x_0 = L D^{-1} \Lambda^n L^T x_0,$ which is given by \begin{align*}x_n = M^n x_0 &= \left( \begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ -\frac{\sqrt{5}-1}{2} & 0 & \frac{\sqrt{5} + 1}{2} & 0 \\ 0 & -\frac{\sqrt{5}-1}{2} & 0 & \frac{\sqrt{5} + 1}{2}\end{array} \right) \left( \begin{array}{c} \frac{2}{5-\sqrt{5}} \left( \frac{3+\sqrt{5}}{2} \right)^n \\ -\frac{\sqrt{5}}{5} \left( \frac{3+\sqrt{5}}{2} \right)^n \\ \frac{2}{5+\sqrt{5}} \left( \frac{3-\sqrt{5}}{2} \right)^n \\ \frac{\sqrt{5}}{5}\left( \frac{3-\sqrt{5}}{2} \right)^n\end{array} \right)\\ &= \left( \begin{array}{c} \frac{2}{5-\sqrt{5}}\left(\frac{3+\sqrt{5}}{2} \right)^n + \frac{2}{5+\sqrt{5}}\left(\frac{3-\sqrt{5}}{2} \right)^n \\ -\frac{\sqrt{5}}{5} \left(\frac{3+\sqrt{5}}{2} \right)^n + \frac{\sqrt{5}}{5} \left(\frac{3-\sqrt{5}}{2} \right)^n \\ -\frac{\sqrt{5}}{5} \left(\frac{3+\sqrt{5}}{2} \right)^n + \frac{\sqrt{5}}{5} \left(\frac{3-\sqrt{5}}{2} \right)^n \\ \frac{\sqrt{5}(\sqrt{5}-1)}{10} \left(\frac{3 + \sqrt{5}}{2}\right)^n + \frac{\sqrt{5}(\sqrt{5}+1)}{10} \left(\frac{3 - \sqrt{5}}{2}\right)^n \end{array} \right).\end{align*} Therefore, we have $$-\frac{c_n}{d_n} N = \frac{ \frac{\sqrt{5}}{5} \left(\frac{3+\sqrt{5}}{2} \right)^n - \frac{\sqrt{5}}{5} \left(\frac{3-\sqrt{5}}{2} \right)^n }{ \frac{\sqrt{5}(\sqrt{5} - 1)}{10} \left(\frac{3+\sqrt{5}}{2} \right)^n + \frac{\sqrt{5}(\sqrt{5} + 1)}{10} \left(\frac{3-\sqrt{5}}{2} \right)^n } N,$$ which is a monotonically increasing sequence with $$\lim_{n\to \infty} -\frac{c_n}{d_n} N = \sup \left\{ -\frac{c_n}{d_n} N : n \in \mathbb{N}\right\} = 
\frac{\sqrt{5}+1}{2}N.$$ That is, A will win whenever $P < \frac{\sqrt{5}+1}{2} N$ (the supremum is never attained, but since $\frac{\sqrt{5}+1}{2}N$ is irrational this makes no difference for integer point totals). More specifically, the function $$P(N) = \left\lfloor \frac{\sqrt{5}+1}{2} N \right\rfloor$$ gives the maximal value that B can start with but still lose to A, who starts with $N$ and goes first.
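As a sanity check, the game is cheap to simulate directly. Below is a minimal sketch (the helper names are my own) that plays the game out for each candidate $P$ and confirms the golden-ratio cutoff for a few values of $N$:

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def a_wins(n, p):
    """Play out the game; True if B hits a nonpositive total first."""
    a, b = n, p
    while True:
        b -= a            # A moves first
        if b <= 0:
            return True
        a -= b            # then B responds
        if a <= 0:
            return False

def largest_losing_p(n):
    """Largest starting total P for which B still loses to A."""
    p = n                 # any P <= N loses immediately
    while a_wins(n, p + 1):
        p += 1
    return p

for n in (5, 10, 100, 1000):
    assert largest_losing_p(n) == math.floor(PHI * n)
```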

Sunday, November 8, 2020

Putting Santul through his paces

Santul completes two 20 mile training runs with different pace profiles:

1) A constant 9 minute mile pace (which is impressive both physically and for its consistency); and

2) His pace starts at 10 minutes per mile and then linearly decreases to 8 minutes per mile over the course of the race. So for instance, when Santul is halfway done with the race (in time, not distance), his pace is 9 minutes per mile.

The question is: which run did he complete faster and what were the two times?

Santul's 9 minute mile run will complete in $9 \,\text{min} \, / \, \text{mile}\, \cdot 20 \,\text{miles} = 180 \,\text{minutes}.$

Let's now look at the second run. Let $$p(t) = a + bt$$ be the linear function giving Santul's pace at time $t$. From the initial data we know that $$p(0) = a = 10$$ and that if $T$ is the time at which Santul finishes his second run, then $$p(T) = 10 + b T = 8.$$ So $a = 10$ and $b = -\frac{2}{T}.$

We can convert Santul's pace into distance by integrating as follows: $d(t) = \int_0^t \,\frac{ds}{p(s)},$ so given that the 20 mile course was finished at time $T$, we have $$20 = \int_0^T \,\frac{ds}{10 - \frac{2s}{T}} = \frac{T}{2} \ln \frac{10}{8}.$$ Solving for $T$ we get $$T = \frac{40}{\ln 1.25} = 179.256804709...,$$ which is just a shade under the constant pace time.
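If you would rather check the calculus numerically than trust my integration, a quick midpoint-rule sketch (step count arbitrary) confirms that this pace profile covers the full 20 miles exactly when $T = 40/\ln 1.25$:

```python
import math

T = 40 / math.log(1.25)           # ≈ 179.2568 minutes

def distance(T, steps=100_000):
    """Midpoint-rule estimate of ∫_0^T ds / (10 - 2s/T), the miles covered."""
    h = T / steps
    return sum(h / (10 - 2 * (i + 0.5) * h / T) for i in range(steps))

assert abs(distance(T) - 20) < 1e-6
print(round(T, 4))                # 179.2568
```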

Von Neumann's fair coin simulator

For any $p \in (0,1)$, von Neumann showed that an unfair coin which shows heads with probability $p$ can be used to simulate a fair coin by flipping twice, ignoring/redrawing for HH or TT and associating HT with T and TH with H. The issue is that depending on the value of $p,$ one might need to ignore/redraw for HH and TT outcomes for a very large number of tries before getting either TH or HT and exiting.

For what values of $p \in (0,1)$ is it possible to simulate a fair coin in at most $N$ flips?

While the phrasing on the question is a bit vague, I will assume two main frameworks for what "possible" means:

1) For what values of $p \in (0,1)$ is the expected number of flips needed to exit less than or equal to $N$?; or

2) Given some value of $q \in (0,1),$ for what values of $p \in (0,1)$ does the probability that the simulation exits in less than or equal to $N$ flips exceed $q$?

Let $T$ be the number of flips of the von Neumann simulator needed to exit. The probability that the von Neumann simulator exits after one flip is $\mathbb{P}\{T = 1\} = 2p(1-p).$ The probability for $k > 1$ that it exits in $k$ flips is $\mathbb{P}\{ T = k \} = 2p(1-p) \cdot (1 - 2p(1-p))^{k-1}.$ So the average is \begin{align*}\mathbb{E}[T] &= \sum_{k=1}^\infty k \mathbb{P} \{ T = k \} = \sum_{k=1}^\infty k 2p(1-p) (1 - 2p(1-p))^{k-1}\\ &= 2p(1-p) \frac{1}{(1 - (1-2p(1-p)))^2} = \frac{1}{2p(1-p)}.\end{align*}

So under framework 1, the question becomes solving for $p \in (0,1)$ such that $$\mathbb{E}_p [T] = \frac{1}{2p(1-p)} \leq N.$$ Thus the solution is $$p \in \left[\frac{1 - \sqrt{1-2/N}}{2}, \frac{1 + \sqrt{1 - 2/N}}{2} \right].$$

Under framework 2, the question becomes for some $q \in (0,1),$ solving for $p \in (0,1)$ such that $$\mathbb{P}_p \{ T \leq N \} = \sum_{k=1}^N 2p(1-p) (1 - 2p(1-p))^{k-1} = 1 - (1 - 2p(1-p))^N \geq q.$$ Since $2p(1-p) \in [0,0.5]$ for all $p \in (0,1),$ if $q > 1-2^{-N},$ then $\mathbb{P}_p \{T \leq N \} \lt q,$ for all $p \in (0,1).$ But for any $q \in [0, 1-2^{-N}],$ then $\mathbb{P}_p \{T \leq N \} \geq q$ for $$p \in \left[ \frac{1 - \sqrt{2 \sqrt[N]{1-q} - 1}}{2}, \frac{1 + \sqrt{ 2 \sqrt[N]{1-q} - 1}}{2} \right].$$
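A quick Monte Carlo sketch of the simulator (seed and trial count arbitrary; each simulator "flip" here is one two-toss round of the unfair coin) agrees with the $\mathbb{E}[T] = \frac{1}{2p(1-p)}$ formula:

```python
import random

def rounds_to_exit(p, rng):
    """One run of the simulator; returns how many two-toss rounds it takes."""
    t = 0
    while True:
        t += 1
        if (rng.random() < p) != (rng.random() < p):   # HT or TH: exit
            return t

rng = random.Random(538)          # arbitrary fixed seed
p, trials = 0.9, 100_000
mean = sum(rounds_to_exit(p, rng) for _ in range(trials)) / trials
assert abs(mean - 1 / (2 * p * (1 - p))) < 0.15
```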

Sunday, November 1, 2020

A Game of Hot Pumpkin, or How I Was Left Holding the Chinese Remainder

In a game of Hot Pumpkin with $61$ people, the players count off clockwise upward from $1$ to $N$, eliminating the player who says $N$ and then beginning again with the player to the left of the most recently eliminated. The integer value of $N$ is mutually agreed upon prior to beginning the game.

I start and say $1$, with the first eliminated player sitting $18$ spots to my left. Then Ricky starts up the next round and the person $31$ spots to her left is eliminated next. Zach both begins and is eliminated in the third round.

Based on the first three eliminations, what is the smallest value of $N$?

This game sets up as an example of the Chinese remainder theorem, which states that the system of congruences involved in this game, namely \begin{align*} N & \equiv 19 \mod 61 \\ N & \equiv 32 \mod 60 \\ N & \equiv 1 \mod 59,\end{align*} has a unique solution in $\mathbb{Z}_{59 \cdot 60 \cdot 61} = \mathbb{Z}_{215940}.$

For any two congruences, $N \equiv a_1 \mod n_1$ and $N \equiv a_2 \mod n_2,$ with $\text{gcd}(n_1,n_2) = 1,$ we can use the extended Euclidean algorithm to calculate the integer solution to $$m_1 n_1 + m_2 n_2 = 1$$ that is asserted by B\'{e}zout's identity. From here, we note that $$N = a_1 m_2 n_2 + a_2 m_1 n_1 \mod n_1 \cdot n_2 $$ is a solution to both congruences.

So in particular, we see that $(m_1, m_2) = (1, -1)$ is a solution to $61m_1 + 60m_2 = 1$ and hence $$N = 19 \cdot (-1) \cdot 60 + 32 \cdot 1 \cdot 61 \mod 61 \cdot 60 = 812 \mod 3660$$ is a solution to the first two congruences of our game of Hot Pumpkin.

Continuing onward, we can use the extended Euclidean algorithm to find that the solution to $3660m_1 + 59m_2 = 1$ is $(m_1, m_2) = (-29, 1799).$ This gives the smallest possible solution to all three congruences as \begin{align*}N &= 812 \cdot 1799 \cdot 59 + 1 \cdot (-29) \cdot 3660 \mod 3660 \cdot 59\\ &= 86080352 \mod 215940 \equiv 136232.\end{align*}
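The same arithmetic fits in a few lines of Python (the crt helper is my own, leaning on pow(Mi, -1, m) for the modular inverses rather than a hand-rolled extended Euclidean algorithm):

```python
from math import prod

def crt(residues, moduli):
    """Smallest nonnegative N with N ≡ r_i (mod m_i), moduli pairwise coprime."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m) = inverse of Mi mod m
    return total % M

N = crt([19, 32, 1], [61, 60, 59])
assert N == 136232
assert N % 61 == 19 and N % 60 == 32 and N % 59 == 1
```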

Sunday, September 13, 2020

Climbing the mountains of Tour de FiveThirtyEight

You and three other equally talented bicyclists are competing to get up the mountain to accrue points: 5 for first place, 3 for second, 2 for third and 1 for fourth.
Since you are all equally talented, the relative rankings of your average speeds for this stage should all be equally likely outcomes. However, two of your competitors are on a team and through a clever use of drafting off of one another will be able to both finish at roughly the same time as the faster of the two teammates.

Given that two of your competitors are working together, what is your expected number of points on this stage?
Let's call your average speed $A,$ the average speeds of the two teammates $B$ and $B^\prime,$ and the average speed of the other individual competitor $C.$ Given that there are only $24$ possible, equally likely orderings of $A,$ $B,$ $B^\prime,$ and $C,$ we will go with brute force to calculate the expected number of points for this stage.

If $A > \max\{ B, B^\prime \}$, then all of the teammates' scheming is for naught, and you will get the place that you rightfully deserve. There are 6 orderings with $A > C$ and $A > \max \{B, B^\prime\}$, where you are in first and get $5$ points.
There are another 2 orderings where $C > A > \max\{B, B^\prime\}$, where you place 2nd and receive $3$ points.

Similarly, if $\min \{B, B^\prime\} > A,$ then the cooperative advantage is also nil from your perspective. These cases cover the 6 orderings where $C > A$ and $\min\{ B, B^\prime\} > A$, where you are in last place and receive $1$ point.

There are also 2 orderings where $\min\{B, B^\prime\} > A > C$ and you finish in 3rd place receiving $2$ points.

The teamwork only affects you when either $B > A > B^\prime$ or $B^\prime > A > B.$ There are four possible orderings with $A$ between $B$ and $B^\prime$ and $A$ the second largest number in $\{A, B, B^\prime, C\}.$ In this case, though rightfully you should have finished in second place, the slower of $B$ and $B^\prime$ gets pulled ahead of you by the faster and you place 3rd, receiving $2$ points.
Similarly, there are 4 possible orderings with $A$ between $B$ and $B^\prime$ and $A$ the third largest number in $\{A, B, B^\prime, C\}.$ In this case, though rightfully you should have finished in third place, the slower of $B$ and $B^\prime$ gets pulled ahead of you by the faster and you place last, receiving $1$ point.

Summing up all of the points, we get $$6 \cdot 5 + 2 \cdot 3 + 6 \cdot 1 + 2 \cdot 2 + 4 \cdot 2 + 4 \cdot 1 = 58 \,\,\text{points,}$$ for an average of $2.41\bar{6}$ points.

Meanwhile, in the absence of the teammate shenaniganry, your expected points would be the simple average of the points $\frac{5+3+2+1}{4} = 2.75,$ so the teammates cost you only a third of a point.
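The tallying above is mechanical enough to hand to a short script, which walks the same $24$ orderings (lifting the slower teammate up to the faster one's speed) and lands on the same $58$ points:

```python
from itertools import permutations

POINTS = {1: 5, 2: 3, 3: 2, 4: 1}

total = 0
for order in permutations(["A", "B", "B'", "C"]):
    speed = {rider: s for s, rider in enumerate(order)}   # later in order = faster
    # drafting: both teammates finish with the faster of the two
    speed["B"] = speed["B'"] = max(speed["B"], speed["B'"])
    # your place = 1 + number of riders who finish ahead of you
    place = 1 + sum(1 for r in ("B", "B'", "C") if speed[r] > speed["A"])
    total += POINTS[place]

assert total == 58
print(total / 24)
```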

Saturday, September 12, 2020

Pickup basketball teams

The number of ballers from each of Blacksburg, Greensboro and Silver Spring who show up to a pickup basketball game each week is identically and independently distributed as the uniform distribution on $\{1, 2, 3, 4, \dots, N\},$ for some integer $N.$

What is the probability that, on any given week, it’s possible to form two equal teams with everyone playing, where two towns are pitted against the third?

Let $B,$ $G$ and $S$ be the number of players who showed up from Blacksburg, Greensboro and Silver Spring, respectively. Without loss of generality, let's assume that $S$ is the largest and we seek the number of configurations where $B+G=S.$

First note that if $S$ is the maximum, then no such configuration can occur when $S=1,$ since that would force $B = G = S = 1.$ For any $S = 2, \dots, N,$ there are $S-1$ different combinations of integers $B$ and $G$ such that $B+G=S;$ those being $(B,G) \in \{ (1,S-1), (2, S-2), \dots, (S-1, 1) \}.$ So the total number of games where $B+G=S$ is $$\sum_{S=2}^N (S-1) = \frac{N(N-1)}{2}.$$
From symmetry, we get three times this number (since we arbitrarily assumed that $S$ was the maximum, but it could equally likely be $B$ or $G$), so the total number of configurations where two towns can be pitted against the third in two equal teams is $3 \frac{N(N-1)}{2}.$ The total number of possible configurations of players to show up on a given week is $N^3,$ so the probability is \[p_N = \frac{3(N-1)}{2N^2}.\]
In particular, if $N = 5,$ then the probability is $p_5 = 24\%.$
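For the skeptical, a direct count over all $N^3$ turnouts (using the equivalent condition that twice the maximum equals the total) reproduces $p_N$:

```python
def team_probability(N):
    """P(two towns' counts sum to the third's) for counts uniform on 1..N."""
    hits = sum(
        1
        for b in range(1, N + 1)
        for g in range(1, N + 1)
        for s in range(1, N + 1)
        if 2 * max(b, g, s) == b + g + s
    )
    return hits / N**3

assert abs(team_probability(5) - 0.24) < 1e-12
assert abs(team_probability(10) - 3 * 9 / (2 * 10**2)) < 1e-12
```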

Sunday, July 26, 2020

Electoral College Shenanigans

Given the division of Riddler Township into 10 unevenly populated shires, what is the lowest popular vote total with which a candidate can still win?
First, let's note that there are a total of $(3 + 4 + \dots + 12) = 75$ electors, so the winning candidate must amass at least $38$ electors.
If shire $k$ has population $P_k$, the lowest popular vote within shire $k$ for a winning candidate is achieved by receiving $\frac{P_k+1}{2}$ votes if shire $k$ is won and $0$ if shire $k$ is lost by that candidate.
Let $x_k \in \{0,1\}$ with $x_k=1$ denoting ``candidate wins shire $k$''. Then the minimal number of votes to achieve the outcome $x = (x_1, \dots, x_{10}) \in \{0,1\}^{10}$ is $$c(x) = \sum_{k=1}^{10} \frac{P_k+1}{2} x_k = \sum_{k=1}^{10} (5k+1) x_k.$$ Additionally, the formula for the number of electors achieved by the outcome $x$ is $E(x) = \sum_{k=1}^{10} (2+k) x_k.$
This electoral problem can be recast as the following knapsack problem:
$$\min \left\{ c(x) = \sum_{k=1}^{10} (5k+1) x_k : E(x) = \sum_{k=1}^{10} (2+k) x_k \geq 38, x \in \{0,1\}^{10} \right\}.$$ Solving this knapsack problem, we see that if a candidate wins $1-$, $2-$, $3-$, $4-$, $7-$, and $9-$shires with the minimal number of votes, then they would receive $38$ electors based on $136$ votes, or just $24.3\%$ of the popular vote.
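With only $2^{10}$ subsets of shires, the knapsack can be brute forced directly; the sketch below (which infers the shire populations $P_k = 10k + 1$ from the $(5k+1)$ vote formula above) recovers the $136$-vote minimum. Note that the minimum is attained by more than one set of shires, so the script only checks the optimal vote count and that the set named above achieves it:

```python
from itertools import product

shires = range(1, 11)
# all 0/1 outcome vectors that amass at least 38 electors
candidates = [
    xs for xs in product((0, 1), repeat=10)
    if sum((2 + k) * x for k, x in zip(shires, xs)) >= 38
]
best = min(sum((5 * k + 1) * x for k, x in zip(shires, xs)) for xs in candidates)
assert best == 136
# the winning set named above attains the minimum:
assert sum(5 * k + 1 for k in (1, 2, 3, 4, 7, 9)) == 136
assert sum(2 + k for k in (1, 2, 3, 4, 7, 9)) == 38
total_pop = sum(10 * k + 1 for k in shires)    # 560 residents in total
print(f"{best} votes = {best / total_pop:.1%} of the popular vote")
```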

Sunday, July 19, 2020

The Indianapolis F(t)

Can the Hare beat the Tortoise? Given his ability to outpace Tortoise by $25\%$, the mathematically minded Hare wants to minimize his margin of victory over his longtime foe. The magical racetrack expands uniformly by 10 miles at the start of each minute. Based on the magically expanding track length, Hare wants to know:
How long after the race has begun should Hare wait so that both Tortoise and Hare will cross the finish line at the same exact moment?
First, we will figure out how long it will take Tortoise to finish. To do so, since the total length is dynamic but expands proportionally, we will instead focus on the fraction of the track completed at time $t$. The total track length is $$F(t) = 10 (1 + \lfloor t \rfloor).$$ Thus, while the Tortoise's speed may be fixed at $60$ mph with respect to a fixed vantage point, Tortoise's closing speed with respect to the track length is $$v_T(t) = \frac{ 1 \,\,\text{mile / minute} }{ F(t)\,\, \text{miles / track}} = \frac{1}{10 (1 + \lfloor t \rfloor)}\,\, \text{track / minute}.$$ Tortoise will then finish the track at time $\tau$ such that $$\int_0^\tau v_T(t) \,dt = \sum_{k=0}^{\lfloor \tau \rfloor - 1} \frac{1}{10(k+1)} + \frac{\tau - \lfloor \tau \rfloor}{10 (1 + \lfloor\tau \rfloor)} = 1.$$ Ignoring the floors gives roughly $\frac{1}{10} H_{\tau} \approx \frac{ \ln \tau + \gamma }{10} = 1,$ so $\tau \approx e^{10 - \gamma} \approx 12367,$ where $\gamma=0.57721....$ is the Euler-Mascheroni constant. Root solving further gives $\tau = 12366.4681....$

Knowing when Tortoise will complete the track allows our very mathematically inclined Hare to back into when he should start. Hare's relative velocity is $$v_H(t) = \frac{3}{20(1+\lfloor t\rfloor)}$$ for $t \geq t_0$ and $v_H(t) = 0$ for $t < t_0.$ So then we need to find $t_0$ such that \begin{align*}\int_0^\tau v_H(t) \, dt &= \int_{t_0}^\tau v_H(t) \,dt \\&= \frac{ 3(\lceil t_0 \rceil - t_0) }{20 (1 + \lfloor t_0 \rfloor)} + \sum_{k=\lceil t_0 \rceil}^{\lfloor \tau \rfloor - 1} \frac{3}{20(k+1)} + \frac{3(\tau - \lfloor \tau \rfloor)}{20(1 + \lfloor \tau \rfloor)} = 1.\end{align*} If we ignore the first and last terms then we have approximately $\frac{3}{20} \ln \frac{\tau}{t_0} \approx 1,$ which gives $t_0 \approx \tau e^{-20/3} \approx 15.7.$ Root solving further gives $$\mathbf{t_0 = 15.2416....} \,\, \textbf{minutes}.$$
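Here is a root-solving sketch of both equations (bisection bounds eyeballed from the approximations above; the bookkeeping counts each whole minute of the race exactly once):

```python
import math

H = [0.0]                          # H[n] = 1 + 1/2 + ... + 1/n
for k in range(1, 12500):
    H.append(H[-1] + 1 / k)

def tortoise_progress(t):
    """Fraction of the track Tortoise has covered by time t."""
    n = math.floor(t)
    return H[n] / 10 + (t - n) / (10 * (n + 1))

def hare_progress(t0, tau):
    """Fraction Hare covers on [t0, tau] at 1.5x Tortoise's closing rate."""
    n0, nT = math.floor(t0), math.floor(tau)
    partial_start = (math.ceil(t0) - t0) / (1 + n0)
    whole_minutes = H[nT] - H[math.ceil(t0)]
    partial_end = (tau - nT) / (1 + nT)
    return (3 / 20) * (partial_start + whole_minutes + partial_end)

def bisect(f, lo, hi):
    """Root of an increasing f with f(lo) < 0 < f(hi)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

tau = bisect(lambda t: tortoise_progress(t) - 1, 12000, 12499)
t0 = bisect(lambda t: 1 - hare_progress(t, tau), 1, 100)
print(round(tau, 4), round(t0, 4))
```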

Sunday, July 5, 2020

The many, many, many stars and a pinch of stripes for good measure

In the "Huh, that's neat ... I guess" category of number theory, the preamble to this week's Riddler Classic notes that the current number of states, $N = 50,$ is double a square, while $N+1$ is a centered pentagonal number.

After $50,$ what is the next integer $N$ with these properties?

After looking up the definition of a centered pentagonal number, it appears that they are of the form $$1 + \sum_{k=1}^n 5k = 1 + 5 \frac{n(n+1)}{2}.$$ So the question asks us to find the minimal $N > 50$ such that there exist integers $n$ and $m$ with \[ N = 2n^2 \,\, \text{and} \,\, N+1 = 5\frac{m(m+1)}{2} + 1.\] In particular, we would then have $$m^2 + m - \frac{4}{5} n^2 = 0$$ for some integers $m, n \in \mathbb{N}.$ Solving for $m$ in terms of $n$ we get $$m = \frac{-1 \pm \sqrt{1 + \frac{16}{5} n^2 }}{2}.$$ In order for $m$ to be an integer we would need $\sqrt{1 + \frac{16}{5}n^2}$ to be an odd integer, so in particular, we require that $5 \mid n.$

So we can brute force check whether the conditions hold for each multiple of $5$, until arriving at $n = 90,$ which gives $\sqrt{1 + \frac{16}{5} n^2} = 161$ and a grand total of $N = 2(90)^2 = 16,200$. It seems that either we would need much bigger flags, or much smaller stars, to accommodate a flag with those specifications. And at what point on our walk from $51$-ish states to $16,200$ would we reconsider the convention of only having 13 stripes for the original colonies?
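The brute force scan is only a few lines of Python (search bound arbitrary; math.isqrt does the perfect-square test):

```python
import math

hits = []
for n in range(5, 200, 5):        # derived above: we need 5 | n
    r = 1 + 16 * n * n // 5       # the radicand, an integer once 5 | n
    s = math.isqrt(r)
    if s * s == r and s % 2 == 1:  # sqrt must be an odd integer
        hits.append(2 * n * n)

assert hits == [50, 16200]        # 50 is today's count; 16,200 is the next
```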

Sunday, June 28, 2020

Polly Gawn's Dot Connector

What is the greatest possible number of unique hexagons Polly can draw using six points?

For any 6 vertices $v_1, \dots, v_6$ and any permutation $\sigma \in S_6,$ we can draw an edge from $v_{\sigma_i}$ to $v_{\sigma_{i+1}}$ for $i = 1, \dots, 6,$ where $\sigma_7 = \sigma_1$ closes the path. Polly would need to check that the edges do not intersect one another (besides at shared vertices) in order to verify that $\sigma$ defines a hexagon.

Line segments $\ell_1 = PQ$ and $\ell_2 = RS$ intersect if and only if $$((Q_2-P_2)(R_1-P_1)-(Q_1-P_1)(R_2-P_2)) ((Q_2-P_2)(S_1-P_1)-(Q_1-P_1)(S_2-P_2)) < 0$$ and $$((S_2-R_2)(P_1-R_1)-(S_1-R_1)(P_2-R_2)) ((S_2-R_2)(Q_1-R_1)-(S_1-R_1)(Q_2-R_2)) < 0.$$
Luckily Polly has chosen a reasonably small number of dots to connect, so that she can brute force the solution. There are $|S_6| = 6! = 720$ possible permutations, though each possible hexagon is described by $12$ such permutations: $6$ choices of starting vertex and two orientations (clockwise and counterclockwise).

Polly spun up the old Monte Carlo simulator and generated $N=10,000$ random sets of $6$ vertices, checked whether each of the $720$ possible permutations defined a hexagon, then divided by $12.$ The maximum number of unique hexagons was 29. This particular configuration involves two nested, but not concentric triangles, e.g., $$v_1 = (3.933833017, 2.610569057), v_2 = (9.739640057, 8.188125177), v_3 = (2.408262249, 7.450428529),$$ $$v_4 = (0.795494503, 9.711307153), v_5 = (4.004793527, 4.769703033), v_6 = (5.950340459, 7.124276426).$$
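A sketch of the brute force step (using an orientation-based crossing test, with the points assumed to be in generic position), run on the vertices above:

```python
from itertools import permutations

pts = [
    (3.933833017, 2.610569057), (9.739640057, 8.188125177),
    (2.408262249, 7.450428529), (0.795494503, 9.711307153),
    (4.004793527, 4.769703033), (5.950340459, 7.124276426),
]

def cross(o, a, b):
    # z-component of (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segments_cross(p, q, r, s):
    # proper crossing of segments pq and rs (generic position assumed)
    return (cross(p, q, r) * cross(p, q, s) < 0
            and cross(r, s, p) * cross(r, s, q) < 0)

def is_simple(poly):
    """True if the closed path through poly has no two edges crossing."""
    n = len(poly)
    edges = [(poly[i], poly[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue          # adjacent edges only share a vertex
            if segments_cross(*edges[i], *edges[j]):
                return False
    return True

hexagons = sum(is_simple(p) for p in permutations(pts)) // 12
print(hexagons)
```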

Saturday, June 13, 2020

R. classicum ...? More like R. catalanii

The R. classicum reproductive behavior leads to some familiar combinatorial behavior involving the Catalan numbers. I'm not totally sure how to get the made up scientific nomenclature of a non-existent bacteria changed ... but I think we should look into it.

First the setup:
R. classicum either splits into two copies (with probability $p$) or dies (with probability $1-p$). If you start with a single R. classicum bacterium, what is the probability that it will lead to an everlasting colony (i.e., the colony will theoretically persist for an infinite amount of time)?

Theorem. The probability that a single R. classicum bacterium leads to an everlasting colony is \[E(p) = \max \left\{ 2 - \frac{1}{p}, 0 \right\}.\] In addition to the colony population at time $t$, $N_t,$ let's additionally catalog the history of a bacteria colony by tracking two quantities: $R_t$ and $D_t$, the number of reproductions/doublings and deaths that have occurred up until time $t.$ For consistency, let's define $R_0=D_0=0.$

Proposition. For each $t \geq 0,$ \[N_t = R_t - D_t + 1.\]
Proof. We will proceed by induction. The base case of $t = 0$ satisfies the equation.
Assume that for some time $t = \tau \geq 0$ that the formula holds, that is, $N_\tau = R_\tau - D_\tau + 1.$
Mechanically we see that $N_{\tau+1}$ is equal to twice the number of bacteria that reproduced at time $t = \tau+1$, so in particular \[R_{\tau+1} = \frac{N_{\tau+1}}{2} + R_\tau.\] Additionally, since the remainder of the population at time $t=\tau$ that did not reproduce must have died, we have \[D_{\tau+1} = \left(N_\tau - \frac{N_{\tau+1}}{2}\right) + D_\tau.\] If we subtract the deaths equation from the reproductions, we get \[R_{\tau+1} - D_{\tau+1} = N_{\tau+1} - N_\tau + R_\tau - D_\tau.\] From the inductive hypothesis we have $N_\tau = R_\tau - D_\tau + 1,$ so we can substitute above and get \[R_{\tau+1} - D_{\tau+1} = N_{\tau+1} - (R_\tau - D_\tau + 1) + R_\tau - D_\tau = N_{\tau+1} - 1,\] or equivalently $N_{\tau+1} = R_{\tau+1} - D_{\tau+1} + 1.$ Thus, by simple induction, the proposition holds for all time steps $t \geq 0.$

The following corollary is immediate.

Corollary. The colony population collapses if and only if there is some step $t > 0$ for which $D_t = R_t + 1.$

Using the corollary, we can then proceed with the proof of the main Theorem.

Proof. Let's calculate $1 - E(p),$ which is the probability that the R. classicum colony eventually collapses. From the corollary, this means that there is some $t>0$ for which $D_t = R_t + 1.$ We see that the probability of a colony history with, say, $k$ reproductions and $k+1$ deaths is $p^k (1-p)^{k+1}.$ Let $C_k$ denote the number of distinct colony histories that end in collapse after $k$ reproductions and $k+1$ deaths.
Then the probability that the colony eventually collapses is \[1 - E(p) = \sum_{k=0}^\infty C_k p^k (1-p)^{k+1}.\] Now the only thing that remains is to find out what $C_k$ is (though I imagine you've already gathered where I'm going with this...). Since a population collapse needs to end with a death, the remaining $k$ reproductions and $k$ deaths can be ordered in any way such that the cumulative number of deaths is never more than the cumulative number of reproductions. Thus there is a simple bijection between colony histories with $k$ reproductions and $k+1$ deaths and nonnegative Dyck paths of length $2k$ (for instance, set reproductions to ``up'' steps and deaths to ``down'' steps), of which there are \[C_k = \frac{1}{k+1} \binom{2k}{k},\] that is, the $k$th Catalan number, such paths.
Using the generating function for the Catalan numbers, \[c(x) = \sum_{n=0}^\infty C_n x^n = \frac{1 - \sqrt{1-4x}}{2x}\] we get that the probability of eventual colony collapse is \begin{align}1 - E(p) &= \sum_{k=0}^\infty C_k p^k (1-p)^{k+1}\\ &= (1-p) c\Big( p(1-p) \Big)\\ &= (1-p) \frac{ 1 - \sqrt{1 - 4p(1-p)}}{2p(1-p)} \\ &= \frac{1 - \sqrt{1-4p+4p^2}}{2p} = \frac{1 - |1-2p|}{2p}.\end{align} So, for $p > 0.5,$ we have \[E(p) = 1 - \frac{1-|1-2p|}{2p} = 1 - \frac{ 1 - (2p-1) }{2p} = 1 - \frac{2(1-p)}{2p} = 2 - \frac{1}{p};\] whereas for $p \leq 0.5,$ we have \[E(p) = 1 - \frac{1-|1-2p|}{2p} = 1 - \frac{1 - (1-2p)}{2p} = 1 - \frac{2p}{2p} = 0.\] Altogether, the formula $E(p) = \max\{ 2 - \frac{1}{p}, 0 \}$ is proved.

Editor's Note: I actually solved this brute force in Excel by setting up several steps of the process, seeing that when $p=0.8$ the probability of having a zero population quickly converged to $0.25$. Varying the input value of $p$ quickly showed that the probability of having a zero population was $\min\{1,(1-p)/p\}$. However, I was convinced that there was a theoretical solution and was pleasantly surprised that it led solidly back to my undergraduate research in enumerative combinatorics with Prof. Ira Gessel at Brandeis University (cf. this sloppy arXiv article).
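The Excel experiment also translates into a few lines of Python: iterating the extinction fixed point $q \mapsto (1-p) + pq^2$ (the colony dies immediately, or both daughter lines eventually die out) and comparing survival against $E(p)$:

```python
def survival(p, iters=100_000):
    """Iterate the extinction fixed point q = (1 - p) + p q^2 from q = 0."""
    q = 0.0
    for _ in range(iters):
        q = (1 - p) + p * q * q
    return 1 - q                   # probability of an everlasting colony

for p, expected in [(0.8, 0.75), (0.5, 0.0), (0.25, 0.0)]:
    assert abs(survival(p) - expected) < 1e-3
```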

Editor's Note 2: This seems like a really convoluted way of proving that if the expected population of the colony decreases to zero over time (that is, when $p < 0.5$), that population collapse is almost certain. I suppose it is a roughly serviceable level of convolution for showing almost sure population collapse when the expected value is constant (that is, when $p=0.5$).