
Monday, May 12, 2025

Guess we'll still never know ...

Now that you’ve determined the values of a and b, let’s analyze the rest of Stephen’s statement. Is it true that losing in five games is “closer to a sweep than a seven-game series”? Let p4 represent the probability that the Celtics sweep the Knicks in four games. And let p7 represent the probability that the series goes to seven games (with either team winning).

Suppose p is randomly and uniformly selected from the interval (a,b), meaning we take it as a given that the most likely outcome is that the Knicks will lose the series in five games. How likely is it that p4 is greater than p7? In other words, how often will it be the case that losing in five games means a sweep is more likely than a seven-game series?

As we saw earlier, for any value of p \in \left( \frac{3}{5}, \frac{3}{4} \right), we have p_4 = P(\text{C}, 4 \mid p) = p^4. Similarly, we have p_7 = P(\text{C}, 7 \mid p) + P(\text{K}, 7 \mid p) = 20p^3(1-p)^3. We can solve for the point at which we have p_4(p) = p_7(p) and get the cubic equation 20(1-p)^3 - p = 0, or, substituting t = 1-p, we get t^3 + \frac{1}{20} t - \frac{1}{20} = 0. Using Cardano's formula, given that the discriminant when p = \frac{1}{20} and q = -\frac{1}{20} is \Delta = \frac{q^2}{4} + \frac{p^3}{27} = \frac{1}{1600} + \frac{1}{216000} = \frac{17}{27000} \gt 0, we have the solution t = \sqrt[3]{\frac{1}{40} + \frac{1}{30}\sqrt{\frac{17}{30}}} + \sqrt[3]{\frac{1}{40} - \frac{1}{30}\sqrt{\frac{17}{30}}}, so the equilibrium point between p_4 and p_7 occurs at \hat{p} = 1 - t = 1 - \sqrt[3]{\frac{1}{40} + \frac{1}{30}\sqrt{\frac{17}{30}}} - \sqrt[3]{\frac{1}{40} - \frac{1}{30}\sqrt{\frac{17}{30}}} \approx 0.676582453403. Since a larger p makes a sweep more likely and a seven-game series less likely, we have p_4 \gt p_7 precisely when p \gt \hat{p}. So if p were uniformly sampled from U\left(\frac{3}{5},\frac{3}{4}\right), then the probability that a sweep is more probable than a seven-game series is \frac{\frac{3}{4} - \hat{p}}{\frac{3}{4} - \frac{3}{5}} = \frac{15 - 20\hat{p}}{3} \approx 0.489450310649.
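For a quick numerical sanity check of both the equilibrium point and that probability, something like the following numpy sketch works (the variable names are just for illustration):

import numpy as np

# real root of 20t^3 + t - 1 = 0, where t = 1 - p
roots = np.roots([20.0, 0.0, 1.0, -1.0])
p_hat = 1.0 - roots[np.abs(roots.imag) < 1e-9].real.item()
print(p_hat)                                   # ~0.676582453403

# sample p ~ U(3/5, 3/4); a sweep beats a seven-game series when p^4 > 20 p^3 (1-p)^3
rng = np.random.default_rng(0)
p = rng.uniform(3/5, 3/4, 1_000_000)
print(np.mean(p**4 > 20 * p**3 * (1 - p)**3))  # ~0.4895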

Guess we'll never know ...

In a recent First Take episode, co-host Stephen A. Smith said:

I got [the Knicks] losing this in five games, which means they’re closer to a sweep than a seven-game series. That’s how I’m looking at it right now.

Let’s look at the first part of Stephen’s statement, that he believed the Knicks would lose to the Celtics in five games.

Let p represent the probability the Celtics win any given game in the series. You should assume that p is constant (which means there’s no home-court advantage) and that games are independent.

For certain values of p, the likeliest outcome is indeed that the Celtics will win the series in exactly five games. While this probability is always less than 50 percent, this outcome is more likely than the Celtics winning or losing in some other specific number of games. In particular, this range can be specified as a<p<b.

Determine the values of a and b.

Let P(t,np) be the probability of team t{C,K} winning the series in n games, given that the probability of the Celtics winning any individual game is p. Then since team t would have to win the final game of the series, the probability is given by P(t, n \mid p) = \begin{cases}\binom{n-1}{3} p^4 (1-p)^{n-4}, &\text{if $t = \text{C}$;}\\ \binom{n-1}{3} p^{n-4}(1-p)^4, &\text{if $t = \text{K}.$}\end{cases}

We are trying to find the maximal interval (a,b) \subseteq [0,1], such that P(\text{C}, 5 \mid p ) = \max_{t \in \{\text{C}, \text{K}\}, n \in \{4,5,6,7\}} P(t, n \mid p), for all p \in (a,b). Let's just brute force this thing: \begin{align*} P(\text{C}, 5 \mid p) = 4p^4(1-p) \gt P(\text{C}, 4 \mid p) = p^4 &\Rightarrow p \in \left(0,\frac{3}{4}\right) \\ P(\text{C}, 5 \mid p) = 4p^4(1-p) \gt P(\text{C}, 6 \mid p) = 10p^4(1-p)^2 &\Rightarrow p \in \left(\frac{3}{5},1\right) \\ P(\text{C}, 5 \mid p) = 4p^4(1-p) \gt P(\text{C}, 7 \mid p) = 20p^4(1-p)^3 &\Rightarrow p \in \left(1 - \frac{1}{\sqrt{5}},1\right), \end{align*} so since as long as p \gt \frac{1}{2} we have P(\text{C}, n \mid p) \gt P(\text{K}, n \mid p) for each n \in \{4,5,6,7\}, we see that the region where a Celtics win in five games is the most probable occurs when p \in \left(\frac{3}{5}, \frac{3}{4}\right).
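As a sanity check on that case analysis, a few lines of Python (a minimal sketch using the P(t, n \mid p) formula above) can scan over p and report where a Celtics win in five games is the single most likely outcome:

import numpy as np
from math import comb

def outcome_probs(p):
    # the eight outcomes: Celtics in 4..7 games, then Knicks in 4..7 games
    return [comb(n - 1, 3) * p**4 * (1 - p)**(n - 4) for n in range(4, 8)] + \
           [comb(n - 1, 3) * p**(n - 4) * (1 - p)**4 for n in range(4, 8)]

ps = np.linspace(0.5, 0.9, 4001)
best = np.array([int(np.argmax(outcome_probs(p))) for p in ps])
in_five = ps[best == 1]               # index 1 is "Celtics win in 5"
print(in_five.min(), in_five.max())   # ~0.6 and ~0.75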

Sunday, May 4, 2025

Leeloo Disney, Multi-Pass, Extra Credit

This time around, after you complete the first ride on your Multi Pass, you can reserve a fourth ride at any time slot after your first ride (rather than after your third ride). Similarly, after you have completed your second ride, you can reserve a fifth ride at any time slot after your second, and so on, until there are no available time slots remaining.

As before, the first three rides of the day are equally likely to be in any of the 12 time slots, whereas subsequent rides are equally likely to occur in any remaining available slots for the day.

On average, how many rides can you expect to “Lightning Lane” your way through today at Magic Kingdom?

As mentioned in the previous Classic answer, I coded up an "extra credit" version of the Lightning Lane program, where now, as opposed to only adding new rides after your latest scheduled ride, you add another ride uniformly distributed over all of the remaining, not-yet-scheduled time slots after the ride you just completed. The resulting distribution of ride counts should be, and indeed turns out to be, much more concentrated around its center, though here I was unable to come up with a semi-analytical solution and will just rely on the simulations.

Here again, I used N=1,000,000 simulations and arrived at the following histogram, which gives a mean of m = 6.809379 and a standard deviation of s = 1.404456, implying that the 95\% confidence interval for the true mean is R \in [6.8066, 6.8121]. Therefore, we can comfortably round and say that under the Extra Credit Lightning Lane rules, you can expect approximately 6.81 rides.

Leeloo Disney, Multi-Pass

By purchasing “Lightning Lane Multi Pass,” you can reserve three of the many rides in a park, with each ride occurring at a different hourlong time slot. For simplicity, suppose the park you’re visiting (let’s say it’s Magic Kingdom) has 12 such time slots, from 9 a.m. to 9 p.m. So if you have the 3 p.m. time slot for a ride, then you can skip the “standby lane” and board the much shorter “Lightning Lane” at any point between 3 and 4 p.m. Assume you can complete at most one ride within each hourlong time slot.

Once you have completed the first ride on your Multi Pass, you can reserve a fourth ride at any time slot after your third ride. This way, you always have at most three reservations. Similarly, after you have completed your second ride, you can reserve a fifth ride at any time slot after your fourth, and so on, up until you are assigned a ride at the 9 p.m. time slot. That will be your final ride of the day.

Magic Kingdom happens to be very busy at the moment, and so each ride is randomly assigned a valid time slot when you request it. The first three rides of the day are equally likely to be in any of the 12 time slots, whereas subsequent rides are equally likely to occur in any slot after your currently latest scheduled ride.

On average, how many rides can you expect to “Lightning Lane” your way through today at Magic Kingdom?

Let's denote the slots that I get as S \subseteq \{1, 2, 3, \dots, 12 \}. After each ride I take, the probability distribution of what additional slots I add to S depends only on m = \max S. Additionally, since we care about the total ending size of S, let's also keep track of n = |S|.

Define R(n,m) to be the expected total number of rides I take given that I currently have n slots and the maximum time slot is m. Since once the 12th time slot is filled you do not end up adding any more rides, we have R(n,12) = n for n = 3, 4, \dots, 12. Otherwise, we have the following recursion formula R(n,m) = \frac{ \sum_{j=m+1}^{12} R(n+1,j)}{12 - m}, \,\,\, m \lt 12.

Using the recursion formula, we see that R(n,11) = R(n+1,12) = n+1, \,\,\,n = 3, 4, \dots, 11. Similarly, we get R(n,10) = \frac{R(n + 1, 11) + R(n + 1, 12)}{2} = \frac{2n+3}{2} = n + \frac{3}{2}, \,\,\,n = 3, 4, \dots, 10. Once more, we get R(n,9) = \frac{R(n+1,10) + R(n+1,11) + R(n+1, 12)}{3} = n + \frac{11}{6}, \,\,\, n = 3, 4, \dots, 9. Another time, we get R(n,8) = \frac{\sum_{i=9}^{12} R(n+1,i)}{4} = n + \frac{25}{12}, \,\,\, n = 3, 4, \dots, 8. Further, we get R(n,7) = \frac{\sum_{i=8}^{12} R(n+1,i)}{5} = n + \frac{137}{60}, \,\,\, n = 3, 4, \dots, 7. Continuing onward, we get \begin{align*} R(n,6) &= n + \frac{49}{20}, \,\,\, n = 3, 4, 5, 6; \\ R(n,5) &= n + \frac{363}{140}, \,\,\, n = 3, 4, 5; \\ R(n,4) &= n + \frac{761}{280}, \,\,\, n = 3, 4; \\ R(n,3) &= n + \frac{7129}{2520}, \,\,\, n = 3.\end{align*}

Let's pause to define p_m as the probability that, among your first three randomly assigned time slots, the maximum time slot is m. That is, p_m = \frac{ \binom{m-1}{2} }{ \binom{12}{3} } = \frac{ (m-1)(m-2) }{ 440 }. So using the law of total expectation, we get that the total average number of rides that you expect to go on using the Lightning Lane Multi Pass is R = \sum_{m=3}^{12} p_m R(3,m) = \sum_{m=3}^{12} \frac{(m-1)(m-2)}{440} R(3,m) = \frac{118361}{27720} \approx 4.2698773\dots
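Before simulating, we can also confirm this expected value exactly with a small memoized sketch of the R(n,m) recursion using Python's Fraction type (the function names are mine):

from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def R(n, m):
    # expected final number of rides given n booked rides with the latest in slot m
    if m == 12:
        return Fraction(n)
    return sum(R(n + 1, j) for j in range(m + 1, 13)) / (12 - m)

total = sum(Fraction(comb(m - 1, 2), comb(12, 3)) * R(3, m) for m in range(3, 13))
print(total, float(total))   # 118361/27720 ≈ 4.2699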

To confirm the calculations here, I put together a small Python script for testing purposes (as well as for working on the Extra Credit problem that will be featured in a later post). Below find the code:

import numpy as np
rng = np.random.default_rng()

def lightning_classic(rng):
    slots = sorted(list(rng.choice(12,size=3, replace=False)))
    last = slots[-1]
    while last < 11:
        new = rng.integers(last+1, high=12)
        slots.append(new)
        last = new
    return slots

def lightning_xc(rng):
    slots = sorted(list(rng.choice(12,size=3, replace=False)))
    for i in range(12):
        if i in slots:
            open_slots = [j for j in range(i+1, 12) if j not in slots] ## figure out which ones are still open
            if open_slots:
                new = open_slots[rng.integers(len(open_slots))] ## get new slot
                slots.append(new)
    return slots

After running the code with N = 1,000,000, we get the following histogram, which gives a mean of m = 4.269231 and a standard deviation of s = 1.032058, implying that the 95\% confidence interval for the true mean is R \in [4.2672, 4.2713].
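For reference, the summary statistics come from a driver along these lines (a sketch; the interval is the usual mean plus or minus 1.96 standard errors), and swapping in lightning_xc gives the Extra Credit numbers quoted in the post above:

N = 1_000_000
counts = np.array([len(lightning_classic(rng)) for _ in range(N)])
m, s = counts.mean(), counts.std(ddof=1)
half = 1.96 * s / np.sqrt(N)
print(m, s, (m - half, m + half))   # mean ≈ 4.269, sd ≈ 1.032, CI ≈ [4.267, 4.271]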

Sunday, April 27, 2025

More large gaps in Euclid's orchard

The fifth largest pair of adjacent gaps in this range are on either side of what angle up from the x-axis?

If we use the same Python script shown in the Classic answer, now with the value k=6 in the Python code, we get that for large enough values of R, the next few largest pairs of adjacent gaps occur at the following bearings, respectively: \begin{align*}\theta_2^* &= \tan^{-1} \left(\frac{1}{3} \right) \approx 18.434948823\dots^\circ \\ \theta^*_3 &= \tan^{-1} \left(\frac{2}{3}\right) \approx 33.690067526\dots^\circ \\ \theta^*_4 &= \tan^{-1} \left(\frac{1}{4}\right) \approx 14.036243468\dots^\circ.\end{align*} So far, these still correspond to the next highest peaks of Thomae's function at x^*_2 = \frac{3}{4}, x^*_3 = \frac{3}{5} and x^*_4 = \frac{4}{5}, respectively. However, continuing one step further, we see that for sufficiently large values of R, the fifth largest pair of adjacent gaps occurs at \theta^*_5 = \tan^{-1} \left(\frac{3}{4}\right) \approx 36.869897649\dots^\circ, which in fact skips over the next highest Thomae's function value of \tilde{x} = \frac{5}{6} in favor of x^*_5 = \frac{4}{7}. Don't feel too bad for \tilde{x} = \frac{5}{6}, since it is equivalent to the bearing \theta^*_6 = \tan^{-1} \left(\frac{1}{5}\right) \approx 11.309932474\dots^\circ, which is in fact the bearing around which you can find the next largest pair of adjacent gaps. Perhaps you could still feel bad for Thomae's function's explanatory powers, though.

Searching for large pairs of adjacent gaps in Euclid's orchard

You are at the point (0, 0) on the coordinate plane. There is a tree at each point with nonnegative integer coordinates, such as (5, 6), (0, 7), and (9, 1). The trees are very thin, so that they only obstruct trees that are directly behind them. For example, the tree at (2, 2) is not visible, as it is obscured by the tree at (1, 1).

Now, you can’t see infinitely far in this forest. You’re not sure exactly how far you can see, but it’s pretty dang far. As you look around, you can make out very narrow angular gaps between the trees. The largest gaps are near the positive x-axis and the positive y-axis. After those, the next largest pair of gaps is on either side of the tree at (1, 1), 45 degrees up from the x-axis.

Consider the view between 0 and 45 degrees up from the x-axis. The next largest pair of adjacent gaps in this range are on either side of what angle up from the x-axis?

First, let's note that our forest is akin to what's known as Euclid's orchard, which assumes that each tree is the three dimensional line segment from (m, n, 0) to (m, n, 1) for (m,n) \in \mathbb{N}^2. Well, it turns out that if you as an observer at (0,0) are viewing this orchard along the line y = x, that either the tree whose base is at (m,n) is blocked by another tree if \textsf{gcd}(m,n) \gt 1, or the height of the tree will appear to be at height \frac{1}{m+n} if \textsf{gcd}(m,n) = 1. Therefore, we can associate any visible tree at (m,n) with the point x = x(m,n) = \frac{m}{m+n} \in (0,1) and then its height will be given by Thomae's function defined by T(x) = \begin{cases} \frac{1}{q}, &\text{if $x = \frac{p}{q}$ for $p \in \mathbb{Z},$ $q \in \mathbb{N}$ and $\textsf{gcd}(p,q) = 1$;}\\ 0, &\text{otherwise.}\end{cases} Since we cannot actually see infinitely far into Euclid's orchard, let's assume that we can only see a tree if its base is at (m,n) \in \mathbb{N}^2 and m^2 + n^2 \leq R^2 for some R \gg 1. Therefore, since R\sqrt{2} = \max \{ x+y \mid x^2 + y^2 \leq R^2 \}, the corresponding approximated Thomae's function is T_R(x) = \begin{cases} \frac{1}{q}, &\text{if $x = \frac{p}{q}$ for $p \in \mathbb{Z},$ $q \in \mathbb{N}$, $\textsf{gcd}(p,q) = 1$ and $q \leq R\sqrt{2}$;}\\ 0, &\text{otherwise.}\end{cases}.

Now Thomae's function is fascinating with lots of pathology (e.g., integrable, nowhere differentiable, continuous at each irrational, discontinuous at each rational, etc.). However, here we are interested in using adjacent discontinuities of the approximate Thomae's function while measuring the distances between them in a completely different way, since ultimately we are measuring the angle between the x-axis and the point (m,n), which has much more to do with \frac{n}{m} than \frac{m}{m+n}. So, though I would love to have this problem yield a deep dive into Thomae's function, let's instead just brute-force throw a computer at this problem and see what comes out.

We need to just implement the following pseudocode procedure: (1) enumerate all of the visible trees, for instance, \textsf{trees} = \left[ (m,n) \mid n \in \{1, \dots, R\}, m \in \{n, \dots, R \}, \textsf{gcd}(m,n) = 1, m^2 + n^2 \leq R^2 \right]; (2) convert each tree to its bearing (angle above the x-axis), that is \textsf{bearings} = \left[ \textsf{arctan2(n,m)} \mid (m,n) \in \textsf{trees} \right]; (3) order the values of \textsf{bearings} in ascending order and let \textsf{gaps} be the corresponding gaps between adjacent trees; and (4) take an \textsf{argsort} of \textsf{gaps} to pull out the location of the largest gaps.

Below is a Python implementation of this pseudocode:

import numpy as np

def get_trees(R):
    return [ (m, n) for n in range(R) for m in range(n, R) if np.gcd(m,n) == 1 and m*m + n*n <= R*R ]

def get_tree_bearings(R):
    bearings = [ np.arctan2(n, m) for (m, n) in get_trees(R) ]
    return sorted(bearings), np.argsort(bearings)

def get_big_gap_bearings(R,k=7):
    trees = get_trees(R)
    bearings, orders = get_tree_bearings(R)
    gaps = [ x - y for x, y in zip(bearings[1:], bearings[:-1])]
    big_gaps = np.argsort(gaps)[-2*k:]
    return [(trees[orders[i]], trees[orders[i+1]]) for i in big_gaps]
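For example, the pairs of trees bounding the largest gaps can be printed along with their bearings in degrees like so (a usage sketch; the exact output depends on R):

for t1, t2 in reversed(get_big_gap_bearings(1000, k=2)):
    degs = [round(np.degrees(np.arctan2(n, m)), 4) for (m, n) in (t1, t2)]
    print(t1, t2, degs)
# the top gaps sit next to the x-axis, just below (1,1), and on either side of (2,1)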

For the Classic problem, running with k=2 and various values of R = 10, 20, 50, 100, 200, 500, 1000, we confirm that the largest gap is always next to the x-axis, followed by the gap nearest to the tree at (1,1), and then the next largest pair of adjacent gaps is on either side of the tree at (2,1), that is, with bearing \theta^* = \tan^{-1} \left(\frac{1}{2}\right) \approx 26.565051177\dots^\circ. Interestingly, though still somewhat mysteriously since I'm not totally sold on the ability to directly infer much of anything from Thomae's function, this is equivalent to the second tallest peak of Thomae's function at x^* = \frac{2}{3}.

Sunday, April 20, 2025

Extra credit hammer throwing

Instead of playing to 3 points, now the first player to 5 points wins the match.

Good news (again)! You have won the first hole, and now lead 1-0. What is your probability of winning the match?

This game will have the same recurrence relation, but the boundary conditions will be \tilde{w}(s_1, s_2) = \begin{cases} 1, &\text{if $s_1 \geq 5 \gt s_2;$}\\ 0, &\text{if $s_1 \lt 5 \leq s_2.$}\end{cases} Since \tilde{w}(s_1,s_2) = w(s_1-2,s_2-2) whenever s_1, s_2 \geq 2, we automatically have \tilde{w}(4,4) = \tilde{w}(4,3) = \tilde{w}(3,4) = \tilde{w}(3,3) = \frac{1}{2} along with \tilde{w}(4,2) = \tilde{w}(3,2) = \frac{3}{4} and \tilde{w}(2,3) = \tilde{w}(2,4) = \frac{1}{4}. We also have \tilde{w}(4,1) = \tilde{w}(3,1) = \frac{3}{4}, \tilde{w}(1,4) = \tilde{w}(1,3) = \frac{1}{4}, \tilde{w}(4,0) = \tilde{w}(3,0) = \frac{7}{8}, \tilde{w}(1,1) = \tilde{w}(2,2) = \tilde{w}(1,2) = \tilde{w}(2,1) = \frac{1}{2}, and \tilde{w}(2,0) = \frac{11}{16}.

Finally, we arrive at the probability of winning the match to 5 points if you are each playing optimally as \begin{align*} \tilde{w}(1,0) &= \max_u \min_v \left[ uv \frac{\frac{7}{8} + \frac{1}{2}}{2} + (1-u)(1-v) \frac{\frac{11}{16} + \frac{1}{2}}{2} \right. \\ &\quad\quad\quad\quad\quad\left. + u(1-v) \min \left\{ \frac{\frac{7}{8} + \frac{1}{2}}{2}, \frac{11}{16} \right\} + (1-u)v \max \left\{\frac{\frac{7}{8} + \frac{1}{2}}{2}, \frac{1}{2} \right\} \right] \\ &= \max_u \min_v \left[ \frac{11}{16} uv + \frac{19}{32} (1-u)(1-v) + \frac{11}{16} u(1-v) + \frac{11}{16} (1-u)v \right] \\ &= \max_u \min_v \left[ \frac{19}{32} + \frac{3}{32} u + \frac{3}{32} v - \frac{3}{32} uv \right] = \frac{11}{16},\end{align*} with optimizers u^* = 1, v^* = 0.

Isn't hammer throw was a field sport?

You and your opponent are competing in a golf match. On any given hole you play, each of you has a 50 percent chance of winning the hole (and a zero percent chance of tying). That said, scorekeeping in this match is a little different from the norm.

Each hole is worth 1 point. Before starting each hole, either you or your opponent can “throw the hammer.” When the hammer has been thrown, whoever did not throw the hammer must either accept the hammer or reject it. If they accept the hammer, the hole is worth 2 points. If they reject the hammer, they concede the hole and its 1 point to their opponent. Both players can throw the hammer on as many holes as they wish. (Should both players decide to throw the hammer at the exact same time—something that can’t be planned in advance—the hole is worth 2 points.)

The first player to reach 3 points wins the match. Suppose all players always make rational decisions to maximize their own chances of winning.

Good news! You have won the first hole, and now lead 1-0. What is your probability of winning the match?

Let's define your win probability as the function w : \mathbb{N}^2 \to [0,1], so that w(s_1, s_2) is the probability of winning the match given that the current score is s_1 points for you and s_2 points for your opponent, assuming that you are both playing perfectly rationally, maximizing your own chances of winning. Let u be the indicator of whether or not you throw the hammer when the score is (s_1, s_2). Let v be the indicator of whether or not your opponent throws the hammer when the score is (s_1,s_2). Then we can define the recursion formula \begin{align*} w(s_1,s_2) &= \max_{u \in \{0,1\}} \min_{v \in \{0,1\}} \left[uv \frac{ w(s_1+2, s_2) + w(s_1, s_2 + 2) }{2} \right. \\ &\quad\quad\quad + (1-u)(1-v) \frac{ w(s_1 + 1,s_2) + w(s_1, s_2 + 1)}{2} \\ &\quad\quad\quad\quad + u(1-v) \min \left\{ \frac{w(s_1+2, s_2) + w(s_1, s_2 + 2)}{2}, w(s_1+1,s_2) \right\} \\ &\quad\quad\quad\quad\quad \left.+(1-u)v \max \left\{ \frac{w(s_1+2, s_2) + w(s_1, s_2+2)}{2}, w(s_1, s_2+1) \right\} \right],\end{align*} where the coefficients of u(1-v) and (1-u)v represent the optimal choice of whether to accept or reject the hammer, by your opponent and by you, respectively. As boundary conditions, we have w(s_1,s_2) = \begin{cases} 1, &\text{if $s_1 \geq 3 \gt s_2$;}\\ 0, &\text{if $s_1 \lt 3 \leq s_2.$}\end{cases}

In particular, we see that for instance \begin{align*}w(2,2) &= \max_u \min_v \left[ uv \frac{ 1 + 0}{2} + (1-u)(1-v) \frac{1 + 0}{2} + u(1-v) \min \left\{ \frac{1+0}{2}, 1 \right\} + (1-u)v \max \left\{ \frac{1+0}{2}, 0 \right\}\right] \\ &= \max_u \min_v \frac{1}{2} = \frac{1}{2}.\end{align*} We also find that \begin{align*}w(1,2) &= \max_u \min_v \left[ uv \frac{1+0}{2} + (1-u)(1-v) \frac{\frac{1}{2} + 0}{2} + u(1-v) \min \left\{ \frac{1+0}{2}, \frac{1}{2} \right\} + (1-u)v \max \left\{ \frac{1+0}{2}, 0 \right\} \right] \\ &= \max_u \min_v \left[\frac{1}{4} + \frac{1}{4}u + \frac{1}{4}v - \frac{1}{4}uv\right] = \frac{1}{2},\end{align*} with optimizers u^*=1 and v^*=0. Similarly, w(2,1) = \frac{1}{2}, as well.

Working our way backward we see that w(2,0) = \frac{3}{4}, w(1,1) = \frac{1}{2} and w(0,2) = \frac{1}{4}. Therefore, we can finally arrive at the probability of winning given that you and your opponent play optimally and that the current score is 1-0 is \begin{align*}w(1,0) &= \max_u \min_v \left[ uv \frac{1 + \frac{1}{2}}{2} + (1-u)(1-v) \frac{\frac{3}{4} + \frac{1}{2}}{2} + u(1-v) \min \left\{ \frac{1+\frac{1}{2}}{2}, \frac{3}{4} \right\} + (1-u) v \max \left\{ \frac{1 + \frac{1}{2}}{2}, \frac{1}{2} \right\} \right] \\ &= \max_u \min_v \left[ \frac{5}{8} + \frac{1}{8} u + \frac{1}{8} v - \frac{1}{8} uv \right] = \frac{3}{4},\end{align*} with optimizers u^*=1 and v^*=0.
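As a cross-check on both this answer and the Extra Credit answer above, here is a compact memoized sketch of the same max-min recursion, with the target score left as a parameter (so target=3 reproduces w and target=5 reproduces \tilde{w}):

from fractions import Fraction
from functools import lru_cache

def win_prob(s1, s2, target):
    @lru_cache(maxsize=None)
    def w(a, b):
        if a >= target > b: return Fraction(1)
        if a < target <= b: return Fraction(0)
        two = (w(a + 2, b) + w(a, b + 2)) / 2   # hammer thrown and accepted (or simultaneous)
        one = (w(a + 1, b) + w(a, b + 1)) / 2   # no hammer thrown by either player
        best = Fraction(0)
        for u in (0, 1):
            vals = []
            for v in (0, 1):
                if u and v:   vals.append(two)
                elif u:       vals.append(min(two, w(a + 1, b)))  # opponent accepts or concedes
                elif v:       vals.append(max(two, w(a, b + 1)))  # you accept or concede
                else:         vals.append(one)
            best = max(best, min(vals))
        return best
    return w(s1, s2)

print(win_prob(1, 0, 3), win_prob(1, 0, 5))   # 3/4 and 11/16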

Monday, April 14, 2025

Paying attention to the hints

Dean has three colors of the hibiscus: red, orange, and yellow. He wants to plant them in a straight hedge of shrubs (each of which is one color) so that the order appears somewhat random, but not truly random. More specifically, he wants the following to be true:

  • No two adjacent shrubs have the same color.
  • No ordering of three consecutive shrubs appears more than once in the hedge.

What is the greatest number of shrubs Dean’s hedge can contain?

I should have picked up on the otherwise extraneous information regarding Rivka Lipkovitz's interest in the base-7 Doomsday algorithm to divine that my Ore's-theorem-based upper bound was not exactly the sharpest bound in the shed. Well, you can't fool me again! Since the previous week's Fiddler, which I did indeed like, involved Hamiltonian cycles, I know that somehow this one will too ....

So let's encode Dean's hedge instructions as a directed graph. Let O denote an orange hibiscus, R a red one, and Y a yellow one. Then a triplet v = (v_1,v_2,v_3) with v_i \in \{O,R,Y\}, v_2 \ne v_1 and v_3 \ne v_2 is one of the possible orderings of three consecutive shrubs that satisfies both of the conditions. Let V = \{ ORO, ORY, OYO, OYR, ROR, ROY, RYO, RYR, YOR, YOY, YRO, YRY \} be the set of vertices. We can draw a directed edge from v_1 to v_2 if the first two characters of v_2 are the last two characters of v_1, so, for instance, there is an edge from RYR to YRO but not one that starts at YRO and ends at RYR. Thus all possible edges are E = \{ (v_1, v_2) \mid v_{12} = v_{21}, v_{13} = v_{22} \}, where v_{ij} denotes the jth character of v_i.

Sure enough, there is a Hamiltonian cycle on the graph G = (V,E) covering all 12 possible vertices, which gives a maximal Dean-approved hibiscus hedge length of 14 shrubs (count the first three shrubs of the initial vertex, then each subsequent vertex only adds one new shrub, so 3 + 11 = 14). One possible instance of this Hamiltonian cycle is given by \begin{align*} ORO \to ROY \to OYO &\to YOR \to ORY \to RYO \to YOY\\ &\to OYR \to YRY \to RYR \to YRO \to ROR \to ORO,\end{align*} which when put all together gives a hedge denoted as OROYORYOYRYROR.
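And just to be safe, a couple of lines of Python confirm that this 14-shrub hedge satisfies both of Dean's conditions:

hedge = "OROYORYOYRYROR"
triples = [hedge[i:i + 3] for i in range(len(hedge) - 2)]
print(len(hedge),                                      # 14 shrubs
      all(a != b for a, b in zip(hedge, hedge[1:])),   # no two adjacent shrubs share a color
      len(triples) == len(set(triples)))               # no three-shrub ordering repeats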

Monday, April 7, 2025

Striking Ore while digging into Lipkovitz's Hamiltonian puzzle

A teacher is handing out candy to his students. He abides by the following rules:

  • He hands out candy to groups of three students (i.e., “trios”) at a time. Each member of the trio gets one piece of candy.
  • Each unique trio can ask for candy, but that same trio can’t come back for seconds. If students in the trio want more candy, they must return as part of a different trio.
  • When a trio gets candy, the next trio can’t contain any students from that previous trio.

It turns out that every possible trio can get a helping of candy together. What is the smallest class size for which this is possible?

Obviously, I assume that we are not talking about the trivial case of N=3 students, because, ... lame!

So let's assume that there are N \gt 3 students. Then there will be M = \binom{N}{3} = \frac{1}{6}N(N-1)(N-2) possible trios in the class. Let's assume that the set V = \{ v_1, v_2, \dots, v_M \} enumerates all such trios. Based on the teacher's final condition, we can see that if v_i and v_j share a common student, then v_j cannot be the next trio to ask if v_i has just received candy. Let's define a set of edges on the vertex set E = \{ (v_i, v_j) \in V \times V \mid j \gt i, v_i \cap v_j = \emptyset \}, that is, if the trio v_i could immediately follow the trio v_j (or vice versa), then there is an edge between v_i and v_j. We will call vertices v_i and v_j adjacent if there is an edge e \in E connecting v_i and v_j. Let's define the graph \mathcal{G} = (V, E). Note that we define the degree of a vertex, denoted \delta(v), to be the number of vertices w \in V that are adjacent to v, that is, \delta(v) = \# \{ w \in V \mid e = (v,w) \in E \}.

Since it turns out that every possible trio can get a helping of candy together, but based on the second condition each trio can only ask for candy once, there must be some path on \mathcal{G} that visits each vertex exactly once. This type of path is known in graph theory as a Hamiltonian path. A graph is called Hamiltonian if it admits a Hamiltonian path that starts and ends at adjacent vertices (that is, a Hamiltonian cycle). The Norwegian graph theorist Øystein Ore provided the following sufficient condition for a graph to be Hamiltonian based on the degrees of its vertices:

Ore's Theorem: A graph \mathcal{G} = (V, E) with |V| = n is Hamiltonian if for every pair of non-adjacent vertices v, w \in V, \delta(v) + \delta(w) \geq n.

Let's note that for our graph \mathcal{G}_N, since any vertex v is adjacent to any other trio made up of distinct students, we have \delta(v) = \binom{N-3}{3} = \frac{1}{6} (N-3)(N-4)(N-5) = \frac{N^3 -12N^2 + 47N - 60}{6}. So from an application of Ore's theorem, whenever M = \frac{N^3-3N^2+2N}{6} \leq 2 \frac{N^3 -12N^2 +47N - 60}{6} or equivalently, whenever N^3 - 21N^2 +92N -120 \geq 0, then \mathcal{G} is Hamiltonian. Since the real root of the polynomial p(t) = t^3 -21t^2 +92t -120 is at t^* \approx 15.59.... we have that whenever N \geq 16, then every possible trio can have a helping of candy together.
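A quick numerical check of that threshold (a small sketch using numpy for the root and math.comb for the degrees):

import numpy as np
from math import comb

print(np.roots([1, -21, 92, -120]))   # the single real root is ~15.59
# smallest N where Ore's condition 2*C(N-3,3) >= C(N,3) holds
print(min(N for N in range(6, 30) if 2 * comb(N - 3, 3) >= comb(N, 3)))   # 16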

Monday, March 31, 2025

Boosting busts more seeds in bigger tourneys

Instead of four teams, now there are 2^6, or 64, seeded from 1 through 64. The power index of each team is equal to 65 minus that team’s seed.

The teams play in a traditional seeded tournament format. That is, in the first round, the sum of opponents’ seeds is 2^6+1, or 65. If the stronger team always advances, then the sum of opponents’ seeds in the second round is 2^5+1, or 33, and so on.

Once again, the underdog in every match gets a power index boost B, where B is some positive non-integer. Depending on the value of B, different teams will win the tournament. Of the 64 teams, how many can never win, regardless of the value of B?

After setting up the 64-team bracket, we can choose various values of B and then compute the winners directly. Since the outcome will remain the same for all non-integer values in (\lfloor B \rfloor, \lceil B \rceil), we need only test a total of 64 values, e.g., b+0.5, for b = 0, 1, \dots, 63. The summary of the outcomes of the 64-team tournament bracket is in the table below. We note that there are 27 seeds (namely, 7, 13-15, and 25-47) that never win the tournament. The most unlucky among them are 7, 15 and 47, each of which has a set of B values of measure 3 for which they end up as the runner-up of the tournament, though they will never win it.

B Winning Seed Runner Up
(0,1) 1 2
(1,2) 1 3
(2,3) 3 1
(3,4) 2 1
(4,5) 6 5
(5,6) 5 3
(6,7) 5 3
(7,8) 4 3
(8,9) 12 11
(9,10) 11 9
(10,11) 11 9
(11,12) 10 9
(12,13) 10 9
(13,14) 9 7
(14,15) 9 7
(15,16) 8 7
(16,17) 24 23
(17,18) 23 21
(18,19) 23 21
(19,20) 22 21
(20,21) 22 21
(21,22) 21 19
(22,23) 21 19
(23,24) 20 19
(24,25) 20 19
(25,26) 19 17
(26,27) 19 17
(27,28) 18 17
(28,29) 18 17
(29,30) 17 15
(30,31) 17 15
(31,32) 16 15
(32,33) 48 47
(33,34) 49 47
(34,35) 49 47
(35,36) 50 49
(36,37) 50 49
(37,38) 51 49
(38,39) 51 49
(39,40) 52 51
(40,41) 52 51
(41,42) 53 51
(42,43) 53 51
(43,44) 54 53
(44,45) 54 53
(45,46) 55 53
(46,47) 55 53
(47,48) 56 55
(48,49) 56 55
(49,50) 57 55
(50,51) 57 55
(51,52) 58 57
(52,53) 58 57
(53,54) 59 57
(54,55) 59 57
(55,56) 60 59
(56,57) 60 59
(57,58) 61 59
(58,59) 61 59
(59,60) 62 61
(60,61) 62 61
(61,62) 63 61
(62,63) 63 61
\gt 63 64 63
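For reference, the table above can be generated with a direct computation along these lines (a sketch; bracket_order builds the standard seeding recursively and play applies the underdog boost):

def bracket_order(n):
    # standard seeding: in the first round, opposing seeds sum to n + 1
    order = [1]
    while len(order) < n:
        m = 2 * len(order)
        order = [s for seed in order for s in (seed, m + 1 - seed)]
    return order

def play(a, b, B, n):
    # power index is (n + 1) - seed; the underdog (larger seed number) gets the boost B
    pa, pb = n + 1 - a, n + 1 - b
    if pa < pb:
        pa += B
    else:
        pb += B
    return a if pa > pb else b

def run_bracket(B, n=64):
    teams = bracket_order(n)
    while len(teams) > 2:
        teams = [play(a, b, B, n) for a, b in zip(teams[::2], teams[1::2])]
    champ = play(teams[0], teams[1], B, n)
    return champ, (teams[1] if champ == teams[0] else teams[0])

for b in range(64):
    print(f"({b},{b + 1})", *run_bracket(b + 0.5))   # (winning seed, runner up) per band of B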

Boosting will cause the 2 seed to always bust

Once again, there are four teams remaining in a bracket: the 1-seed, the 2-seed, the 3-seed, and the 4-seed. In the first round, the 1-seed faces the 4-seed, while the 2-seed faces the 3-seed. The winners of these two matches then face each other in the regional final.

Also, each team possesses a “power index” equal to 5 minus that team’s seed. In other words:

  • The 1-seed has a power index of 4.
  • The 2-seed has a power index of 3.
  • The 3-seed has a power index of 2.
  • The 4-seed has a power index of 1.

In any given matchup, the team with the greater power index would emerge victorious. However, March Madness fans love to root for the underdog. As a result, the team with the lower power index gets an effective “boost” B, where B is some positive non-integer. For example, B could be 0.5, 133.7, or 2\pi, but not 1 or 42.

As an illustration, consider the matchup between the 2- and 3-seeds. The favored 2-seed has a power index of 3, while the underdog 3-seed has a power index of 2+B. When B is greater than 1, the 3-seed will defeat the 2-seed in an upset.

Depending on the value of B, different teams will win the tournament. Of the four teams, how many can never win, regardless of the value of B?

As shown in the prompt, if B \lt 1 then 2 will beat 3 in the first round. On the other side of the bracket, 1 will beat 4, since 4 \gt B+1 in this case, so in the final round we have 1 beating 2 since again 4 \gt 3 + B when B \lt 1. So whenever B \lt 1, 1 will win the championship. On the other hand, whenever B \gt 1, 2 will lose to 3 in the first round. Therefore, 2 will never win the championship.

Whenever B \in (2,3), 3 will beat 1 for the championship, since 2 + B \gt 4. Whenever B \gt 3, 4 will beat 1 in the first round and then go on to beat 3 for the championship. Thus, there are values of B for which 1, 3 and 4 will win the championship. So out of the four remaining teams, only one of them (the 2 seed) will never win.

Sunday, March 16, 2025

An Extra Credit \pi day \pi-cnic in \pi-land

Suppose the island of \pi-land, as described above, has a radius of 1 mile. That is, Diametric Beach has a length of 2 miles. Again, you are picking a random point on the island for a picnic. On average, what will be the expected shortest distance to shore?

Here we want to calculate the average, minimum distance to shore E = \frac{1}{A} \int_{-R}^R \int_0^{\sqrt{R^2 - x^2}} \min \{ d_D (x,y), d_S (x,y) \} \,dy \,dx, where d_D and d_S were defined as above in the Classic Fiddler answer. As we saw in the Classic Fiddler answer, the region where d_D \leq d_S is given by \Omega, so we can break the integral above into two sections E = \frac{1}{A} \int_{-R}^R \int_0^{\frac{R^2 - x^2}{2R}} y \,dy\,dx + \frac{1}{A} \int_{-R}^R \int_{\frac{R^2-x^2}{2R}}^{\sqrt{R^2 - x^2}} \left(R - \sqrt{x^2+y^2}\right) \,dy \,dx := E_1 + E_2. The first integral, E_1, is relatively straightforward to do, \begin{align*}E_1 = \frac{1}{A} \int_{-R}^R \int_0^{\frac{R^2 - x^2}{2R}} y \, dy \,dx &= \frac{4}{\pi R^2} \int_0^R \left( \frac{y^2}{2} \right)_{y=0}^{y= \frac{R^2 - x^2}{2R}} \,dx\\ &= \frac{4}{\pi R^2} \int_0^R \frac{1}{2} \left( \frac{R^2 - x^2 }{2R} \right)^2 \,dx \\ &= \frac{4}{\pi R^2} \int_0^R \frac{1}{8R^2} \left( x^4 - 2R^2 x^2 + R^4 \right) \,dx \\ &= \frac{1}{2\pi R^4} \left[ \frac{x^5}{5} - \frac{2R^2x^3}{3} + R^4 x \right]_{x=0}^{x=R} \\ &= \frac{1}{2\pi R^4} \left( \frac{R^5}{5} - \frac{2R^5}{3} + R^5 \right) \\ &= \frac{R}{2\pi} \left( \frac{1}{5} - \frac{2}{3} + 1 \right) \\ &= \frac{R}{2\pi} \frac{3 - 10 + 15}{15} = \frac{4R}{15 \pi} \end{align*}

For the second integral, it will be easier to switch to polar coordinates, then do some trig-substitutions to get a handle on the resulting integral. The curve y = \frac{R^2 - x^2}{2R} is equivalent to the equation x^2 + 2R y - R^2 = 0 which is given by r^2 \cos^2 \theta + 2R r \sin \theta - R^2 = 0 in polar coordinates. Solving the quadratic in terms of r and choosing the positive solution when \sin \theta \geq 0 since \theta \in [0,\pi], we get \begin{align*}r &= \frac{ -2R \sin \theta + \sqrt{ 4R^2 \sin^2 \theta + 4R^2 \cos^2 \theta} }{2 \cos^2 \theta} \\ &= \frac{-2R \sin \theta + 2R}{2 \cos^2 \theta} \\ &= \frac{ R \left( 1 - \sin \theta \right) }{ \cos^2 \theta } \\ &= \frac{R}{1 + \sin \theta},\end{align*} since \cos^2 \theta = 1 - \sin^2 \theta = (1 - \sin \theta) (1 + \sin \theta). Therefore, we have \begin{align*}E_2 &= \frac{1}{A} \int_{-R}^R \int_{\frac{R^2 - x^2}{2R}}^{\sqrt{R^2 - x^2}} \left( R - \sqrt{x^2 + y^2} \right) \,dy \,dx \\ &= \frac{2}{\pi R^2} \int_0^\pi \int_{\frac{R}{1 + \sin \theta}}^R (R - r) r \,dr \, d\theta \\ &= \frac{2}{\pi R^2} \int_0^\pi \left[ \frac{Rr^2}{2} - \frac{r^3}{3} \right]^{r=R}_{r = \frac{R}{1 + \sin \theta}} \, d\theta \\ &= \frac{2}{\pi R^2} \int_0^\pi \left( \frac{R^3}{2} - \frac{R^3}{3} \right) - \left( \frac{R^3}{2(1 + \sin \theta)^2} - \frac{R^3}{3(1 + \sin \theta)^3} \right) \,d\theta \\ &= \frac{R}{3} - \frac{R}{3\pi} \int_0^\pi \frac{3 \sin \theta + 1}{(1 + \sin \theta)^3} \,d\theta = \frac{R}{3} - I.\end{align*} In order to calculate the last integral I, we need to do a trigonometric half-angle substitution with u = \tan \frac{\theta}{2}. Here we see that du = \frac{1}{2} \sec^2 \frac{\theta}{2} \,d\theta = \frac{1}{2} \left(1 + \tan^2 \frac{\theta}{2} \right) \,d\theta = \frac{1 + u^2}{2} \,d\theta and further that \sin \theta = 2 \sin \frac{\theta}{2} \cos \frac{\theta}{2} = 2 \tan \frac{\theta}{2} \cos^2 \frac{\theta}{2} = \frac{2 \tan \frac{\theta}{2}}{1 + \tan^2 \frac{\theta}{2}} = \frac{2u}{1+u^2}. We additionally see that when \theta = 0, then u = 0, whereas when \theta = \pi, then u = \tan \frac{\pi}{2} = \infty. Therefore we get \begin{align*} I &= \frac{R}{3\pi} \int_0^\pi \frac{3 \sin \theta + 1}{(1 + \sin \theta)^3} \,d\theta \\ &= \frac{R}{3\pi} \int_0^\infty \frac{ 3 \frac{2u}{1+u^2} + 1 }{\left(1 + \frac{2u}{1 + u^2}\right)^3} \frac{2 \,du}{1 + u^2} \\ &= \frac{2R}{3\pi} \int_0^\infty \frac{ (1 + 6u + u^2) ( 1 + u^2 ) }{ (1 + u)^6 } \, du.\end{align*} Making a further substitution of v = 1 + u, we then get \begin{align*} I = \frac{2R}{3\pi} \int_0^\infty \frac{ (1 + 6u + u^2) ( 1 + u^2 ) }{ (1 + u)^6 } \, du &= \frac{2R}{3\pi} \int_1^\infty \frac{ (1 + 6(v-1) + (v-1)^2) ( 1 + (v-1)^2 ) }{v^6} \, dv \\ &= \frac{2R}{3\pi} \int_1^\infty \frac{ (v^2 + 4v - 4)(v^2 - 2v + 2) }{v^6} \,dv \\ &= \frac{2R}{3\pi} \int_1^\infty v^{-2} + 2v^{-3} - 10v^{-4} + 16v^{-5} -8v^{-6} \,dv \\ &= \frac{2R}{3\pi} \left[ -\frac{1}{v} -\frac{1}{v^2} + \frac{10}{3v^3} -\frac{4}{v^4} + \frac{8}{5v^5} \right]_{v=1}^{v=\infty} \\ &= -\frac{2R}{3\pi} \left( -1 -1 + \frac{10}{3} -4 + \frac{8}{5} \right) = \frac{32R}{45\pi} \end{align*} Putting this all together we get E_2 = \frac{R}{3} - \frac{32R}{45\pi} = \frac{R(15\pi - 32)}{45 \pi}.

Therefore, the average minimal distance to the beach on \pi-land if the picnic spot is uniformly randomly chosen over the area of \pi-land is E = E_1 + E_2 = \frac{4R}{15\pi} + \frac{R(15 \pi - 32)}{45 \pi} = \frac{R(15\pi -20)}{45 \pi} = \frac{R(3\pi - 4)}{9\pi}. So in particular, if R = 1, then we get an average distance to the beach of E = \frac{3 \pi - 4}{9\pi} \approx 0.191862272807\dots.

A \pi day \picnic on \pi-land

You are planning a picnic on the remote tropical island of \pi-land. The island’s shape is a perfect semi-disk with two beaches, as illustrated below: Semicircular Beach (along the northern semicircular edge of the disk) and Diametric Beach (along the southern diameter of the disk).

If you pick a random spot on \pi-land for your picnic, what is the probability that it will be closer to Diametric Beach than to Semicircular Beach? (Unlike the illustrative diagram above, assume the beaches have zero width.)

The local \pi-landers typically measure everything from the midpoint of the Diametric Beach, so let's assume that point is the origin of our xy-plane, with Diametric Beach coinciding with the x-axis. In this case, assuming that \pi-land has a radius of R, then the entire area of \pi-land is A = \frac{1}{2} \pi R^2. At any point (x,y) on the \pi-land the distance to the Diametric Beach is given by d_D (x,y) = y, while the distance to the Semicircular Beach is d_S (x,y) = R - \sqrt{x^2 + y^2}. So the region of \pi-land that is closer to Diametric Beach than to Semicircular Beach is given by \begin{align*} \Omega &= \left\{ (x,y) \in \mathbb{R}^2_+ \mid x^2 + y^2 \leq R^2, d_D(x,y) \leq d_S(x,y) \right\} \\ &= \left\{ (x,y) \in \mathbb{R}_+^2 \mid y \leq R - \sqrt{x^2+ y^2} \right\} \\ &= \left\{ (x,y) \in \mathbb{R}^2_+ \mid x^2 + y^2 \leq (R-y)^2 \right\} \\ &= \left\{ (x,y) \in \mathbb{R}^2_+ \mid x^2 \leq R^2 -2Ry \right\} \\ &= \left\{ (x,y) \in \mathbb{R}^2_+ \mid y \leq \frac{R^2 - x^2}{2R} \right\} \end{align*}

The area of \Omega is given by the integral A_\Omega = \int_{-R}^R \frac{R^2 - x^2}{2R} \,dx = 2 \left[ \frac{Rx}{2} - \frac{x^3}{6R} \right]^{x=R}_{x=0} = 2 \left( \frac{R^2}{2} - \frac{R^2}{6} \right) = \frac{2R^2}{3}. Therefore, the probability of randomly choosing a \pi-land picnic spot closer to Diametric Beach than to Semicircular Beach, that is, in \Omega, is given by p = \frac{A_\Omega}{A} = \frac{ \frac{2R^2}{3} }{ \frac{\pi R^2}{2} } = \frac{4}{3\pi} \approx 0.424413181578....
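Both this probability and the Extra Credit average distance above are easy to spot-check by Monte Carlo (a sketch with R = 1):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2_000_000)
y = rng.uniform(0, 1, 2_000_000)
keep = x**2 + y**2 <= 1                 # uniform points on the semi-disk via rejection
x, y = x[keep], y[keep]
d_D = y                                 # distance to Diametric Beach
d_S = 1 - np.sqrt(x**2 + y**2)          # distance to Semicircular Beach
print(np.mean(d_D <= d_S))              # ~0.4244 = 4/(3*pi)
print(np.minimum(d_D, d_S).mean())      # ~0.1919 = (3*pi - 4)/(9*pi)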

Sunday, March 9, 2025

Well below average domino placement

You are placing many, many dominoes in a straight line, one at a time. However, each time you place a domino, there is a 1 percent chance that you accidentally tip it over, causing a chain reaction that tips over all dominoes you’ve placed. After a chain reaction, you start over again.

If you do this many, many times, what can you expect the median (note: not the average) number of dominoes placed when a chain reaction occurs (including the domino that causes the chain reaction)?

Let's generalize to a generic probability p \gt 0 of accidentally tipping the dominoes over on any individual placement. Let D_p be the random variable representing the total number of dominoes placed (including the domino that you tip over) when the probability of accidentally tipping the dominoes over on any individual placement is p \gt 0. Then we have \mathbb{P} \left[ D_p = d \right] = p (1-p)^{d-1}, since in order for you to tip over the dth domino you must first not have knocked over the first (d-1) dominoes, each with probability (1-p), and then knock over the dth domino, with probability p.

So for any d \gt 0, the probability of D \leq d is equal to \mathbb{P} \left[ D_p \leq d \right] = \sum_{m=1}^{d} \mathbb{P} \left[ D_p = m \right] = \sum_{m=1}^d p(1-p)^{m-1} = p \cdot \frac{1 - (1-p)^d}{1-(1-p)} = 1 - (1-p)^d. Alternatively, we have \mathbb{P} \left[ D_p \gt d \right] = 1 - \mathbb{P} \left[ D_p \leq d \right] = (1-p)^d. The integer M_p is the median of the distribution of D_p if and only if we have \mathbb{P} \left[ D_p \leq M_p \right] = 1 - (1 - p)^{M_p} \geq \frac{1}{2} and \mathbb{P} \left[ D_p \geq M_p \right] = \mathbb{P} \left[ D_p \gt M_p \right] + \mathbb{P} \left[ D_p = M_p \right] = (1-p)^{M_p-1} \geq \frac{1}{2}. Therefore, combining these two equations we need (1-p)^{M_p} \leq \frac{1}{2} \lt (1-p)^{M_p - 1}. Taking natural logarithms of all sides we get M_p \ln (1-p) \leq -\ln 2 \lt (M_p - 1) \ln(1-p), or equivalently M_p - 1 \lt -\frac{\ln 2}{\ln (1-p)} \leq M_p, or equivalently M_p = \Big\lceil -\frac{\ln 2}{\ln (1-p)} \Big\rceil.

Therefore, if p = 0.01, then the median number of dominoes that you will have placed when they all get knocked down is M_{0.01} = \Big\lceil -\frac{\ln 2}{\ln 0.99} \Big\rceil = \lceil 68.9675639365\dots \rceil = 69. For further confirmation, we see that \mathbb{P} \left[ D_p \leq 68 \right] = 0.495114111213 \lt \frac{1}{2} \lt \mathbb{P} \left[ D_p \leq 69 \right] = 0.500162970101.
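That formula and the straddling probabilities are easy to check directly (a two-line numpy sketch):

import numpy as np

p = 0.01
M = int(np.ceil(-np.log(2) / np.log(1 - p)))
print(M)                                       # 69
print(1 - (1 - p)**(M - 1), 1 - (1 - p)**M)    # ~0.4951 and ~0.5002 straddle 1/2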

Though the note said to explicitly ignore the expectation, we see that \begin{align*}E_p = \mathbb{E} \left[ D_p \right] &= \sum_{m=1}^\infty m \mathbb{P} \left[ D_p = m \right]\\ &= \sum_{m=1}^\infty m p (1-p)^{m-1} \\ &= p \sum_{m=1}^\infty m (1-p)^{m-1} \\ &= p \left. \frac{d}{dt} \left( \frac{1}{1-t} \right) \right|_{t=1-p} \\ &= p \left. \frac{1}{(1-t)^2} \right|_{t = 1-p} \\ &= p \frac{1}{(1 - (1-p))^2} = p \frac{1}{p^2} = \frac{1}{p}.\end{align*} In particular, we see that for the Classic question, we have E_{0.01} = 100 \gt M_{0.01} = 69.

The Extra Credit question looks at the limit of M_p / E_p = p M_p as p \downarrow 0. Given the formulae above, we see that the limit of the ratio of the median to the average number of dominoes placed is given by \lim_{p \to 0} p M_p = \lim_{p \to 0} p \cdot \left( - \frac{\ln 2}{\ln ( 1- p) } \right) = \lim_{p \to 0} \frac{ p\ln 2}{ - \ln (1-p) } = \lim_{p \to 0} \ln 2 \cdot (1 - p) = \ln 2, where the first equal sign comes from the squeeze theorem and the second to last equal sign is an application of L'Hôpital's Rule. So not only do we have M_p \lt E_p in the case of p = 0.01, but in fact for all small p \approx 0.

Monday, March 3, 2025

Magical Rabbit Pulling

I have a hat with six small toy rabbits: two are orange, two are green, and two purple. I shuffle the rabbits around and randomly draw them out one at a time without replacement (i.e., once I draw a rabbit out, I never put it back in again).

Your job is to guess the color of each rabbit I draw out. For each guess, you know the history of the rabbits I’ve already drawn. So if we’re down to the final rabbit in the hat, you should be able to predict its color with certainty.

Every time you correctly predict the color of the rabbit I draw, you earn a point. If you play optimally (i.e., to maximize how many points you get), how many points can you expect to earn on average?

Let's define the Markov chain on \mathbb{N}^4 with states X = (g, o, p, s), where g is the number of green rabbits, o is the number of orange rabbits, p is the number of purple rabbits and s is the current score. The state X = (g, o, p, s) in this magical rabbit pulling Markov chain is adjacent to the following states: (g-1, o, p, s), (g-1, o, p, s+1), (g, o-1, p, s), (g, o-1, p, s+1), (g, o, p-1, s) and (g,o, p-1, s+1), provided that the resulting quadruple remains in \mathbb{N}^4. The transition probabilities are given by the following, where \# \arg\max \{g, o, p\} denotes the number of colors attaining the maximum remaining count (with ties broken by guessing uniformly at random among them): \begin{align*} p( X^{n+1} = (g-1,o,p,s) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{g}{g+o+p} \left( 1 - \frac{1}{\#\arg\max \{g, o, p\}} \right), & \text{if $g = \max \{ g, o, p \};$}\\ \frac{g}{g+o+p}, &\text{if $g \lt \max \{ g, o, p \}$}\end{cases}\\ p( X^{n+1} = (g-1,o,p,s+1) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{g}{g+o+p} \frac{1}{\#\arg\max \{g, o, p\}}, & \text{if $g = \max \{ g, o, p \};$}\\ 0, &\text{if $g \lt \max \{ g, o, p \}$}\end{cases} \\ p( X^{n+1} = (g,o-1,p,s) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{o}{g+o+p} \left( 1 - \frac{1}{\#\arg\max \{g, o, p\}} \right), & \text{if $o = \max \{ g, o, p \};$}\\ \frac{o}{g+o+p}, &\text{if $o \lt \max \{ g, o, p \}$}\end{cases}\\ p( X^{n+1} = (g,o-1,p,s+1) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{o}{g+o+p} \frac{1}{\#\arg\max \{g, o, p\}}, & \text{if $o = \max \{ g, o, p \};$}\\ 0, &\text{if $o \lt \max \{ g, o, p \}$}\end{cases} \\ p( X^{n+1} = (g,o,p-1,s) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{p}{g+o+p} \left( 1 - \frac{1}{\#\arg\max \{g, o, p\}} \right), & \text{if $p = \max \{ g, o, p \};$}\\ \frac{p}{g+o+p}, &\text{if $p \lt \max \{ g, o, p \}$}\end{cases}\\ p( X^{n+1} = (g,o,p-1,s+1) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{p}{g+o+p} \frac{1}{\#\arg\max \{g, o, p\}}, & \text{if $p = \max \{ g, o, p \};$}\\ 0, &\text{if $p \lt \max \{ g, o, p \}$}\end{cases} \end{align*} Here again we are only going to define the probabilities to the extent that these states remain in \mathbb{N}^4. In order to fully specify this as a Markov chain, let's assert that p( X^{n+1} = (0,0,0,s) \mid X^n = (0,0,0,s) ) = 1 for each s \in \mathbb{N} and that all other not explicitly stated transitions have a probability of zero.

Using these transition probabilities we can define the function f : \mathbb{N}^4 \to \mathbb{R} to be the expected total score once there are no more rabbits left in the hat, conditional on starting at X^0 = (g, o, p, s); that is, f(g, o, p, s) = \mathbb{E} \left[ e_4^T X^\infty \right], where \{ X^n \}_{n \in \mathbb{N}} is the Markov chain specified above. Given the transition probabilities, we can define the boundary conditions f(0,0,0,s) = s, for all s \in \mathbb{N}, with the recursion formula \begin{align*}f(g,o,p,s) &= I ( g \geq 1 ) \left( p( (g,o,p,s) \to (g-1, o, p, s) ) f(g-1, o, p, s) + p( (g, o, p, s) \to (g-1, o, p, s+1) ) f(g-1,o,p,s+1) \right) \\ & \quad + I ( o \geq 1 ) \left( p( (g,o,p,s) \to (g, o-1, p, s) ) f(g, o-1, p, s) + p( (g, o, p, s) \to (g, o-1, p, s+1) ) f(g,o-1,p,s+1) \right) \\ & \quad + I(p \geq 1) \left( p( (g,o,p,s) \to (g, o, p-1, s) ) f(g, o, p-1, s) + p( (g, o, p, s) \to (g, o, p-1, s+1) ) f(g,o,p-1,s+1) \right), \end{align*} where I(\cdot) denotes the indicator function.

For the case where we start out with g = o = p = 2 rabbits and s = 0 initial score, we can use this formula by hand repeatedly and eventually work out that we get an average score of f(2,2,2,0) = \frac{101}{30} = 3.3666666666666666\dots if playing optimally.

For the case of starting with g=o=p=10 rabbits and s = 0, there are too many cases for anyone (or at least me) to keep track of by hand, so we can appeal to a quick recursive function with memoization to make it relatively fast; for instance, see the quick Python implementation below. This then gives the average score in the Extra Credit case of f(10,10,10,0) = 13.703132371084621\dots if playing optimally.
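Here is that implementation, as a minimal memoized sketch of the expectation, assuming (as in the transition probabilities above) that you always guess one of the currently most-numerous colors:

from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def expected_score(g, o, p):
    # expected number of additional correct guesses with g green, o orange and
    # p purple rabbits left, always guessing a most-numerous remaining color
    n = g + o + p
    if n == 0:
        return Fraction(0)
    value = Fraction(max(g, o, p), n)   # probability this guess is correct
    if g:
        value += Fraction(g, n) * expected_score(g - 1, o, p)
    if o:
        value += Fraction(o, n) * expected_score(g, o - 1, p)
    if p:
        value += Fraction(p, n) * expected_score(g, o, p - 1)
    return value

print(expected_score(2, 2, 2), float(expected_score(2, 2, 2)))   # 101/30 ≈ 3.3667
print(float(expected_score(10, 10, 10)))                         # ≈ 13.7031...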

Sunday, February 23, 2025

Extra Credit LearnedLeague Defensive Efficiency

Now suppose your opponent is equally likely to get one, two, three, four, or five questions correct. As before, you randomly apply the six point values (0, 1, 1, 2, 2, 3) to the six questions. What is the probability that your defensive efficiency will be greater than 50 percent?

In the classic problem we constructed a table for N_2. Here we then need to construct similar tables for N_1, N_3, N_4 and N_5. For i=1, there is exactly one way apiece to score 0 or 3, and two ways apiece to score 1 or 2, so we have the following table:

N_1 DE_1 \mathbb{P} \{ N_1 \}
0 1 \frac{1}{6}
1 \frac{2}{3} \frac{1}{3}
2 \frac{1}{3} \frac{1}{3}
3 0 \frac{1}{6}

Similarly, we can construct the other tables as follows:

N_3 DE_3 \mathbb{P} \{ N_3 \}
2 1 \frac{1}{20}
3 \frac{4}{5} \frac{1}{5}
4 \frac{3}{5} \frac{1}{4}
5 \frac{2}{5} \frac{1}{4}
6 \frac{1}{5} \frac{1}{5}
7 0 \frac{1}{20}

N_4 DE_4 \mathbb{P} \{ N_4 \}
4 1 \frac{2}{15}
5 \frac{3}{4} \frac{1}{5}
6 \frac{1}{2} \frac{1}{3}
7 \frac{1}{4} \frac{1}{5}
8 0 \frac{2}{15}

N_5 DE_5 \mathbb{P} \{ N_5 \}
6 1 \frac{1}{6}
7 \frac{2}{3} \frac{1}{3}
8 \frac{1}{3} \frac{1}{3}
9 0 \frac{1}{6}

In particular we see that \mathbb{P} \{ DE_i \gt \frac{1}{2} \} = \begin{cases} \frac{1}{2}, &\text{if $i \in \{1,3,5\}$;}\\ \frac{1}{3}, &\text{if $i \in \{2,4\}$.}\end{cases} So if your opponent answers I questions where I is uniformly distributed on the set \{1, 2,3,4,5\}, then the probability of having a defensive efficiency score greater than 50\% is equal to \begin{align*} \mathbb{P} \{ DE_I \gt \frac{1}{2} \} &= \sum_{i=1}^5 \mathbb{P} \{ DE_i \gt \frac{1}{2} \} \mathbb{P} \{ I = i \} \\ &= \frac{1}{5} \sum_{i=1}^5 \mathbb{P} \{ DE_i \gt \frac{1}{2} \} \\ &= \frac{1}{5} \left( \frac{1}{2} + \frac{1}{3} + \frac{1}{2} + \frac{1}{3} + \frac{1}{2} \right) \\ &= \frac{13}{30} = 43.\bar{3}\%.\end{align*}

LearnedLeague Defensive Efficiency

Every day, you and your opponent for the day are presented with the same six trivia questions. You each do your best to answer these, and you assign point values for your opponent, without knowing (until the following day) which questions your opponent answered correctly. You must assign point values of 0, 1, 1, 2, 2, and 3 to the six questions.

Now, when someone answers three questions correctly, like my opponent just hypothetically did, the fewest points they can earn is 0 + 1 + 1 = 2, while the most points they can earn is 2 + 2 + 3 = 7. The fact that they got 4 points wasn’t great (from my perspective), but wasn’t terrible. In LearnedLeague, my defensive efficiency is defined as the maximum possible points allowed minus actual points allowed, divided by the maximum possible points allowed minus the minimum possible points allowed. Here, that was (7−4)/(7−2), which simplified to 3/5, or 60 percent.

By this definition, defensive efficiency ranges somewhere between 0 and 100 percent. (That is, assuming it’s even defined—which it’s not when your opponent gets zero questions right or all six questions right.)

Suppose you know for a fact that your opponent will get two questions right. However, you have absolutely no idea which two questions these are, and so you randomly apply the six point values to the six questions.

What is the probability that your defensive efficiency for the day will be greater than 50 percent?

For each i = 1, 2, 3, 4, 5, let N_i be the random score of your opponent, conditioned on getting exactly i answers correct. Given the set of weights, we see that \begin{align*}0 \leq & N_1 \leq 3 \\ 1 \leq & N_2 \leq 5 \\ 2 \leq & N_3 \leq 7 \\ 4 \leq & N_4 \leq 8 \\ 6 \leq & N_5 \leq 9,\end{align*} though these scores are not uniformly distributed over their ranges. In this case, DE_i(N_i) = \begin{cases} \frac{3 - N_1}{3}, &\text{if $i=1;$}\\ \frac{5 - N_2}{4}, &\text{if $i=2;$}\\ \frac{7-N_3}{5}, &\text{if $i=3;$}\\ \frac{8-N_4}{4}, &\text{if $i=4;$}\\ \frac{9 - N_5}{3}, &\text{if $i=5.$}\end{cases}

The only thing that we still need to do is determine the probability of each possible value of N_2 (i.e., the case i=2).

Since there are two instances of the weight 1, there are two possible outcomes that give a total score of N_2 = 1. Let's call them (0,1_A) and (0, 1_B). Similarly, there are three possible outcomes that give a total score of N_2 = 2, that is (0,2_A), (0,2_B) and (1_A, 1_B). There are five different ways to get an outcome of N_2=3: (0,3), (1_A, 2_A), (1_B, 2_A), (1_A, 2_B), and (1_B, 2_B). There are three ways to get N_2=4: (1_A, 3), (1_B, 3) and (2_A, 2_B). Finally, we have two ways to get N_2=5: (2_A, 3) and (2_B, 3). There are 15 possible outcomes in all, so we can construct the following table:

N_2 DE_2 \mathbb{P} \{ N_2 \}
1 1 \frac{2}{15}
2 \frac{3}{4} \frac{1}{5}
3 \frac{1}{2} \frac{1}{3}
4 \frac{1}{4} \frac{1}{5}
5 0 \frac{2}{15}

Based on the table above, we see that if your opponent will answer two questions correctly, then the probability of having a defensive efficiency greater than 50\% is \mathbb{P} \{ DE_2 \gt \frac{1}{2} \} = \mathbb{P} \{ N_2 \leq 2 \} = \frac{2}{15} + \frac{1}{5} = \frac{1}{3}.
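Both this table and the Extra Credit tables above are quick to verify by enumerating which of the six point values land on the correctly answered questions (a sketch with itertools):

from fractions import Fraction
from itertools import combinations

values = (0, 1, 1, 2, 2, 3)
for i in range(1, 6):
    scores = [sum(c) for c in combinations(values, i)]
    lo, hi = min(scores), max(scores)
    wins = sum(Fraction(hi - s, hi - lo) > Fraction(1, 2) for s in scores)
    print(i, Fraction(wins, len(scores)))
# prints 1/2, 1/3, 1/2, 1/3, 1/2, whose average is the Extra Credit answer 13/30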

Sunday, February 16, 2025

Owner of a squished heart

You can generate a heart shape by drawing a unit square (i.e., a square with side length 1), and then attaching semicircles (each with radius 1/2) to adjacent edges, as shown in the diagram below:

What is the radius of the smallest circle that contains this heart shape?

Let's define our frame of reference. Let's assume that the center of this unit square is the origin and that as in the figure above, the sides of the square make angle of \frac{\pi}{4} with respect to the x- and y-axes. In particular, the lowest corner of the unit square is located at (0, -\frac{\sqrt{2}}{2}). The two semi-circles added on the upper sides of the square are centered at (\pm\frac{\sqrt{2}}{4}, \frac{\sqrt{2}}{4}). From symmetry, we will focus on the one in the first quadrant and parameterize its boundary as \left( \frac{\sqrt{2}}{4} + \frac{1}{2} \cos \left( \theta - \frac{\pi}{4} \right), \frac{\sqrt{2}}{4} + \frac{1}{2} \sin \left(\theta - \frac{\pi}{4}\right) \right), for 0 \leq \theta \leq \pi.

Due to symmetry, any smallest inscribing circle should be centered at some point (0,h) on the positive y-axis. Then we can set up the problem in the following way. Obviously in order for every point in the semi-circle centered at (\frac{\sqrt{2}}{4}, \frac{\sqrt{2}}{4}) inside a circle centered at (0,h) we would need to have the radius be \begin{align*}R(h) &= \max_{\theta \in [0,\pi]} \sqrt{ \left( \frac{\sqrt{2}}{4} + \frac{1}{2} \cos \left( \theta - \frac{\pi}{4} \right) \right)^2 + \left( \frac{\sqrt{2}}{4} - h + \frac{1}{2} \sin \left( \theta - \frac{\pi}{4}\right) \right)^2} \\ &= \max_{\theta \in [0,\pi]} \sqrt{ \frac{1}{8} + \frac{\sqrt{2}}{4} \cos \left( \theta - \frac{\pi}{4} \right) + \frac{1}{4} \cos^2 \left( \theta - \frac{\pi}{4} \right) + \left( \frac{\sqrt{2}}{4} - h \right)^2 + \left( \frac{\sqrt{2}}{4} - h \right) \sin \left( \theta - \frac{\pi}{4} \right) + \frac{1}{4} \sin^2 \left( \theta - \frac{\pi}{4} \right) } \\ &= \max_{\theta \in [0,\pi]} \sqrt{ \frac{3}{8} + \left( \frac{\sqrt{2}}{4} - h \right)^2 + \frac{\sqrt{2}}{4} \cos \left( \theta - \frac{\pi}{4} \right) + \left( \frac{\sqrt{2}}{4} - h \right) \sin \left( \theta - \frac{\pi}{4} \right) }.\end{align*} Since square roots are monotonic and several of the terms are constant with respect to \theta, the optimal value of \theta in to obtain R(h) will be equivalent to the optimizer of the function f_h(\theta) = \frac{\sqrt{2}}{4} \cos \left( \theta - \frac{\pi}{4} \right) + \left( \frac{\sqrt{2}}{4} - h \right) \sin \left( \theta - \frac{\pi}{4} \right) on \theta \in [0,\pi]. Taking the derivative with respect to \theta and setting equal to zero we see that the only critical value of f_h in that interval satisfies -\frac{\sqrt{2}}{4} \sin \left( \theta^* - \frac{\pi}{4} \right) + \left( \frac{\sqrt{2}}{4} - h \right) \cos \left( \theta^* - \frac{\pi}{4} \right) = 0, or equivalently \tan\left(\theta^* - \frac{\pi}{4}\right) = 1 - 2\sqrt{2}h, which implies that \begin{align*}\cos \left(\theta^* -\frac{\pi}{4} \right) &= \frac{1}{\sqrt{1 + (1 - 2\sqrt{2}h)^2}} \\ \sin \left( \theta^* - \frac{\pi}{4} \right) &= \frac{1-2\sqrt{2}h}{\sqrt{1 + (1 - 2\sqrt{2}h)^2}},\end{align*} since \cos ( \tan^{-1} x ) = \frac{1}{\sqrt{1+x^2}} and \sin ( \tan^{-1} x ) = \frac{x}{\sqrt{1+x^2}}. Plugging this back into f_h, we get \begin{align*}f_h(\theta^*) &= \frac{\sqrt{2}}{4} \frac{1}{\sqrt{1 + (1-2\sqrt{2}h)^2}} + \left(\frac{\sqrt{2}}{4} - h\right) \frac{1 - 2\sqrt{2}h}{\sqrt{1 + (1-2\sqrt{2}h)^2}} \\ &= \frac{ \sqrt{2} }{4} \frac{ 1 + (1-2\sqrt{2}h)^2 }{ \sqrt{ 1+ (1-2\sqrt{2}h)^2}} \\ &= \frac{\sqrt{2}}{4} \sqrt{1 + (1 - 2\sqrt{2}h)^2}.\end{align*} Plugging this value back into R(h) we get R(h) = \sqrt{\frac{3}{8} + \frac{1}{8} (1 - 2\sqrt{2}h)^2 + \frac{\sqrt{2}}{4} \sqrt{1 + (1-2\sqrt{2}h)^2}}.

Now in order for the entire unit square to be contained within the circle, the lowest point (0, -\frac{\sqrt{2}}{2}) must also be less than or equal to R(h) away from the center at (0,h). In particular, if we want the heart to fit as snugly as possible within the circle, we should have R(h) equal the distance between (0, -\frac{\sqrt{2}}{2}) and (0,h), that is, h + \frac{\sqrt{2}}{2}. This leads to the equation h + \frac{\sqrt{2}}{2} = \sqrt{ \frac{3}{8} + \frac{1}{8} (1- 2\sqrt{2}h)^2 + \frac{\sqrt{2}}{4} \sqrt{ 1 + (1-2\sqrt{2}h)^2}}. Squaring both sides yields the equivalent equation \left(h + \frac{\sqrt{2}}{2}\right)^2 = \frac{3}{8} + \frac{1}{8} \left( 1 - 2\sqrt{2}h \right)^2 + \frac{\sqrt{2}}{4} \sqrt{ 1 + (1 - 2\sqrt{2}h)^2 } which is again equivalent to \frac{3}{2} \sqrt{2} h = \frac{\sqrt{2}}{4} \sqrt{1 + (1-2\sqrt{2}h)^2 }. Squaring both sides again yields \frac{9}{2} h^2 = \frac{1}{8} \left( 1 + (1 - 2\sqrt{2}h)^2 \right) = h^2 - \frac{\sqrt{2}}{2} h + \frac{1}{4} or equivalently \frac{7}{2} h^2 + \frac{\sqrt{2}}{2} h - \frac{1}{4} = 0. Solving the quadratic in terms of h leaves the only positive solution as h^* = \frac{-\frac{\sqrt{2}}{2} + \sqrt{ \frac{1}{2} - 4 \cdot \frac{7}{2} \cdot -\frac{1}{4}}}{2 \cdot \frac{7}{2}} = \frac{ -\frac{\sqrt{2}}{2} + \sqrt{ \frac{1}{2} + \frac{7}{2} }}{7} = \frac{ 2 - \frac{\sqrt{2}}{2} }{7} = \frac{4 - \sqrt{2}}{14}. Therefore, the radius of the smallest circle that contains this heart shape is R(h^*) = \frac{4-\sqrt{2}}{14} + \frac{\sqrt{2}}{2} = \frac{4 - \sqrt{2} + 7 \sqrt{2}}{14} = \frac{4 + 6 \sqrt{2}}{14} = \frac{2 + 3\sqrt{2}}{7} \approx 0.89180581\dots.
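As a numerical check on all of that algebra, we can sample the boundary of the heart and scan over candidate centers (0, h) (a small numpy sketch using the same parameterization as above):

import numpy as np

t = np.linspace(0, 1, 2001)
s2 = np.sqrt(2) / 2
pts = np.concatenate([
    np.column_stack([-s2 * t, -s2 * (1 - t)]),   # lower-left edge of the square
    np.column_stack([ s2 * t, -s2 * (1 - t)]),   # lower-right edge of the square
    np.column_stack([ s2 / 2 + 0.5 * np.cos(np.pi * t - np.pi / 4),
                      s2 / 2 + 0.5 * np.sin(np.pi * t - np.pi / 4)]),   # right lobe
    np.column_stack([-s2 / 2 - 0.5 * np.cos(np.pi * t - np.pi / 4),
                      s2 / 2 + 0.5 * np.sin(np.pi * t - np.pi / 4)]),   # left lobe
])
hs = np.linspace(0.0, 0.5, 5001)
radii = [np.max(np.hypot(pts[:, 0], pts[:, 1] - h)) for h in hs]
i = int(np.argmin(radii))
print(hs[i], radii[i])   # ~0.1847 = (4 - sqrt(2))/14 and ~0.8918 = (2 + 3*sqrt(2))/7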

Sunday, February 9, 2025

Spinning squares and the balls that bounce in them

Suppose you have a unit square that’s rotating about its center at a constant angular speed, and there’s a moving ball inside. The ball has a constant linear speed, and there’s no friction or gravity. When the ball hits an edge of the square, it simply reflects as though the square is momentarily stationary during the briefest of moments they’re in contact. Also, the ball is not allowed to hit a corner of the square—it would get jammed in that corner, a situation we prefer to avoid.

Suppose the ball travels on a periodic (i.e., repeating) path, and that it only ever makes contact with a single point on the unit square. What is the shortest distance the ball could travel in one loop of this path?

Take a unit square that is rotating at a constant angular speed, \omega, where, without loss of generality, we assume that the sides of the square are perpendicular to the x- and y-axes at time zero. Further assume that for any n \geq 2, there is a ball traveling at some fixed speed v_n in a periodic path with n bounces per period against the spinning square. The point of contact traces out a circle of radius \frac{1}{2}, so the path of the ball is some n-gon inscribed in that half-unit circle. Since the ball and the point of contact are each traveling at constant speeds, the lengths of time between collisions, and hence the side lengths of the n-gon, must all be the same. Thus, a ball tracing a periodic path must trace out a regular n-gon inscribed in the half-unit circle centered at the center of the unit square.

The side length of a regular n-gon inscribed in a half-unit circle is given by the law of cosines as \begin{align*}s_n^2 &= \left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^2 - 2 \left(\frac{1}{2}\right) \left(\frac{1}{2}\right) \cos \frac{2\pi}{n}\\ &=\frac{1 - \cos \frac{2\pi}{n}}{2}\\ &= \sin^2 \frac{\pi}{n}.\end{align*} Therefore the length of this periodic path is \ell_n = n s_n = n \sin \left( \frac{\pi}{n} \right). Since \sin x = x - \frac{x^3}{6} + O(x^5), we see that \ell_n = \pi - \frac{\pi^3}{6n^2} + O(n^{-4}), so \ell_n \uparrow \pi as n \to \infty. So the shortest such path has length \ell_2 = 2, which corresponds to the path that traces the line segment between the midpoints of the top and bottom sides of the square twice per period (down and back).
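As a small numerical illustration (just a sketch to make the monotonicity concrete), we can tabulate \ell_n = n \sin \frac{\pi}{n} for the first few n and watch it climb toward \pi:

import math

# path length per period of a regular n-gon inscribed in a circle of radius 1/2
for n in range(2, 11):
    print(n, n*math.sin(math.pi/n))
# prints 2.0 for n = 2, ~2.598 for n = 3, ~2.828 for n = 4, ...,
# increasing toward pi ~ 3.14159 as n grows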

If we were to look for the next shortest path, it would be the equilateral triangle with side length s_3 = \sin \frac{\pi}{3} = \frac{\sqrt{3}}{2}. The total travel distance of this equilateral triangular path is thus \ell_3 = \frac{3\sqrt{3}}{2} \approx 2.598076211\dots.

Monday, February 3, 2025

Spinning surfaces and the vertical lines that test them

In a more advanced course, you’ve been asked to draw a 3D sketch of the function z = |x| + |y|. As you’re about to do this, you are struck by another bout of dizziness, and your resulting graph is rotated about the origin in a random direction in 3D space. In other words, the central axis of your graph’s “funnel” is equally likely to point from the origin to any point on the surface of the unit sphere.

What is the probability that the resulting graph you produce is in fact a function (i.e., z is a function of x and y)?

At first glance, this looks like a harder problem. There is, after all, one more direction in which to dizzily rotate your surface. However, first we note that the orthonormal rotation that takes the unit z-vector, e_3 = (0,0,1)^T, to a random point on the unit sphere \hat{u} = ( \cos \theta \sin \vartheta, \sin \theta \sin \vartheta, \cos \vartheta) can be composed of a rotation about the y-axis through an angle \vartheta \in ( 0^\circ, 180^\circ), followed by a rotation about the z-axis through an angle \theta \in (0^\circ, 360^\circ). Since the vertical line test uses lines parallel to the z-axis, any rotation about the z-axis does not affect whether the surface passes the test. So we really only need to worry about the rotation about the y-axis through the angle \vartheta. This then reduces to the two-dimensional case we saw before, though with only half of the range. That is, the resulting parametric surface will pass the vertical line test whenever \vartheta \in [0^\circ, 45^\circ) \cup (135^\circ, 180^\circ].

The only thing we have to be careful of here is assessing the probability. We want to measure the portion of the sphere's surface that satisfies \Omega = \{ \vartheta \in [0^\circ, 45^\circ) \cup (135^\circ, 180^\circ], \theta \in [0^\circ, 360^\circ) \}. Converting to radians and using the spherical surface area element dA = \sin \vartheta \, d\vartheta \, d\theta, the two polar caps have equal area by symmetry, so S = S(\Omega) = 2 \int_0^{2\pi} \int_0^{\pi / 4} \sin \vartheta \,d\vartheta \,d\theta = 4\pi \int_0^{\pi/4} \sin \vartheta \,d\vartheta. Therefore, since the total surface area of the unit sphere is 4\pi, the probability of the rotated parametric surface passing the vertical line test is \begin{align*}q = \frac{S}{4\pi} &= \int_0^{\pi/4} \sin \vartheta \,d\vartheta \\ &= \Big[ -\cos \vartheta \Big]^{\vartheta = \pi/4}_{\vartheta=0} \\ &= 1 - \frac{\sqrt{2}}{2} = \frac{2 - \sqrt{2}}{2} \approx 29.2893\dots \%.\end{align*}
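As a hedged sanity check on this probability (taking the passing condition \vartheta \in [0^\circ, 45^\circ) \cup (135^\circ, 180^\circ] derived above as given), a quick Monte Carlo over uniformly random directions on the sphere should land near 1 - \frac{\sqrt{2}}{2} \approx 0.2929:

import numpy as np

rng = np.random.default_rng(0)
n = 10**6
u = rng.normal(size=(n, 3))                      # uniform directions on the sphere
u /= np.linalg.norm(u, axis=1, keepdims=True)
polar = np.degrees(np.arccos(np.abs(u[:, 2])))   # angle to the nearest pole
print(np.mean(polar < 45))                       # ~ 0.2929, i.e. 1 - sqrt(2)/2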

Spinning graphs and the vertical lines that test them

You’re taking a math exam, and you’ve been asked to draw the graph of a function. That is, your graph must pass the vertical line test, so that no vertical line intersects your function’s graph more than once.

You decide you’re going to graph the absolute value function, y = |x|, and ace the test.

There’s just one problem. You are dealing with a bout of dizziness, and can’t quite make out the x- and y-axes on the exam in front of you. As a result, your function will be rotated about the origin by a random angle that’s uniformly chosen between 0 and 360 degrees.

What is the probability that the resulting graph you produce is in fact a function (i.e., y is a function of x)?

Let's assume that the graph of y = |x| is rotated about the origin by an angle of \theta \sim U(0^\circ,360^\circ), measured in degrees with respect to the positive x-axis. We can think of this situation as defining a new set of variables \begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix} = \begin{pmatrix} \cos \frac{\pi\theta}{180} & -\sin \frac{\pi\theta}{180} \\ \sin \frac{\pi\theta}{180} & \cos \frac{\pi\theta}{180} \end{pmatrix} \begin{pmatrix} x \\ |x| \end{pmatrix} = \begin{pmatrix} x \cos \frac{\pi\theta}{180} - |x| \sin \frac{\pi\theta}{180} \\ x \sin \frac{\pi\theta}{180} + |x| \cos \frac{\pi\theta}{180} \end{pmatrix}. So we have \tilde{x} = f(x) and \tilde{y} = g(x) as parametric functions of our original variable x.
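For concreteness, here is a tiny sketch of this parametrization at, say, \theta = 30^\circ (an arbitrary illustrative value): tabulating (\tilde{x}, \tilde{y}) for a few values of x shows that \tilde{x} = f(x) is strictly increasing there, so that particular rotated graph still passes the vertical line test.

import math

def rotated(x, theta_deg):
    # image of the point (x, |x|) after rotating by theta_deg degrees about the origin
    t = math.radians(theta_deg)
    return (x*math.cos(t) - abs(x)*math.sin(t),
            x*math.sin(t) + abs(x)*math.cos(t))

for x in [-2, -1, 0, 1, 2]:
    print(x, rotated(x, 30))   # the first coordinates come out strictly increasing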

Now we see that if f is one-to-one, then the parametric curve (f(x), g(x)) will pass the vertical line test: for any vertical line \tilde{x} = c there is at most one value of x with f(x) = c, so there can be at most one intersection between that vertical line and the curve.

On the other hand, if f is not one-to-one, then f(x_1) = f(x_2) for some x_1 \ne x_2, which is equivalent to (x_1 - x_2) \cos \frac{\pi \theta}{180} = ( |x_1| - |x_2| ) \sin \frac{\pi \theta}{180}. Note that if \sin \frac{\pi\theta}{180} = 0, then f(x) = x \cos \frac{\pi\theta}{180}, which is one-to-one, so we may assume \sin \frac{\pi \theta}{180} \ne 0. Now let's compare g(x_1) and g(x_2). We have \begin{align*}g(x_1) - g(x_2) &= (x_1 - x_2) \sin \frac{\pi \theta}{180} + (|x_1| - |x_2| ) \cos \frac{\pi \theta}{180} \\ &= (x_1 - x_2) \sin \frac{\pi \theta}{180} + (|x_1| - |x_2|) \sin \frac{\pi \theta}{180} \frac{ \cos \frac{\pi \theta}{180}}{\sin \frac{\pi\theta}{180}} \\ & = (x_1 - x_2) \sin \frac{\pi \theta}{180} + (x_1 - x_2) \cos \frac{\pi \theta}{180} \frac{\cos \frac{\pi \theta}{180}}{\sin \frac{\pi \theta}{180}} \\ &= (x_1 - x_2) \left( \sin \frac{\pi \theta}{180} + \frac{ \cos^2 \frac{\pi \theta}{180}}{\sin \frac{\pi \theta}{180}} \right) \\ &= \frac{ x_1 - x_2}{\sin \frac{\pi \theta}{180}},\end{align*} so since x_1 \ne x_2, we have g(x_1) \ne g(x_2). The vertical line \tilde{x} = f(x_1) = f(x_2) would therefore intersect the curve at both \tilde{y}_1 = g(x_1) and \tilde{y}_2 = g(x_2), with \tilde{y}_1 \ne \tilde{y}_2, so if f is not one-to-one then the parametric curve fails the vertical line test. Therefore, the parametric curve passes the vertical line test if and only if f is one-to-one.
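For a concrete instance of the failing case: at \theta = 90^\circ we have f(x) = -|x| and g(x) = x, so x_1 = 1 and x_2 = -1 satisfy f(x_1) = f(x_2) = -1 while g(x_1) - g(x_2) = \frac{x_1 - x_2}{\sin \frac{\pi \theta}{180}} = 2, and the vertical line \tilde{x} = -1 meets the rotated graph at both \tilde{y} = 1 and \tilde{y} = -1.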

So let's analyze the function f(x) = x \cos \frac{\pi\theta}{180} - |x| \sin \frac{\pi\theta}{180}. We see that f^\prime(x) = \begin{cases} \cos \frac{\pi\theta}{180} + \sin \frac{\pi\theta}{180}, &\text{if $x \lt 0$;} \\ \cos \frac{\pi\theta}{180} - \sin \frac{\pi\theta}{180}, & \text{if $x \gt 0.$} \end{cases} If these two slopes have the same (nonzero) sign, then f is strictly monotonic and hence one-to-one; if they have opposite signs or one of them vanishes, then f takes the same value at points on opposite sides of the origin and is not one-to-one. So we need only determine for which values of \theta \in (0^\circ, 360^\circ) we have 0 \lt \left(\cos \frac{\pi\theta}{180} - \sin \frac{\pi\theta}{180}\right) \left( \cos \frac{\pi\theta}{180} + \sin \frac{\pi\theta}{180} \right) = \cos^2 \frac{\pi\theta}{180} - \sin^2 \frac{\pi\theta}{180} = \cos \frac{\pi\theta}{90}. Since \cos \frac{\pi\theta}{90} \gt 0 exactly when \theta \in [ 0^\circ, 45^\circ ) \cup (135^\circ, 225^\circ) \cup ( 315^\circ, 360^\circ ], the resulting graph passes the vertical line test for precisely these values of \theta, which means that the probability of passing the vertical line test when \theta \sim U(0^\circ,360^\circ) is p = \frac{|45 - 0| + |225 - 135| + |360 - 315|}{360} = \frac{1}{2}.
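Finally, a hedged Monte Carlo sketch of this last probability: sampling \theta uniformly and testing directly whether f is strictly monotone on a grid (which, for this piecewise-linear f, amounts to being one-to-one) gives an empirical frequency close to \frac{1}{2}.

import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(-1, 1, 201)
trials, hits = 100_000, 0
for _ in range(trials):
    t = np.radians(rng.uniform(0, 360))
    fx = xs*np.cos(t) - np.abs(xs)*np.sin(t)   # f evaluated on the grid
    d = np.diff(fx)
    if np.all(d > 0) or np.all(d < 0):         # strictly monotone, hence one-to-one
        hits += 1
print(hits / trials)                           # ~ 0.5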