Processing math: 66%

Monday, May 26, 2025

Fiddlish space frequencies

Before getting to rivers, let’s figure out where spaces are likely to appear in the (fictional) Fiddlish language, which includes only three- and four-letter words. These words are separated by spaces, but there is no other punctuation. Suppose a line of Fiddlish text is generated such that each next word has a 50 percent chance of being three letters and a 50 percent chance of being four letters.

Suppose a line has many, many, many words. What is the probability that any given character deep into the line is a space?

Let's suppose that we have N1 words in a line. Then there would be N1 spaces, whereas the total length of the line would be L=4F+3T+N1 characters, where F is the number of four-letter words and T is the number of three-letter words. Since F+T=N, we have L=4F+3(NF)+N1=4N+F1. Since F is binomially distributed with frequency 12, the expected value of F is E[F]=N2, so the expected length is E[L]=92N1.

Therefore, the expected frequency of spaces in the line, which is equivalent to the probability that any given character deep into the line is a space, is then p=N1E[L]=N192N129=22.222% as N.

Monday, May 19, 2025

Pyramidal permeation permutations

Consider the following figure, which shows 16 points arranged in a rhombus shape and connected to their neighbors by edges. How many distinct paths are there from the top point to the bottom along the edges such that:

  • You never visit the same point twice.
  • You only move downward or sideways—never upward.

Let's number the nodes from top down and then from left to right, so that the top node is 1, the nodes from left to right on the second level are 2 and 3, and so on. Let N(k) be the number of paths from 1 ending at node k following the rules of never visiting the same point twice and only moving downward or sideways, never upward. For good measure, we can go ahead and assume that N(1)=1.

It is straightforward to see that N(2)=N(3)=2, since either the path goes directly from 1 to k{2,3} or the path first travels to the other node, 5k, and then sideways to k.

Now let's consider some k{4,5,6}. Clearly, for any such k, an acceptable path must last exist the row above at some {2,3}. If the path left the higher row at =2, then it either first goes from 2 to 4 and then (if necessary) sideways from 4 to k, or it goes from 2 to 5 and then (if necessary) sideways from 5 to k. So there must be 2N(2) paths from 1 to k{4,5,6} that leave the first row at =2. A similar argument can be made that there also must be 2N(3) paths from 1 to k{4,5,6} that leave the first row at =3. Therefore, for each k{4,5,6} we must have N(k)=2N(2)+2N(3)=8.

Carrying this argument further, let Li represent the ith row of the diagram. So for instance L0={1}, L1={2,3}, etc. Additionally, let I(k) be the downward index of node k, that is how many acceptable paths downward are available from node k. Then extending the argument from the previous paragraph, we see that N(k)=jLi1I(j)N(j),kLi. So, as previously seen, N(2)=N(3)=2 and N(4)=N(5)=N(6)=8. Furthermore, N(7)=N(8)=N(9)=N(10)=6j=4I(j)N(j)=2N(4)+2N(5)+2N(6)=68=48N(11)=N(12)=N(13)=10j=7I(j)N(j)=N(7)+2N(8)+2N(9)+N(10)=624=288N(14)=N(15)=13j=11I(j)N(j)=N(11)+2N(12)+N(13)=4288=1152N(16)=15j=14I(j)N(j)=21152=2234, so the total number of acceptable paths from the top vertex of the rhombus to the bottom vertex is 2,234.

Monday, May 12, 2025

Guess we'll still never know ...

Now that you’ve determined the values of a and b, let’s analyze the rest of Stephen’s statement. Is it true that losing in five games is “closer to a sweep than a seven-game series”? Let p4 represent the probability that the Celtics sweep the Knicks in four games. And let p7 represent the probability that the series goes to seven games (with either team winning).

Suppose p is randomly and uniformly selected from the interval (a,b), meaning we take it as a given that the most likely outcome is that the Knicks will lose the series in five games. How likely is it that p4 is greater than p7? In other words, how often will it be the case that losing in five games means a sweep is more likely than a seven-game series?

As we saw earlier, for any value of p(35,34), we have p4=p4. Similarly, we have p7=P(C,7p)+P(K,7p)=20p3(1p)3. We can solve for the point at which we have p4(p)=p7(p) and get the cubic equation 20(1p)3p=0, or substituting t=1p we get t3+120t120=0. Using Cardano's formula, given that the discriminant when p=120 and q=120 is Δ=q24+p327=11600+1432000=172700>0, we have the solution t=3140+1301730+31401301730, so the equilibrium point between p4 and p7 occurs at p=1t=13140+1301730314013017300.676582453403. So if p were uniformly samples from U(35,34) then the probability that a sweep is more probable than a seven-game series is p353435=20p1230.510549689351.

Guess we'll never know ...

In a recent First Take episode, co-host Stephen A. Smith said:

I got [the Knicks] losing this in five games, which means they’re closer to a sweep than a seven-game series. That’s how I’m looking at it right now.

Let’s look at the first part of Stephen’s statement, that he believed the Knicks would lose to the Celtics in five games.

Let p represent the probability the Celtics win any given game in the series. You should assume that p is constant (which means there’s no home-court advantage) and that games are independent.

For certain values of p, the likeliest outcome is indeed that the Celtics will win the series in exactly five games. While this probability is always less than 50 percent, this outcome is more likely than the Celtics winning or losing in some other specific number of games. In particular, this range can be specified as a<p<b.

Determine the values of a and b.

Let P(t,np) be the probability of team t{C,K} winning the series in n games, given that the probability of the Celtics winning any individual game is p. Then since team t would have to win the final game of the series, the probability is given by P(t, n \mid p) = \begin{cases}\binom{n-1}{3} p^4 (1-p)^{n-4}, &\text{if $t = \text{C}$;}\\ \binom{n-1}{3} p^{n-4}(1-p)^4, &\text{if $t = \text{K}.$}\end{cases}

We are trying to find the maximal interval (a,b) \subseteq [0,1], such that P(\text{C}, 5 \mid p ) = \max_{t \in \{\text{C}, \text{K}\}, n \in \{4,5,6,7\}} P(t, n \mid p), for all p \in (a,b). Let's just brute force this thing: \begin{align*} P(\text{C}, 5 \mid p) = 4p^4(1-p) \gt P(\text{C}, 4 \mid p) = p^4 &\Rightarrow p \in \left(0,\frac{3}{4}\right) \\ P(\text{C}, 5 \mid p) = 4p^4(1-p) \gt P(\text{C}, 6 \mid p) = 10p^4(1-p)^2 &\Rightarrow p \in \left(\frac{3}{5},1\right) \\ P(\text{C}, 5 \mid p) = 4p^4(1-p) \gt P(\text{C}, 7 \mid p) = 20p^4(1-p)^3 &\Rightarrow p \in \left(1 - \frac{1}{\sqrt{5}},1\right), \end{align*} so since as long as p \gt \frac{1}{2} we have P(\text{C}, n \mid p) \gt P(\text{K}, n \mid p) for each n \in \{4,5,6,7\}, we see that the region where a Celtics win in five games is the most probable occurs when p \in \left(\frac{3}{5}, \frac{3}{4}\right).

Sunday, May 4, 2025

Leeloo Disney, Multi-Pass, Extra Credit

This time around, after you complete the first ride on your Multi Pass, you can reserve a fourth ride at any time slot after your first ride (rather than after your third ride). Similarly, after you have completed your second ride, you can reserve a fifth ride at any time slot after your second, and so on, until there are no available time slots remaining.

As before, the first three rides of the day are equally likely to be in any of the 12 time slots, whereas subsequent rides are equally likely to occur in any remaining available slots for the day.

On average, how many rides can you expect to “Lightning Lane” your way through today at Magic Kingdom?

As we say in the previous Classic answer, I encoded an "extra credit" version of the Lightning Pass program, where now as opposed to only adding new rides after your last scheduled ride, you add another ride uniformly distributed over all of the remaining, not already scheduled time slots. This distribution should be, and indeed turns out to be, much more centrally distributed, though here I was unable to come up with a semi-analytical solution and will just rely on the simulations.

Here again, I used N=1,000,000 simulations and arrived at the following histogram, which gives a mean of m = 6.809379, standard deviation of s = 1.404456, which implies that the true mean 95\% confidence interval is R \in [6.8066, 6.8121]. Therefore, let's comfortably round so that under the Extra Credit Lightning Lane rules, you can expect approximately 6.81 rides.

Leeloo Disney, Multi-Pass

By purchasing “Lightning Lane Multi Pass,” you can reserve three of the many rides in a park, with each ride occurring at a different hourlong time slot. For simplicity, suppose the park you’re visiting (let’s say it’s Magic Kingdom) has 12 such time slots, from 9 a.m. to 9 p.m. So if you have the 3 p.m. time slot for a ride, then you can skip the “standby lane” and board the much shorter “Lightning Lane” at any point between 3 and 4 p.m. Assume you can complete at most one ride within each hourlong time slot.

Once you have completed the first ride on your Multi Pass, you can reserve a fourth ride at any time slot after your third ride. This way, you always have at most three reservations. Similarly, after you have completed your second ride, you can reserve a fifth ride at any time slot after your fourth, and so on, up until you are assigned a ride at the 9 p.m. time slot. That will be your final ride of the day.

Magic Kingdom happens to be very busy at the moment, and so each ride is randomly assigned a valid time slot when you request it. The first three rides of the day are equally likely to be in any of the 12 time slots, whereas subsequent rides are equally likely to occur in any slot after your currently latest scheduled ride.

On average, how many rides can you expect to “Lightning Lane” your way through today at Magic Kingdom?

Let's denote the slots that I get as S \subseteq \{1, 2, 3, \dots, 12 \}. Since after each ride I take, the probability distribution of what additional slots I add to S only depend on m = \max S. Additionally, since we care about the total ending size of S, let's keep track of n = |S|.

Define R(n,m) to be the expected total number of rides I take given that I currently have n slots and the a maximum time slot is m. Since once the 12th time slot is filled, you do not end up adding any more rides, we have R(n,12) = n for n = 3, 4, \dots, 12. Otherwise, we have the following recursion formula R(n,m) = \frac{ \sum_{j=m+1}^{12} R(n+1,j)}{12 - m}, \,\,\,\ m \lt 12.

Using the recursion formula, we see that R(n,11) = R(n+1,12) = n+1, \,\,\,n = 3, 4, \dots, 11. Similarly, we get R(n,10) = \frac{R(n + 1, 11) + R(n + 1, 12)}{2} = \frac{2n+3}{2} = n + \frac{3}{2}, \,\,\,n = 3, 4, \dots, 10. Once more, we get R(n,9) = \frac{R(n+1,10) + R(n+1,11) + R(n+1, 12)}{3} = n + \frac{11}{6}, \,\,\, n = 3, 4, \dots, 9. Another time, we get R(n,8) = \frac{\sum_{i=9}^{12} R(n+1,i)}{4} = n + \frac{25}{12}, \,\,\, n = 3, 4, \dots, 8. Further, we get R(n,7) = \frac{\sum_{i=8}^{12} R(n+1,i)}{5} = n + \frac{137}{60}, \,\,\, n = 3, 4, \dots, 7. Continuing onward, we get \begin{align*} R(n,6) &= n + \frac{49}{20}, \,\,\, n = 3, 4, 5, 6; \\ R(n,5) &= n + \frac{363}{140}, \,\,\, n = 3, 4, 5; \\ R(n,4) &= n + \frac{761}{280}, \,\,\, n = 3, 4; \\ R(n,3) &= n + \frac{7129}{2520}, \,\,\, n = 3.\end{align*}

Let's pause to define p_m as the probability that in your first three randomly assigned timeslots, the maximum time slot is m, That is, p_m = \frac{ \binom{m-1}{2} }{ \binom{12}{3} } = \frac{ (m-1)(m-2) }{ 440 }. So using the law of total expectation, we get that the total average number of rides that you expect to go on using the Lightning Lane Multi Pass is R = \sum_{m=3}^{12} p_m R(3,m) = \sum_{m=3}^{12} \frac{(m-1)(m-2)}{440} R(3,m) = \frac{118361}{27720} \approx 4.2698773\dots

To confirm the calculations here, I put together a small Python code for testing purposes (as well as for working on the Extra Credit problem that will be featured in a later post). Below find the code:

import numpy as np
rng = np.random.default_rng()

def lightning_classic(rng):
    slots = sorted(list(rng.choice(12,size=3, replace=False)))
    last = slots[-1]
    while last < 11:
        new = rng.integers(last+1, high=12)
        slots.append(new)
        last = new
    return slots

def lightning_xc(rng):
    slots = sorted(list(rng.choice(12,size=3, replace=False)))
    for i in range(12):
        if i in slots:
            open_slots = [j for j in range(i+1, 12) if j not in slots] ## figure out which ones are still open
            if open_slots:
                new = open_slots[rng.integers(len(open_slots))] ## get new slot
                slots.append(new)
    return slots

After running the code with N = 1,000,000, we get the following histogram, which gives a mean of m = 4.269231, standard deviation of s = 1.032058, which implies that the true mean 95\% confidence interval is R \in [4.2672, 4.2713].