I have a hat with six small toy rabbits: two are orange, two are green, and two are purple. I shuffle the rabbits around and randomly draw them out one at a time without replacement (i.e., once I draw a rabbit out, I never put it back in again).
Your job is to guess the color of each rabbit I draw out. For each guess, you know the history of the rabbits I’ve already drawn. So if we’re down to the final rabbit in the hat, you should be able to predict its color with certainty.
Every time you correctly predict the color of the rabbit I draw, you earn a point. If you play optimally (i.e., to maximize how many points you get), how many points can you expect to earn on average?
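Before setting anything up exactly, a quick Monte Carlo sketch gives a feel for the answer. This assumes the intuitively optimal strategy worked out below: always guess a color with the most rabbits remaining, breaking ties uniformly at random. The function name `simulate` and its defaults are my own choices.

```python
import random

def simulate(counts=(2, 2, 2), trials=100_000, seed=0):
    """Estimate the expected score of the guess-the-most-common-color strategy."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        remaining = list(counts)  # rabbits left of each color
        score = 0
        while sum(remaining) > 0:
            m = max(remaining)
            # Guess uniformly among the colors tied for the most rabbits left.
            guess = rng.choice([i for i, c in enumerate(remaining) if c == m])
            # Draw a rabbit uniformly at random from those remaining.
            draw = rng.choices(range(len(remaining)), weights=remaining)[0]
            score += (draw == guess)
            remaining[draw] -= 1
        total += score
    return total / trials

print(simulate())
```

With the default six-rabbit setup the estimate lands close to the exact value derived below.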
Let's define the Markov chain on $\mathbb{N}^4$ with states $X = (g, o, p, s),$ where $g$ is the number of green rabbits, $o$ is the number of orange rabbits, $p$ is the number of purple rabbits and $s$ is the current score. The optimal guess is always a color with the most rabbits remaining, breaking ties uniformly at random; write $m = \lvert \{ c \in \{g, o, p\} : c = \max\{g, o, p\} \} \rvert$ for the number of colors tied for the maximum. The state $X = (g, o, p, s)$ in this magical rabbit-pulling Markov chain is adjacent to the following states: $(g-1, o, p, s)$, $(g-1, o, p, s+1)$, $(g, o-1, p, s)$, $(g, o-1, p, s+1)$, $(g, o, p-1, s)$ and $(g, o, p-1, s+1),$ provided that the resulting quadruple remains in $\mathbb{N}^4.$ The transition probabilities are given by the following: \begin{align*} p( X^{n+1} = (g-1,o,p,s) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{g}{g+o+p} \left( 1 - \frac{1}{m} \right), & \text{if $g = \max \{ g, o, p \};$}\\ \frac{g}{g+o+p}, &\text{if $g \lt \max \{ g, o, p \}$}\end{cases}\\ p( X^{n+1} = (g-1,o,p,s+1) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{g}{g+o+p} \cdot \frac{1}{m}, & \text{if $g = \max \{ g, o, p \};$}\\ 0, &\text{if $g \lt \max \{ g, o, p \}$}\end{cases} \\ p( X^{n+1} = (g,o-1,p,s) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{o}{g+o+p} \left( 1 - \frac{1}{m} \right), & \text{if $o = \max \{ g, o, p \};$}\\ \frac{o}{g+o+p}, &\text{if $o \lt \max \{ g, o, p \}$}\end{cases}\\ p( X^{n+1} = (g,o-1,p,s+1) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{o}{g+o+p} \cdot \frac{1}{m}, & \text{if $o = \max \{ g, o, p \};$}\\ 0, &\text{if $o \lt \max \{ g, o, p \}$}\end{cases} \\ p( X^{n+1} = (g,o,p-1,s) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{p}{g+o+p} \left( 1 - \frac{1}{m} \right), & \text{if $p = \max \{ g, o, p \};$}\\ \frac{p}{g+o+p}, &\text{if $p \lt \max \{ g, o, p \}$}\end{cases}\\ p( X^{n+1} = (g,o,p-1,s+1) \mid X^n = (g,o,p,s) ) &= \begin{cases} \frac{p}{g+o+p} \cdot \frac{1}{m}, & \text{if $p = \max \{ g, o, p \};$}\\ 0, &\text{if $p \lt \max \{ g, o, p \}$}\end{cases} \end{align*} Here again we only define these
probabilities to the extent that the resulting states remain in $\mathbb{N}^4.$ In order to fully specify this as a Markov chain, let's assert that $p( X^{n+1} = (0,0,0,s) \mid X^n = (0,0,0,s) ) = 1$ for each $s \in \mathbb{N}$ and that all transitions not explicitly stated have probability zero.
Using these transition probabilities we can define the function $f : \mathbb{N}^4 \to \mathbb{Q}$ to be the expected total score once there are no more rabbits left in the hat, conditional on starting at $X^0 = (g, o, p, s).$ That is, $f(g, o, p, s) = \mathbb{E} \left[ e_4^T X^\infty \mid X^0 = (g, o, p, s) \right],$ where $\{ X^n \}_{n \in \mathbb{N}}$ is the Markov chain specified above and $e_4$ picks out the score coordinate. Given the transition probabilities, we have the boundary conditions $f(0,0,0,s) = s$ for all $s \in \mathbb{N},$ together with the recursion \begin{align*}f(g,o,p,s) &= I ( g \geq 1 ) \left( p( (g,o,p,s) \to (g-1, o, p, s) ) f(g-1, o, p, s) + p( (g, o, p, s) \to (g-1, o, p, s+1) ) f(g-1,o,p,s+1) \right) \\ & \quad + I ( o \geq 1 ) \left( p( (g,o,p,s) \to (g, o-1, p, s) ) f(g, o-1, p, s) + p( (g, o, p, s) \to (g, o-1, p, s+1) ) f(g,o-1,p,s+1) \right) \\ & \quad + I(p \geq 1) \left( p( (g,o,p,s) \to (g, o, p-1, s) ) f(g, o, p-1, s) + p( (g, o, p, s) \to (g, o, p-1, s+1) ) f(g,o,p-1,s+1) \right) \end{align*}
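As a quick sanity check of the recursion on a tiny state, consider $(1,1,0,s)$: green and orange are tied for the maximum, so each of the four transitions $(0,1,0,s),$ $(0,1,0,s+1),$ $(1,0,0,s)$ and $(1,0,0,s+1)$ has probability $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}.$ Since $f(0,1,0,s) = f(1,0,0,s) = s + 1$ (the final rabbit is always guessed correctly), the recursion gives
$$f(1,1,0,s) = \tfrac{1}{4}\left[ (s+1) + (s+2) + (s+1) + (s+2) \right] = s + \tfrac{3}{2},$$
matching the intuition of a fair coin flip on the first of two remaining rabbits and certainty on the last.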
For the case where we start out with $g = o = p = 2$ rabbits and initial score $s = 0$, we can apply this recursion by hand repeatedly and eventually work out an average score of $f(2,2,2,0) = \frac{101}{30} = 3.3666666666666666\dots$ if playing optimally.
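For reference, the intermediate values that come up along the way (writing them with $s = 0$; by additivity of the score $f(g,o,p,s) = s + f(g,o,p,0),$ and $f$ is symmetric in its first three arguments) are $f(1,1,0,0) = \frac{3}{2},$ $f(1,1,1,0) = \frac{11}{6},$ $f(2,1,0,0) = \frac{7}{3},$ $f(2,1,1,0) = \frac{31}{12},$ $f(2,2,0,0) = \frac{17}{6},$ $f(2,2,1,0) = \frac{91}{30},$ and finally $f(2,2,2,0) = \frac{1}{3} + \frac{91}{30} = \frac{101}{30}.$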
For the case of starting with $g=o=p=10$ rabbits and $s = 0$, there are too many cases for most people (or at least for me) to keep track of by hand, so we can appeal to a quick recursive function with memoization to make the computation relatively fast; see the Python implementation below. This gives an average score in the Extra Credit case of $f(10,10,10,0) = 13.703132371084621\dots$ if playing optimally.
```python
cache = {}

def func(g, o, p, s):
    """Expected final score starting from state (g, o, p, s) under optimal play."""
    if max(g, o, p) == 0:
        return s  # no rabbits left: the score is final
    if (g, o, p, s) in cache:
        return cache[(g, o, p, s)]
    total = g + o + p
    # Number of colors tied for the maximum count; the optimal guess is any
    # one of them, chosen uniformly at random.
    count = sum(1 for i in (g, o, p) if i == max(g, o, p))
    out = 0.0
    if g > 0:
        prob = 1.0 / count if g == max(g, o, p) else 0.0
        out += func(g - 1, o, p, s) * g / total * (1 - prob)
        out += func(g - 1, o, p, s + 1) * g / total * prob
    if o > 0:
        prob = 1.0 / count if o == max(g, o, p) else 0.0
        out += func(g, o - 1, p, s) * o / total * (1 - prob)
        out += func(g, o - 1, p, s + 1) * o / total * prob
    if p > 0:
        prob = 1.0 / count if p == max(g, o, p) else 0.0
        out += func(g, o, p - 1, s) * p / total * (1 - prob)
        out += func(g, o, p - 1, s + 1) * p / total * prob
    cache[(g, o, p, s)] = out
    return out
```
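As a cross-check, the same recursion can be run in exact rational arithmetic. This sketch (my own variant, using `fractions.Fraction` and `functools.lru_cache`) drops the score coordinate, since $f(g,o,p,s) = s + f(g,o,p,0),$ and collapses the two score branches: the guessed color accounts for $\max\{g,o,p\}$ of the $n$ remaining rabbits, so the current guess is right with probability $\max\{g,o,p\}/n$ regardless of how ties are broken.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def expected_points(g, o, p):
    # Expected number of correct guesses from state (g, o, p), always
    # guessing a color with the most rabbits remaining.
    n = g + o + p
    if n == 0:
        return Fraction(0)
    # Chance the current guess is right: the guessed color has max(g, o, p)
    # of the n remaining rabbits (randomizing over ties changes nothing).
    out = Fraction(max(g, o, p), n)
    for k, nxt in ((g, (g - 1, o, p)), (o, (g, o - 1, p)), (p, (g, o, p - 1))):
        if k > 0:  # color k is drawn with probability k / n
            out += Fraction(k, n) * expected_points(*nxt)
    return out

print(expected_points(2, 2, 2))  # 101/30
print(float(expected_points(10, 10, 10)))
```

The exact value $\frac{101}{30}$ for the six-rabbit case confirms the hand computation, and the ten-of-each case agrees with the floating-point recursion above.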