I am teaching an undergraduate course this semester about the design and analysis of experiments, and a major portion of my syllabus entails giving the students an appreciation of and ability to use the awesome mechanics supplied by one of my favorite subjects: probability theory. I have always wanted to teach my own probability class, and this was my first chance. So I have packed it full of fun games, thought experiments, and surprising results. I think it is always nice to remind myself of the utter hopelessness of my feeble mind to intuit likelihoods with only my gut instinct.

Anyway, here is a game that I have only seen discussed briefly in a limited form, and have yet to find its formal treatment (though I have not looked too hard because I don’t know what to search for). Have a friend (or enemy) take 3 slips of paper, write a different number on each (anything from \(-\infty\) to \(\infty\), exclusive), and mix them in a stovepipe hat. Now you draw the first slip. You can declare it to be the largest number of the three, or discard it and draw another. Once you discard a number, though, you cannot return to it. In this way, you must find the largest number to win.

With 3 slips of paper, it is elementary to show that the strategy maximizing your odds of winning is to always toss the first number you draw and pick the next largest. This makes the probability of winning 1/2. With 4 slips of paper instead, it is easy to show that always tossing the first number and picking the next largest is again the best strategy, with a probability of winning 11/24 ≈ 0.458. And with 5 slips of paper, having 120 possible orderings, it becomes cumbersome but not difficult to show that always tossing the first two and picking the next largest is the best strategy, with a probability of winning 13/30 ≈ 0.433.
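These small cases are easy to verify by brute force. Here is a sketch in Python (the helper name is mine) that enumerates every ordering and scores the strategy of tossing the first \(r\) slips and then taking the first slip that beats everything seen so far:

```python
from fractions import Fraction
from itertools import permutations

def win_probability(N, r):
    """Exact win probability: toss the first r slips, then accept the
    first slip that beats everything seen so far (else keep the last)."""
    wins = total = 0
    for perm in permutations(range(N)):
        total += 1
        best_seen = max(perm[:r]) if r > 0 else -1
        chosen = perm[-1]              # forced to keep the last slip
        for x in perm[r:]:
            if x > best_seen:          # first "record" after the tossed slips
                chosen = x
                break
        wins += (chosen == N - 1)      # did we stop on the largest number?
    return Fraction(wins, total)

print(win_probability(3, 1))  # 1/2
print(win_probability(4, 1))  # 11/24
print(win_probability(5, 2))  # 13/30
```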

But what about 10 slips of paper? 100? 1000? The number of possible orderings quickly becomes ridiculous, e.g., we can order 10 different numbers in over 3.6 million ways. So, in how many of those ways do you win if you always toss the first number and pick the next largest? For the extreme cases of tossing one slip or all but one, it is not too difficult to count the orderings in which one wins. For the other strategies, the counting quickly descends a treacherous path. As my feeble mind can’t see any easy way to get a closed-form relationship between the probability of winning and the number of slips I toss, \(n\), out of the \(N\) total, I think this problem provides a perfect opportunity to teach Monte Carlo methods.

Writing the program to simulate this game is actually not difficult, but making it efficient takes some time. Below we see the results from 50,000 trials with 10 numbers, which samples a little under 1.4% of the possible orderings. The maximum probability of winning appears to be achieved when I discard the first 3 slips and pick the next highest. Given that the highest number is not among those I discard, I win about 57% of the time.
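My actual code is beside the point, but a minimal version of the simulation (function and variable names are my own) looks something like this:

```python
import random

def simulate(N, r, trials=50_000, seed=0):
    """Estimate the probability of winning when we toss the first r slips,
    then accept the first slip that beats everything seen so far."""
    rng = random.Random(seed)
    slips = list(range(N))
    wins = 0
    for _ in range(trials):
        rng.shuffle(slips)
        best_seen = max(slips[:r]) if r > 0 else -1
        chosen = slips[-1]             # forced to keep the last slip
        for x in slips[r:]:
            if x > best_seen:
                chosen = x
                break
        wins += (chosen == N - 1)
    return wins / trials

N = 10
estimates = {r: simulate(N, r) for r in range(N)}
best_r = max(estimates, key=estimates.get)
print(best_r, estimates[best_r])       # tossing 3 or 4 slips wins ~40% of runs
```

Note that the 57% figure quoted above is conditional on the largest slip surviving the tossed batch; unconditionally, the best strategy for 10 slips wins roughly 40% of the time.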

For 50 numbers, from 50,000 trials (or 1.6440e-58% of the sample space!!!) I find that I should toss the first 17 and take the next largest, to win 37% of the time.

And for 100 slips, from 50,000 trials (or 5.3576e-152% of the sample space), I should discard the first 37 to win with about the same probability. I win more than half of the time as long as I don’t discard the largest number in those first 37.

From these we can infer that to win with more than 1/3 probability, first toss about one third of the total number of slips, then pick the next largest.

These numerical results will motivate my lecture on regression. We don’t have time to simulate many trials of ten thousand slips, so can we find a closed-form expression in terms of the number of slips and the number tossed? One might try a whole bunch of possibilities, e.g.,

$$

\begin{multline}

P(n; N) = \alpha_0 + \alpha_1\frac{1}{N} + \alpha_2 e^{-n} + \alpha_3 \frac{1}{N}e^{-n}

+ \alpha_4 \frac{1}{n+1} + \alpha_5 \frac{N}{n+1} + \alpha_6 \sqrt{n} \\

+ \alpha_7 \sqrt{\frac{n}{N}} + \alpha_8 n + \alpha_9 \frac{n}{N}

+ \alpha_{10} n^2 + \alpha_{11} \frac{n^2}{N^2} + \alpha_{12} n^3

+ \alpha_{13} \frac{n^3}{N^3}

\end{multline}

$$

Using results from 10000 trials for N = [10:10:100], least squares regression gives
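For the record, that fit can be reproduced along the following lines (a sketch: the Monte Carlo data are regenerated from scratch and the win probabilities are fit in percent, so the coefficients will wobble from run to run):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 10_000
rows, y = [], []

for N in range(10, 101, 10):
    # Only the relative order matters, so random reals stand in for the slips.
    vals = rng.random((trials, N))
    j = vals.argmax(axis=1)                  # position of the largest slip
    # Position of the best slip strictly before the largest one:
    masked = np.where(np.arange(N) < j[:, None], vals, -np.inf)
    lead = masked.argmax(axis=1)
    for n in range(N):
        # Win: the largest slip sits past the n tossed ones, and every
        # earlier "record" is among the tossed (so we never stop early).
        p = np.mean(j == 0) if n == 0 else np.mean((j >= n) & (lead < n))
        rows.append([1, 1/N, np.exp(-n), np.exp(-n)/N, 1/(n+1), N/(n+1),
                     np.sqrt(n), np.sqrt(n/N), n, n/N,
                     n**2, (n/N)**2, n**3, (n/N)**3])
        y.append(100 * p)                    # fit the win percentage

A, y = np.array(rows), np.array(y)
alpha, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(alpha, 4))
```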

$$

\begin{align}

\alpha_0 & = 29.1948 \\

\alpha_1 & = 0.4535 \\

\alpha_2 & = 0.1278 \\

\alpha_3 & = -29.0677 \\

\alpha_4 & = -0.2171 \\

\alpha_5 & = -0.0004 \\

\alpha_6 & = -0.0537 \\

\alpha_7 & = 0.4514 \\

\alpha_8 & = 0.0082 \\

\alpha_9 & = -28.0046 \\

\alpha_{10} & = -0.0001 \\

\alpha_{11} & = 11.3989 \\

\alpha_{12} & = 0.0000 \\

\alpha_{13} & = -2.2567

\end{align}

$$

And below we can see it is good for N in that range:

But when we go outside this range, the polynomials take over — as they usually do:

So we have to take a smarter approach. For one, we know for sure what the end points of the function are: \(P(0;N) = 1/N\) (accept the very first slip, which is the largest with probability \(1/N\)) and \(P(N-1;N) = 1/N\) (discard all but the last slip, which is the largest with probability \(1/N\)).

We can also surmise that the function has only one maximum, i.e.,

$$

\frac{d}{dn} P(n;N) = 0

$$

at only one point along the domain \([0,N-1]\).

Perhaps we can also require that over this domain its curvature is never positive, i.e.,

$$

\frac{d^2}{dn^2} P(n;N) \le 0.

$$

And it appears that the function is close to linear for large \(n\), which makes me think there is some logarithmic element to it.

These facts rule out sinusoidal and polynomial functions, unless we wish to include an infinite number of them. Instead, we can consider something like, oh, I don’t know,

$$

P(n;N) = \frac{1}{N} + \alpha \frac{n}{N}\log_{N}\frac{n+1}{N}.

$$

Now we satisfy all the conditions above, obtain the correct shape, and have a satisfying-looking expression that makes me think about entropy.
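A quick numerical check (with an illustrative \(\alpha = -4\); the fitted values come next) confirms the endpoints, the single interior maximum, and the non-positive curvature:

```python
import numpy as np

N, alpha = 100, -4.0                  # alpha here is only illustrative
n = np.arange(N)                      # n = 0, 1, ..., N-1
P = 1/N + alpha * (n/N) * np.log((n + 1)/N) / np.log(N)

# Both endpoints sit at 1/N.
assert np.isclose(P[0], 1/N) and np.isclose(P[-1], 1/N)

# A single interior maximum: the first differences change sign exactly once.
d = np.diff(P)
assert np.count_nonzero(np.sign(d[:-1]) != np.sign(d[1:])) == 1

# Curvature never positive: all second differences are <= 0.
assert np.all(np.diff(P, 2) <= 1e-12)
print("endpoints, single maximum, and concavity all check out")
```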

We see below how well it fits for \(N=20\), where \(\alpha = -3.1373\),

and for \(N=100\), where \(\alpha = -4.6586\).
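Estimating \(\alpha\) for a given \(N\) is a one-parameter least-squares problem against the simulated curve. A sketch (helper names are mine, and the estimates wobble with the Monte Carlo noise):

```python
import numpy as np

rng = np.random.default_rng(1)

def sim_curve(N, trials=50_000):
    """Monte Carlo estimate of P(n; N) for every n = 0 .. N-1."""
    vals = rng.random((trials, N))
    j = vals.argmax(axis=1)                  # position of the largest slip
    masked = np.where(np.arange(N) < j[:, None], vals, -np.inf)
    lead = masked.argmax(axis=1)             # best slip before the largest
    P = np.empty(N)
    P[0] = np.mean(j == 0)
    for n in range(1, N):
        P[n] = np.mean((j >= n) & (lead < n))
    return P

def fit_alpha(N, P):
    """Least-squares alpha for P(n;N) ~ 1/N + alpha (n/N) log_N((n+1)/N)."""
    n = np.arange(N)
    g = (n/N) * np.log((n + 1)/N) / np.log(N)
    return float(g @ (P - 1/N) / (g @ g))

for N in (20, 100):
    # Should land near the alpha values quoted above.
    print(N, round(fit_alpha(N, sim_curve(N)), 4))
```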

But when I extend this to any \(N\), something is missing.

I still can’t figure out the dependence of \(\alpha\) on \(N\).

More work is required, but what fun!