# Quarantine

## Probabilities of Runs

There are many scenes in Quarantine where probabilistic events yield a long run of identical outcomes: a silver atom crossing a magnetic field swerves up rather than down, over and over again, or a pair of dice repeatedly fall as “snake eyes”. How can we calculate the odds of this happening when the normal rules of probability apply?

If we toss a fair coin N times, there are 2N different sequences of heads and tails possible, all of them equally likely. So the probability of getting the one sequence among them that contains exactly N heads is 1 in 2N. Similarly, if we devise an experiment that has d equally probable outcomes for a single trial, and then we carry out N trials, there are dN different sequences of results possible, and the probability of getting the one sequence where a particular outcome – say, result number 1 – is repeated in every single trial is 1 in dN.

When we toss a pair of dice there are six ways each of them can fall, for a total of 36 equally likely outcomes if we keep track of which die is which. An outcome that involves different numbers on the two dice can occur in two ways: a four and a five, say, can arise with the first die showing four and the second one showing five, or vice versa. But snake eyes, two ones, can only happen one way out of the 36, so for a toss of dice and a run of snake eyes we would set d equal to 36. The odds of getting snake eyes exactly ten times in ten throws would be 1 in 3610, or 1 in 3,­656,­158,­440,­062,­976.

But what if we perform the experiment M times, and then ask for the odds of getting (at least) a run of N consecutive repeats of our chosen special result? It will be useful to have a shorthand way of talking about this, so we will write A(M, N) for the number of sequences of M trials that contain at least a run of length N. If we can find a way to calculate A(M, N), we can just divide it by dM to find the probability.

It might be tempting simply to identify all the points at which a run might start – with the first trial, with the second, and so on, up until the last opportunity, when there are only N trials remaining out of the total. The trouble is, these situations are not mutually exclusive. If we demand a run of at least three heads in six coin tosses, then the sequence:

T H H H H T

has a run of three heads starting with the second toss, but also a run of heads starting with the third toss. Sorting out these overlaps would make the calculations very complicated.

However, we can put all of the A(M, N) sequences into two mutually exclusive categories, as follows. In M trials, it must be either true or false that the last M–1 trials, taken in isolation, contain a run of at least N repeats of result 1.

Suppose it’s true. Then whatever result we get for the first trial (and there are d different possibilities) we will certainly still have a run of at least N in the whole set of M trials. Since we are assuming a run of at least N in the last M–1 trials, that can happen A(M–1, N) ways. So this possibility covers:

d A(M–1, N)

different sequences where we end up with a run of at least N.

Now let’s examine the case where it’s false that the last M–1 trials, taken in isolation, contain a run of at least N repeats of result 1 ... and yet nevertheless we end up with such a run in the full set of M trials. This can only happen if the very first trial yields a result of 1, followed immediately by a run of N–1 from the last M–1 trials. In other words, the sequence must start like this:

1 1 1 ... 1 (N times)

followed by a result that certainly can’t be a 1, otherwise there would have been a run of N even when we excluded the first result. That means there are d–1 possibilities for the result of experiment number N+1.

The remaining MN–1 results can be any sequence of that length that does not contain a run of at least N. We can find that simply by subtracting the number that do contain such a run from the total number of that length:

dMN–1A(MN–1, N)

So, by combining these two mutually exclusive possibilities we can count the total number of sequences of length M that contain a run of at least N:

A(M, N) = d A(M–1, N) + (d–1)(dMN–1 A(MN–1, N))

Of course this formula can’t give us an answer immediately, since it describes A in terms of itself. But the numbers of trials for which we need to know A on the right-hand side are always less than the original number, M, so by applying this formula repeatedly we will eventually reach a case that we know:

A(N, N) = 1

That is, in N trials there is exactly one sequence that contains a run of length N. Of course if M is less than N, we can’t get possibly get a run of length N, so in that case A(M, N) = 0.

Calculating these quantities for large numbers of trials is best done by computer, but we can illustrate the method by asking how likely it is to get snake eyes at least twice in a row in various numbers of tosses. So we set d=36, for snake eyes, and N=2, for a run of length two. We then have:

A(0, 2) = 0
A(1, 2) = 0
A(2, 2) = 1
A(3, 2) = 36×A(2, 2) + 35×(360A(0, 2)) = 71
A(4, 2) = 36×A(3, 2) + 35×(361A(1, 2)) = 3,816
A(5, 2) = 36×A(4, 2) + 35×(362A(2, 2)) = 182,701
A(6, 2) = 36×A(5, 2) + 35×(363A(3, 2)) = 8,207,711

The odds are then found by dividing by the total number of possible sequences of outcomes, 36M. So we have:

For 2 throws, 1 in 362, about 0.0772%
For 3 throws, 71 in 363, about 0.152%
For 4 throws, 3,816 in 364, about 0.227%
For 5 throws, 182,701 in 365, about 0.302%
For 6 throws, 8,207,711 in 366, about 0.377%

Continuing this process far enough, it’s possible to show that by the time you reach 924 throws of the dice, the odds are better than 50% of having thrown at least two consecutive snake eyes.

Although we can only calculate these individual probabilities with the recursive formula described above, it’s possible to find a simple formula that tells us directly the average number of trials we would need to perform in order to achieve a run of length N. We will call this quantity E. If the probability of a successful outcome in a single trial is p=1/d, then E must satisfy the equation:

E = (1–p) (p0(E+1) + p1(E+2) + p2(E+3) + ... + pN–1(E+N)) + pN N

We obtain this equation by imagining all the possible outcomes when we start from scratch. With probability pj (1–p), where j=0,...,N–1, we might get j successes followed by a failure. If so, we will have to start over again, having wasted the first j+1 trials. So in these cases, the expectation value becomes the expectation value starting from scratch, E, plus the number of wasted trials. Alternatively, with probability pN, we might get N successes in N trials, and the expectation value will be exactly N. So we equate E to the probability-weighted sum of all these possible outcomes.

The equation can be simplified by using standard formulas for the sums of geometric and arithmetico-geometric sequences, and it is solved by:

E = (pN – 1) / [pN (p – 1)]

or, putting p=1/d:

E = d (dN – 1) / (d – 1) = d + d2 + ... + dN

The form below will calculate the probability for any problem of this kind: just enter the relevant quantities then hit the “Calculate probability” button.

Probability of a run of events

Quarantine / Probabilities of Runs / created Friday, 13 September 2013