Making use of Markov chains

A stochastic process describes a system whose values change over time. Typically, predicting what state the process will be in at some future time requires its past states as input. A Markov chain is a stochastic process in which predictions about the future of the process are based solely on its present state. Thus a Markov chain only remembers the current step and not the steps in the past (this is called the memoryless property). Markov chains, and stochastic processes in general, have wide applications in many fields. In this post, we highlight two fun probability problems that can be solved using Markov chains.

Figure 1 – toss a coin until seeing 3 consecutive heads

The first problem is demonstrated by the above diagram – tossing a fair coin until the appearance of k consecutive heads. How many tosses are required on average to accomplish this goal? In Figure 1, it takes 17 tosses to get 3 consecutive heads. The number of tosses is random. Sometimes it takes more tosses and sometimes it takes fewer tosses. In general, what is the expected number of tosses to get k consecutive heads? For 3 consecutive heads, the answer is 14. To get 4 consecutive heads, it takes on average 30 tosses.

How does coin tossing become a Markov chain? Let’s take the goal of 4 consecutive heads. As the coin is tossed, we are interested in 5 different states – 0, H, HH, HHH and HHHH. The state 0 means that the experiment is just beginning or that the current toss is a tail. The other four states record the current run of consecutive heads. After each toss of the coin, the process is in one of these 5 states. The next state depends on the current state and does not depend on the past states, making this a Markov chain. The following is the transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & \text{1=H} & \text{2=HH} & \text{3=HHH} & \text{4=HHHH} \cr        0 & 0.5 & 0.5 & 0  & 0 & 0 \cr        1 & 0.5 & 0 & 0.5 & 0 & 0 \cr        2 & 0.5 & 0 & 0 & 0.5 & 0 \cr        3 & 0.5 & 0 & 0 & 0  & 0.5 \cr        4 & 0 & 0 & 0 & 0  & 1 \cr   } \qquad

The matrix captures the probabilities of how the process transitions from one state to the next. For example, if the current state is H, then the process will go to HH (with probability 0.5) or go to 0 (with probability 0.5). Example 2 in this blog post in a companion blog shows how to use this matrix to compute the expected number of steps it takes to go from state 0 to the state HHHH. The answer is 30.

If the goal is to toss the coin until obtaining 3 consecutive heads, then the states of the process would be 0, H, HH and HHH. Set up the transition probability matrix accordingly and apply the same technique discussed in the companion blog. If the goal is to toss the coin until obtaining k consecutive heads, adjust the states and the matrix accordingly and apply the method.
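The companion post's technique is not reproduced here. As a minimal sketch, one standard approach is to collect the transition probabilities among the non-absorbing states into a matrix Q and solve the linear system (I - Q)t = 1, where t lists the expected number of tosses to absorption from each starting state. The function name `expected_tosses` and its parameters are our own choices for illustration.

```python
from fractions import Fraction

def expected_tosses(k, p=Fraction(1, 2)):
    """Expected number of tosses to see k consecutive heads, starting from state 0."""
    # Transient states 0..k-1 record the current run of heads; state k is absorbing.
    # Q holds the transition probabilities among the transient states only.
    Q = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k):
        Q[i][0] += 1 - p          # a tail resets the run to state 0
        if i + 1 < k:
            Q[i][i + 1] += p      # a head extends the run (unless it completes it)
    # Augmented matrix for (I - Q) t = 1, solved by Gauss-Jordan elimination.
    A = [[(1 if i == j else 0) - Q[i][j] for j in range(k)] + [Fraction(1)]
         for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return A[0][k]                # expected tosses starting from state 0

print(expected_tosses(3))  # 14
print(expected_tosses(4))  # 30
```

Exact fractions are used so that the results match the stated answers of 14 and 30 without rounding error.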

Though the highlighted problem is about the average number of tosses, a broader problem concerns the probability distribution of the number of tosses needed to get k consecutive heads. For k=4, what is the probability that the goal (4 consecutive heads) is achieved in exactly 4 tosses of the coin, in 5 tosses, and so on? The method for answering these questions is discussed in this blog post in the same companion blog.
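One way to get at this distribution (a sketch of our own, not necessarily the approach taken in the companion blog) is to push the state probabilities forward one toss at a time and record how much probability arrives in the absorbing state on each toss. The name `toss_count_distribution` is hypothetical.

```python
from fractions import Fraction

def toss_count_distribution(k, max_tosses):
    """Return {n: P(the first run of k heads is completed on toss n)} for a fair coin."""
    half = Fraction(1, 2)
    # dist[i] = probability that the current run of heads has length i;
    # state k (goal reached) is absorbing.
    dist = [Fraction(0)] * (k + 1)
    dist[0] = Fraction(1)
    result = {}
    for n in range(1, max_tosses + 1):
        new = [Fraction(0)] * (k + 1)
        new[k] = dist[k]                    # probability already absorbed stays put
        for i in range(k):
            new[0] += dist[i] * half        # tail: back to state 0
            new[i + 1] += dist[i] * half    # head: the run grows by one
        result[n] = new[k] - dist[k]        # probability absorbed exactly on toss n
        dist = new
    return result

pmf = toss_count_distribution(4, 8)
for n in range(4, 9):
    print(n, pmf[n])
```

For k=4 this gives probability 1/16 for exactly 4 tosses and 1/32 for exactly 5 tosses, which can be checked by hand (HHHH and THHHH respectively).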

Figure 2 – rolling a die until seeing all 6 faces

The second problem is demonstrated by Figure 2 – rolling a fair die until the appearance of all 6 faces. How many rolls on average are required to achieve this goal? In Figure 2, it takes 11 rolls to get all 6 faces. Example 3 in this blog post shows how to answer this question. The answer is 14.7.

It turns out that a Markov chain is not needed to solve this problem, since it is related to two classic problems in probability – the occupancy problem and the coupon collector problem. In fact, the coupon collector problem provides a handy formula for the average number of rolls needed for every face of a k-sided die to appear at least once.
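The formula in question can be sketched as follows. Once i distinct faces have appeared, the wait for a new face is geometric with success probability (k-i)/k, so its mean is k/(k-i); adding these means gives k(1 + 1/2 + ... + 1/k). The function name below is our own.

```python
from fractions import Fraction

def expected_rolls_all_faces(k):
    """Coupon collector mean: k * (1 + 1/2 + ... + 1/k) for a fair k-sided die."""
    return k * sum(Fraction(1, i) for i in range(1, k + 1))

print(float(expected_rolls_all_faces(6)))  # 14.7
```

For a six-sided die the exact value is 147/10, in agreement with the answer of 14.7.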

As in the first problem, the broader problem is about a probability distribution. What is the probability that all 6 faces have appeared after 6 rolls, after 7 rolls, after 8 rolls and so on? This blog post in the same companion blog gives an idea of how to answer such questions.
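As an illustration (a sketch under our own choice of states, not necessarily the companion blog's presentation), the number of distinct faces seen so far forms a Markov chain with states 0 through 6. Stepping the state probabilities forward roll by roll gives the probability that all faces have appeared within n rolls.

```python
from fractions import Fraction

def prob_all_faces_within(n_rolls, faces=6):
    """Probability that every face of a fair die has appeared within n_rolls rolls."""
    # dist[j] = probability that exactly j distinct faces have appeared so far
    dist = [Fraction(0)] * (faces + 1)
    dist[0] = Fraction(1)
    for _ in range(n_rolls):
        new = [Fraction(0)] * (faces + 1)
        for j, pj in enumerate(dist):
            if pj == 0:
                continue
            new[j] += pj * Fraction(j, faces)                  # a face already seen
            if j < faces:
                new[j + 1] += pj * Fraction(faces - j, faces)  # a new face
        dist = new
    return dist[faces]

for n in (6, 7, 8):
    print(n, float(prob_all_faces_within(n)))
```

The probability that the last missing face shows up exactly on roll n is the difference `prob_all_faces_within(n) - prob_all_faces_within(n - 1)`. As a check, the value for 6 rolls is 6!/6^6 = 5/324.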

An introductory discussion of Markov chains can be found in the early posts in this companion blog on stochastic processes.



\copyright 2017 – Dan Ma

Monty Hall Problem

The Monty Hall Problem is a classic problem in probability. It is all over the Internet. Besides being a famous problem, it stirred up a huge controversy when it was posted in a column hosted by Marilyn vos Savant in Parade Magazine back in 1990. vos Savant received negative letters in the thousands, many of them from readers who were mathematicians, scientists and engineers with PhD degrees! Clearly it is possible to get this problem wrong (or to go down a wrong path in the reasoning).

I wrote about this problem in these two blog posts – Monty Hall Problem and More about Monty Hall Problem. The first post describes the problem using the following 5 pictures.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5


The Dice Problem

Thinking in probability can be hard sometimes. Thinking of probability in a wrong way can be costly, especially at the casino. This was what happened with Chevalier de Méré (1607-1684), who was a French writer and apparently an avid gambler. He estimated that the odds for winning in this one game were in his favor. However, he was losing money consistently on this particular game. He sensed something was amiss but could not see why his reasoning was wrong. Luckily for him, he was able to enlist two leading mathematicians at the time, Blaise Pascal and Pierre de Fermat, for help. The correspondence between Pascal and Fermat laid the foundation for the modern theory of probability.

Chevalier de Méré actually asked Pascal and Fermat for help on two problems – the problem of points and the dice problem. I wrote about these two problems in two blog posts – the problem of points and the dice problem. In this post, I make further comments on the dice problem. The point is that flawed reasoning in probability can be risky and costly, first and foremost for gamblers and to a great extent for anyone making financial decisions with uncertain future outcomes.

For Chevalier de Méré, there are actually two dice problems.

The first game involves four rolls of a fair die. In this game, de Méré bet at even odds on rolling at least one six when a fair die is rolled four times. His reasoning was that since the chance of getting a six in one roll of a die is \frac{1}{6} (correct), the chance of getting a six in four rolls of a die would be 4 \times \frac{1}{6}=\frac{2}{3} (incorrect). With a seemingly favorable 67% chance of winning, he reasoned that betting at even odds would be a profitable proposition. Though his calculation was incorrect, he made a considerable amount of money over many years playing this game.

The second game involves twenty-four rolls of a pair of fair dice. The success in the first game emboldened de Méré to bet at even odds on rolling one or more double sixes in twenty-four rolls of a pair of dice. His reasoning was that the chance of getting a double six in one roll of a pair of dice is \frac{1}{36} (correct). Then the chance of getting a double six in twenty-four rolls of a pair of dice would be 24 \times \frac{1}{36}=\frac{2}{3} (incorrect). He again reasoned that betting at even odds would be profitable.

The problem was for Pascal and Fermat to explain why de Méré was able to make money on the first and not on the second game.

The correct probability calculation would show that the probability of the event “rolling at least one six” happening in the first game is about 0.518 (see here). Thus de Méré would on average win 52% of the time in playing the first game at even odds. In playing 100 games, he would win about 52 games. In playing 1,000 games, he would win about 518 games. The following table calculates the amount of winnings per 1,000 games for de Méré.

Results of playing the first game 1,000 times with one French franc per bet

Outcome    # of Games    Win/Lose Amount
Win        518           518 francs
Lose       482           -482 francs
Total      1,000         36 francs

Per 1,000 games, de Méré won on average 36 francs. So he had a house edge of 3.6% (= 36/1000).

The correct calculation would show that the probability of the event “at least one double 6” happening in the second game is about 0.491 (see here). Thus de Méré could only win about 49% of the time. Per 1,000 games, de Méré would win on average 491 games, and the opposing side would win about 509 games. The following table calculates the amount of winnings per 1,000 games for de Méré.

Results of playing the second game 1,000 times with one French franc per bet

Outcome    # of Games    Win/Lose Amount
Win        491           491 francs
Lose       509           -509 francs
Total      1,000         -18 francs

The winning on average for de Méré is negative 18 francs per 1,000 games. So the opposing side has a house edge of 1.8% (= 18/1000).
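The arithmetic behind the two tables can be sketched in a few lines. With a one-franc bet at even odds, the expected net winnings per game are p - (1 - p), where p is the probability of winning. The function name `net_per_1000` is our own, and the sketch uses the unrounded probabilities, so its results differ slightly from the tables, which round the number of wins to whole games.

```python
def net_per_1000(p_win, stake=1.0, games=1000):
    """Expected net winnings over `games` even-odds bets of `stake` each."""
    # Each game pays +stake with probability p_win and -stake otherwise.
    return games * stake * (p_win - (1 - p_win))

p1 = 1 - (5 / 6) ** 4       # first game: at least one six in 4 rolls
p2 = 1 - (35 / 36) ** 24    # second game: at least one double six in 24 rolls

print(round(net_per_1000(p1), 1))   # about 35.5 francs
print(round(net_per_1000(p2), 1))   # about -17.2 francs
```

Rounded to whole games, these are the 36 francs and -18 francs shown in the tables.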

So de Méré was totally off base with his reasoning! He thought that the probability of winning would be 2/3 in both games. The incorrect reasoning led him to believe that betting at even odds would be a winning proposition. So he thought. Though his reasoning was wrong in the first game, he was lucky that the winning odds were still better than even. For the second game, he learned the hard way – through simulation with real money!

There are two issues involved here. One is obviously the flawed reasoning in probability on the part of de Méré. The second is calculation. de Méré and his contemporaries would have a hard time making the calculation even if they were able to reason correctly. They did not have the advantage of calculators and other electronic devices that are widely available to us. For example, the following shows the calculation of the winning probabilities for both games.

    \displaystyle P(\text{at least one six})=1 - \biggl( \frac{5}{6} \biggr)^4=0.518

    \displaystyle P(\text{at least one double six})=1 - \biggl( \frac{35}{36} \biggr)^{24}=0.491

It is feasible to calculate 5/6 raised to the 4th power by hand. Raising 35/36 to the 24th power by hand would be very tedious and error prone. Anyone with a handheld calculator that has a key for \displaystyle y^x (raising y to x) can do this calculation easily. For de Méré and his contemporaries, this calculation would probably have had to be done by experts.
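With software, the two probabilities take one line each. The short sketch below (variable and function names are our own) also shows one way to see that de Méré's additive rule cannot be right: applied to seven rolls of one die, it would give a "probability" greater than 1.

```python
# Correct calculation: 1 minus the probability of no success in any roll.
p_one_six = 1 - (5 / 6) ** 4
p_double_six = 1 - (35 / 36) ** 24
print(round(p_one_six, 3))       # 0.518
print(round(p_double_six, 3))    # 0.491

def additive_rule(n_rolls, p_single):
    """de Méré's flawed rule: multiply the single-roll chance by the number of rolls."""
    return n_rolls * p_single

print(round(additive_rule(4, 1 / 6), 3))   # 0.667, his figure for the first game
print(additive_rule(7, 1 / 6) > 1)         # True: not a valid probability
```

What the product n × p actually measures is the expected number of sixes (or double sixes) in n rolls, not the chance of seeing at least one.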

The main stumbling block of course would be the inability to reason correctly with odds and probability. We have the benefit of the probability tools bequeathed by Pascal, Fermat and others. Learning the basic tool kit in probability is essential for anyone who deals with uncertainty.

One more comment about what Chevalier de Méré could have done (if expert mathematical help was not available). He could have performed a simulation (the kind that does not involve real money). Simply play the second game (twenty-four rolls of a pair of fair dice) a number of times and count how many times he wins.

He would soon find out that he would not win 2/3 of the time. He would not even win 51% of the time. It would be more likely that he wins 49% of the time. Simulation, if done properly, does not lie. We performed one simulation of 100,000 games (in Excel). Only 49,211 of the iterations had “at least one double six.” Without software, simulating 100,000 times may not be realistic. But Chevalier de Méré could simulate the experiment 100 times or even 1,000 times (if he hired someone to help).
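The simulation above was done in Excel. For readers who prefer code, the following is a minimal sketch of the same experiment in Python. The function name, the seed and the number of trials are arbitrary choices of ours, and the exact count of wins will vary with the seed.

```python
import random

def simulate_double_six_game(trials, rolls=24, seed=2017):
    """Estimate the chance of at least one double six in `rolls` rolls of two fair dice."""
    rng = random.Random(seed)   # fixed seed so that a run can be repeated
    wins = 0
    for _ in range(trials):
        for _ in range(rolls):
            die1 = rng.randint(1, 6)
            die2 = rng.randint(1, 6)
            if die1 == 6 and die2 == 6:
                wins += 1       # the game is won at the first double six
                break
    return wins / trials

estimate = simulate_double_six_game(20000)
print(estimate)
```

The estimate should land close to 0.491 rather than anywhere near 2/3, and a larger number of trials tightens it further.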
