Making use of Markov chains

A stochastic process describes a system that changes in values over time. Typically in order to predict what state the process will be in in the future, the past states may have to be used as input. A Markov chain is a stochastic process in which the predictions for the future of the process are based solely on its present state. Thus a Markov chain only remembers the current step and not the steps in the past (this is called the memoryless property). Markov chains, and stochastic processes in general, have wide applications in many fields. In this post, we highlight two fun probability problems that can be solved using Markov chains.

Figure 1 – toss a coin until seeing 3 consecutive heads

The first problem is demonstrated by the above diagram – tossing a fair coin until the appearance of k consecutive heads. How many tosses are required on average to accomplish this goal? In Figure 1, it takes 17 tosses to get 3 consecutive heads. The number of tosses is random. Sometimes it takes more tosses and sometimes it takes fewer tosses. In general, what is the expected number of tosses to get k consecutive heads? For 3 consecutive heads, the answer is 14. To get 4 consecutive heads, it takes on average 30 tosses.

How does coin tossing becomes a Markov chain? Let’s take the goal of 4 consecutive heads. As the coin is tossed, we are interested in 5 different states – 0, H, HH, HHH and HHHH. The state 0 refers to the beginning of the experiment or that the current toss is a Tail. The other four states refer to the most recent consecutive heads that have appeared. In each toss of the coin, the process is in one of these 5 states. The next state depends on the current state and does not depend on the past states, making this a Markov chain. The following is the transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & \text{1=H} & \text{2=HH} & \text{3=HHH} & \text{4=HHHH} \cr        0 & 0.5 & 0.5 & 0  & 0 & 0 \cr        1 & 0.5 & 0 & 0.5 & 0 & 0 \cr        2 & 0.5 & 0 & 0 & 0.5 & 0 \cr        3 & 0.5 & 0 & 0 & 0  & 0.5 \cr        4 & 0 & 0 & 0 & 0  & 1 \cr   } \qquad

The matrix captures the probabilities of how the process transitions from one state to the next. For example, if the current state is H, then the process will go to HH (with probability 0.5) or go to 0 (with probability 0.5). Example 2 in this blog post in a companion blog shows how to use this matrix to compute the expected number of steps it takes to go from state 0 to the state HHHH. The answer is 30.

If the goal is to toss the coin until obtaining 3 consecutive heads, then the states of the process would be 0, H, HH and HHH. Set the the transition probability matrix accordingly and apply the same technique discussed in the companion blog. If the goal is to toss the coin until obtaining k consecutive heads, adjust the states and the matrix accordingly and apply the method.

Though the highlighted problem is about the average number of tosses, a broader problem is about a probability distribution of the number of tosses to get k consecutive heads. For k=4, what is the probability that the goal (4 consecutive heads) can be achieved in 4 tosses of the coin, in 5 tosses, and so on? The method in answering these questions is discussed in this blog post in the same companion blog.

Figure 2 – rolling a die until seeing all 6 faces

The second problem is demonstrated by Figure 2 – rolling a fair die until the appearance of all 6 faces. How many rolls on average are required to achieve this goal? In Figure 2, it takes 11 rolls to get all 6 faces. Example 3 in this blog post shows how to answer this question. The answer is 14.7.

It turns out that Markov chain is not needed to solve this problem since this problem is related to two classic problems in probability – the occupancy problem as well as the coupon collector problem. In fact, the coupon collector problem provides a handy formula to obtain the average number rolls in order to have all faces in a k-sided die appeared once.

As in the first problem, the broader problem is about a probability distribution. What is the probability that all 6 faces appear after 6 rolls, after 7 rolls, and after 8 rolls and so on? This blog post in the same companion blog gives the idea on how to answer such questions.

Introductory discussion on Markov chains is found in the early posts in this companion blog on stochastic processes.


Markov chains

Daniel Ma math

Dan Ma mathematics

\copyright 2017 – Dan Ma

A normal bell curve that is made with humans

A picture was found at the website of the department of statistics and actuarial science at Simon Fraser University.

Figure 1 – Normal Bell Curve of Heights of Students (found here)

Though this normal bell curve is not the work of nature, the idea is clever. A picture of a normal distribution should never go to waste. So we write about it in our introductory statistics blog (see here). Indeed, it is an excellent opportunity to discuss properties of normal distribution.

\text{ }

\text{ }

\text{ }

\copyright 2017 – Dan Ma

Probability and actuarial science

Consider this scenario. Both a thirty-year old man and a seventy-year old man purchase identical annuity contracts. The costs of the contracts are the same to both men. The benefits are also the same – both men receive the same monthly payment till death. What is wrong with this picture?

The life expectancy of a thirty-year old is four times longer than that of a seventy-year old. So such an annuity contract would be a good deal for the thirty-year old as he can stand to collect four times more payments than a seventy-year old (but with time value of money the differential is more than 4 times).

Such contracts sold to a thrity-year old would be a bad deal for the insurance company. In fact if the insurance company had sold such contracts to a sufficient number of thirty-year old, it almost certainly would go out of business long before the thirty-year old can live to an old age. Though good deal for the young folks, such practices are simply not sustainable both as a business practice and as a matter of public policy.

Yet back in the seventeenth century, during the reign of William III of England, this was what commonly happened – annuity contracts with the same prices and the same payouts were routinely sold to both thirty-year old and seventy-year old (see Karl Pearson, The History of Statistics In the 17th and 18th Centuries against the Changing Background of Intellectual, Scientific and Religious Thought).

Why? People back in those days did not have any conceptualization of probability or statistics. For example, they did not know that the length of someone’s life, though unpredictable on an individual basis, could be predicted statistically based on empirical observations over time and across a large group of people. The prevailing mentality was that God controlled every minute detail of the universe. Such belief ruled out any kind of conceptualization of chance as phenomena that can be studied and predicted in the aggregate.

Fast forward to the present. We now take probability seriously (and the seriousness had been steadily growing since the eighteenth century). Probability tools are widely available. Here’s a life expectancy calculator from Social Security. According to this calculator, the additional life expectancy of a man who is currently thirty-year old is 51.4 years. A man who is currently seventy-year old is expected to live 14.8 more years. So a thirty-year old man can expect to live three and a half times longer than a seventy-year old man.

With a strong foundation of probability and statistics, which was not available in the seventeenth century, annuity contracts are priced based on the probability of living a long life and the life insurance policies are priced based on the probability of dying prematurely.

Actuaries are specially-trained professionals who use their knowledge of probability, statistics, and specific models to evaluate and price risks, e.g. risks of living too long, the risks of dying too soon, the risks of property damages, the risks of becoming sick and a whole host of other risks that are quantifiable. As a result, actuaries are generally employed in life, health, and property and casualty insurance companies, consulting firms, and government.

In fact the actuarial profession grew out of the effort to put the pricing of life insurance and annuity contracts on a sound scientific and business basis in the early eighteenth century England.

The entry point into the actuarial profession is an exam on probability. It is called Exam P and is jointly administered by the Society of Actuaries (SOA) and the Casualty Actuarial Society (CAS). Here’s what SOA says about Exam P:

    The purpose of the syllabus for this examination is to develop knowledge of the fundamental probability tools for quantitatively assessing risk. The application of these tools to problems encountered in actuarial science is emphasized. A thorough command of the supporting calculus is assumed. Additionally, a very basic knowledge of insurance and risk management is assumed.

In its current incarnation, it is a three-hour exam that consists of 30 multiple-choice questions and is administered as a computer-based test.

To be an actuary, a candidate must pass a series of exams (see here for the exam requirements from SOA). Exam P is the first of many exams. It is not surprising that Exam P is the first exam since the probability is part of the foundation of actuarial science.

How difficult is Exam P? The pass rates are typically low. Exam P is administered six times a year at test locations throughout the country. Less than 45% of the candidates earn a passing score. Often times, the pass rates dip below 40%.

Exam P is an entry point of of the actuarial exam system. It is the first of many exams in the pathway to becoming an actuary. We have a companion blog for studying Exam P.

A blog for Exam P practice problems

This blog has viewership on a daily basis from every continent. In fact, some aspiring actuaries passed Exam P as a result of using this blog. Here’s two practice problems.

Sample practice problem
Sample practice problem

\text{ }

\text{ }

\text{ }

\copyright 2017 – Dan Ma

Gamma Function and Gamma Distribution

The gamma function crops up almost everywhere in mathematics. It has applications in many branches of mathematics including probability and statistics. This post gives a small demonstration of the importance of the gamma function through the gamma distribution, a probability distribution that naturally arises from the gamma function.

The following is a short classic book on Gamma function by Emil Artin (a copy can be found here).

The gamma function dated back to Euler (1707-1783), who, in a letter to Christian Goldbach (1690-1764) in January 8, 1730, discussed the following integral.

    \displaystyle \int_0^1 \ \biggl[\text{ln} \biggl(\frac{1}{t} \biggr) \biggr]^{x-1} \ dt

The integral converges for all positive real number x. Thus this integral can be regarded as a function with the domain of positive real numbers. Later in 1809, Adrien-Marie Legendre (1752-1833) named the function Gamma with the symbol \Gamma(\cdot). Thus

    \displaystyle \Gamma(x)=\int_0^1 \ \biggl[\text{ln} \biggl(\frac{1}{t} \biggr) \biggr]^{x-1} \ dt=\int_0^1 \ (- \text{ln} \ t)^{x-1} \ dt \ \ \ \ \ \ x>0

A simple substitution of t \rightarrow - \text{ln} \ t results in the following useful alternative formulation.

    \displaystyle \Gamma(x)=\int_0^\infty t^{x-1} \ e^{-t} \ dt

Clearly, \Gamma(1)=1. Using integration by parts derives the following functional relationship.

    \displaystyle \Gamma(x+1)=x \ \Gamma(x)

It follows from this functional relationship that \Gamma(n)=(n-1)! for all positive integer n. Thus the Gamma function extends the factorial function. In fact, this functional relationship is used to extend the gamma function beyond x>0.

Gamma Distribution

One important consequence of the gamma function is that a probability distribution arises naturally from it. Taking the integral form of the gamma function (the one that is the result of a simple substitution) and dividing it by the gamma function value yields an integral with a value of 1.

    \displaystyle \int_0^\infty \frac{1}{\Gamma(\alpha)} \ t^{\alpha-1} \ e^{-t} \ dt=1

Then the integrand can be regarded as a probability density function (PDF).

    \displaystyle f(x)=\frac{1}{\Gamma(\alpha)} \ x^{\alpha-1} \ e^{-x} \ \ \ \ x>0

Replacing the upper limit of the integral by a variable would produce its cumulative distribution (CDF).

    \displaystyle F(x)=\int_0^x \frac{1}{\Gamma(\alpha)} \ t^{\alpha-1} \ e^{-t} \ dt

The mathematical properties of the gamma function is discussed here in a companion blog. The gamma distribution is defined in this blog post in the same companion blog.

The PDF f(x) and the CDF F(x) shown above has only one parameter \alpha, which is a positive constant that determines the shape of the distribution (called the shape parameter). Another parameter \theta, called the scale parameter, can be added to make this a two-parameter distribution, hence making it more versatile as a probability model.

    \displaystyle f(x)=\frac{1}{\Gamma(\alpha)} \ \frac{1}{\theta^\alpha} \ x^{\alpha-1} \ e^{-x/\theta} \ \ \ \ x>0

    \displaystyle F(x)=\int_0^x \frac{1}{\Gamma(\alpha)} \ \frac{1}{\theta^\alpha} \ t^{\alpha-1} \ e^{-t/\theta} \ dt \ \ \ \ x>0

The gamma distribution is useful in actuarial modeling, e.g. modeling insurance losses. Due to its mathematical properties, there is considerable flexibility in the modeling process. For example, since it has two parameters (a scale parameter and a shape parameter), the gamma distribution is capable of representing a variety of distribution shapes and dispersion patterns.

The exponential distribution is a special case of the gamma distribution and it arises naturally as the waiting time between two events in a Poisson process (see here and here).

The chi-squared distribution is also a sub family of the gamma family of distributions. Mathematically speaking, a chi-squared distribution is a gamma distribution with shape parameter k/2 and scale parameter 2 with k being a positive integer (called the degrees of freedom). Though the definition is simple mathematically, the chi-squared family plays an outsize role in statistics.

This blog post discusses the chi-square distribution from a mathematical standpoint. The chi-squared distribution also play important roles in inferential statistics for the population mean and population variance of normal populations (discussed here).

The chi-squared distribution also figures prominently in the inference on categorical data. The chi-squared test, based on the chi-squared distribution, is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. The chi-squared test is based on the chi-squared statistic, which has three different interpretations – goodness-of-fit test, test of homogeneity and test of independence.Further discussion of the chi-squared test is found here.

Transformed Gamma Distribution

Another set of distributions that are derived from the gamma family is through raising a gamma distribution to a power. Raising a gamma distribution to a positive power results in a transformed gamma distribution. Raising a gamma distribution to -1 results in an inverse gamma distribution. Raising a gamma distribution to a negative power not -1 results in an inverse transformed gamma distribution. These derived distributions greatly expand the tool kit for actuarial modeling. These distributions are discussed here.

\text{ }

\text{ }

\text{ }

\copyright 2017 – Dan Ma

Monty Hall Problem

The Monty Hall Problem is a classic problem in probability. It is all over the Internet. Beside being a famous problem, it stirred up a huge controversy when it was posted in a column hosted by Marilyn vos Savant in Parade Magazine back in 1990. vos Savant received negative letters in the thousands, many of them from readers who were mathematicians, scientists and engineers with PhD degrees! It is possible to get this problem wrong (or get in a wrong path in reasoning).

I wrote about this problem in these two blog posts – Monty Hall Problem and More about Monty Hall Problem. The first post describes the problem using the following 5 pictures.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

\copyright 2017 – Dan Ma

The Dice Problem

Thinking in probability can be hard sometimes. Thinking of probability in a wrong way can be costly, especially at the casino. This was what happened with Chevalier de Méré (1607-1684), who was a French writer and apparently an avid gambler. He estimated that the odds for winning in this one game were in his favor. However, he was losing money consistently on this particular game. He sensed something was amiss but could not see why his reasoning was wrong. Luckily for him, he was able to enlist two leading mathematicians at the time, Blaise Pascal and Pierre de Fermat, for help. The correspondence between Pascal and Fermat laid the foundation for the modern theory of probability.

Chevalier de Méré actually asked Pascal and Fermat for help on two problems – the problem of points and the dice problem. I wrote about these two problems in two blog posts – the problem of points and the dice problem. In this post, I make further comments on the dice problem. The point is that flawed reasoning in probability can be risky and costly, first and foremost for gamblers and to a great extent for anyone making financial decisions with uncertain future outcomes.

For Chevalier de Méré, there are actually two dice problems.

The first game involves four rolls of a fair die. In this game, de Méré made bet with even odds on rolling at least one six when a fair die is rolled four times. His reasoning was that since getting a six in one roll of a die is \frac{1}{6} (correct), the chance of getting a six in four rolls of a die would be 4 \times \frac{1}{6}=\frac{2}{3} (incorrect). With the favorable odds of 67% of winning, he reasoned that betting with even odds would be a profitable proposition. Though his calculation was incorrect, he made considerable amount of money over many years playing this game.

The second game involves twenty four rolls of a pair of fair dice. The success in the first game emboldened de Méré to make even bet on rolling one or more double sixes in twenty four rolls of a pair of dice. His reasoning was that the chance for getting a double six in one roll of a pair of dice is \frac{1}{36} (correct). Then the chance of getting a double six in twenty four rolls of a pair of dice would be 24 \times \frac{1}{36}=\frac{2}{3} (incorrect). He again reasoned that betting with even odds would be profitable too.

The problem was for Pascal and Fermat to explain why de Méré was able to make money on the first and not on the second game.

The correctly probability calculation would show that the probability of the event “rolling at least one six” happening in the first game is about 0.518 (see here). Thus de Méré would on average win 52% of the time in playing the first game at even odds. In playing 100 games, he would win about 52 games. In playing 1,000 games, he would win about 518 games. The following table calculate the amount of winning per 1,000 games for de Méré.

Results of playing the first game 1,000 times with one French franc per bet

Outcome # of Games Win/Lose Amount
Win 518 518 francs
Lose 482 -482 francs
Total 1,000 36 francs

Per 1,000 games, de Méré won on average 36 francs. So he had the house edge of 3.6% (= 36/1000).

The correct calculation would show that the probability of the event “at least one double 6” happening in the second game is about 0.491 (see here). Thus de Méré could only win about 49% of the time. Per 1,000 games, de Méré would win on average 491 games, or the opposing side would win about 509 games. The following table calculate the amount of winning per 1,000 games for de Méré.

Results of playing the second game 1,000 times with one French franc per bet

Outcome # of Games Win/Lose Amount
Win 491 491 francs
Lose 509 -509 francs
Total 1,000 -18 francs

The winning on average for de Méré is negative 18 francs per 1,000 games. So the opposing side has a house edge of 1.8% (= 18/1000).

So de Méré was totally off base with his reasoning! He thought that the probability of winning would be 2/3 in both games. The incorrect reasoning let him to believe that betting at even odds would be a winning proposition. So he thought. Though his reasoning was wrong in the first game, he was lucky that the winning odds were still better than even. For the second game, he learned the hard way – through simulation with real money!

There are two issues involved here. One is obviously the flawed reasoning in probability on the part of de Méré. The second is calculation. de Méré and his contemporaries would have a hard time making the calculation even if they were able to reason correctly. They did not have the advantage of calculators and other electronic devices that are widely available to us. For example, the following shows the calculation of the winning probabilities for both games.

    \displaystyle P(\text{at least one six})=1 - \biggl( \frac{5}{6} \biggr)^4=0.518

    \displaystyle P(\text{at least one double six})=1 - \biggl( \frac{35}{36} \biggr)^{24}=0.491

It is possible to calculate 5/6 raised to 4. Raising 35/36 to 24 would be very tedious and error prone. Any one with a hand held calculator with a key for \displaystyle y^x (raising y to x). For de Méré and his contemporaries, this calculation would probably have to done by experts.

The main stumbling block of course would be the inability to reason correctly with odds and probability. We have the benefits of the probability tools bequeathed by Pascal, Fermat and others. Learning the basic tool kit in probability is essential for anyone who deal with uncertainty.

One more comment about what Chevalier de Méré could have done (if expert mathematical help was not available). He could have performed simulation (the kind that does not involve real money). Simply roll a pair of fair dice a number of times and count how many times he wins.

He would soon find out that he would not win 2/3 of the time. He would not even win 51% of of the time. It would be more likely that he wins 49% of the time. Simulation, if done properly, does not lie. We performed one simulation of rolling a pair of dice 100,000 times (in Excel). Only 49,211 of the iterations have “at least one double six.” Without software, simulating 100,000 times may not be realistic. But Chevalier de Méré could simulate the experiment 100 times or even 1,000 times (if he hired someone to help).

\copyright 2017 – Dan Ma

Benford’s law in Hollywood

A movie called The Accountant is a 2016 film starring Ben Affleck and Anna Kendrick.

What is notable for us here at is that Chris Wolff, the protagonist played by Affleck, is a crime fighter who knows how to use a gun and a spreadsheet. Though he is an expert sniper and a martial artist, his chief weapon in fighting crime is mostly statistical in nature. One tool that stands out is the so called Benford’s law, which is a statistical law used by statisticians and forensic accountants to sniff out fraudulent numbers in financial documents. Of course this being a Hollywood movie, it cannot be just rely on numbers and statistics. There are plenty of action scenes.

Here’s an interview with a forensic accountant who vouched for the authenticity in the movie on applying the Benford’s law and other statistical investigative techniques.

According to Benford’s law, the first digits of the numbers in many natural data sets follow a certain distribution. For example, the first digits are 1 about 30% of the time. Any set of financial documents that have too few 1’s should raise a giant red flag. In the movie, Wolfe spotted the unusual frequency of the first digit 3 in a series of financial transactions. Deviations between the actual frequencies of first digits in the documents and the frequencies predicted by the Benford’s law raise suspicion. Then the investigator can dig further into the numbers to look for potential frauds.

Interested in knowing more about Benford’s law? Here’s some blog posts from several affiliated blogs.

\copyright 2017 – Dan Ma