Tower of Hanoi

The tower of Hanoi is a game that works on multiple levels. It is a challenging game that tests the agility and organization skills of the player. It is also a game mathematicians would love, since it is an excellent illustration of mathematical concepts such as mathematical induction and exponential growth. It is also a concrete illustration of a recursive algorithm.

The game of tower of Hanoi involves moving disks from one rod to another. The following is a tower of Hanoi game with 8 disks and three rods.

8-disk Tower of Hanoi (Wikipedia)

The goal of the game is to move all the disks from the leftmost rod to the rightmost rod, one disk at a time. Only the uppermost disk on a rod can be moved. In addition, a disk may only be placed on top of a larger disk.

Tower of Hanoi sets such as the one shown above are available from Amazon. A homemade tower of Hanoi set can also be created. For example, use paper to mark three spots (to serve as rods). Then stack books of varying sizes on one spot and move the books to another spot according to the rules described above. Kitchen plates can also be used in place of books.

There are also many online versions of the game. Two examples: this version from Math is Fun (3-disk to 8-disk games) and this version (3-disk to 9-disk games). Both sites are easy to use. I prefer the Math is Fun version since the disks are in different colors. Of course, there are many others to choose from (simply Google tower of Hanoi game).

Obviously, the more disks there are in the game, the more difficult it is to transfer the disks successfully. A player who is not organized or gets lost may make more moves than necessary.

A 3-disk game can be played in 7 moves and no fewer. A 4-disk game can be played in a minimum of 15 moves. A player who gets lost may end up taking more than 15 moves in a 4-disk game, while any player in the know can finish it in 15 moves. The 5-disk game can be played in 31 moves; the 6-disk game in 63 moves; the 7-disk game in 127 moves. To see these for yourself, explore the game using a homemade set or play online. The game is also discussed here in a companion blog.

Notice that whenever an additional disk is added to the game, the minimum number of moves roughly doubles, e.g. from 7 moves to 15 moves (from 3 disks to 4 disks), from 15 to 31 (from 4 disks to 5 disks) and so on. In general, the n-disk game requires a minimum of 2^n-1 moves. Thus the tower of Hanoi is a concrete illustration of exponential growth – increasing the size of the game by one disk doubles the time required to play the game.
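The 2^n-1 pattern is easy to check with a few lines of code. The following is a minimal sketch in Python (the function name min_moves is just for illustration):

```python
# Minimum number of moves needed to solve an n-disk Tower of Hanoi game.
def min_moves(n):
    return 2 ** n - 1

# 3 through 7 disks: 7, 15, 31, 63, 127 moves.
print([min_moves(n) for n in range(3, 8)])

# Adding one disk doubles the count (plus one): 2^(n+1) - 1 = 2*(2^n - 1) + 1.
print(all(min_moves(n + 1) == 2 * min_moves(n) + 1 for n in range(1, 30)))
```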

In general, exponential growth is a phenomenon in which increasing the input by one unit multiplies the output by a constant (e.g. doubling, tripling, or multiplying by some other constant). In contrast, linear growth (or growing linearly) means that increasing the input by one unit increases the output by a constant additive amount.

The exponential growth is even easier to see if the moves are converted into time. Assume that it takes one second to move a disk. It would take 63 seconds to play the 6-disk game, roughly one minute. It would take 127 seconds to play the 7-disk game, roughly 2 minutes. In those two minutes, the player would need to know exactly what the moves should be; otherwise it would be easy to make a mistake and hence take more moves than necessary. So converting the moves to seconds further illustrates the exponential growth inherent in the tower of Hanoi game.

A more subtle aspect of the tower of Hanoi game is that in order to play it successfully (i.e. in the minimum number of moves), the game must be played recursively. Take the 4-disk game as an example. Imagine that the 4 disks start on the left rod. The goal is to move them to the right rod (the destination rod); the rod in the middle is the intermediate rod. The strategy is to move the top 3 disks to the middle rod, then move the 4th disk (the largest disk) from the left rod to the right rod. The remaining moves transfer the three disks on the middle rod to the right rod.

With n disks, move the top n-1 disks to the intermediate rod (following the rules, of course). Then move the largest disk from the starting rod to the destination rod. To finish off the game, move the n-1 disks from the intermediate rod to the destination rod. So the n-disk game is executed by playing two (n-1)-disk games with the move of the largest disk in between. The tower of Hanoi is thus a great introduction to recursive algorithms, and it would make a great computer programming exercise.
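The three-step strategy just described translates directly into a recursive program. The following is a minimal sketch in Python (the function name hanoi and the rod labels are just for illustration):

```python
def hanoi(n, source, intermediate, destination, moves):
    """Recursively solve the n-disk game, appending each move to `moves`."""
    if n == 0:
        return
    # Step 1: move the top n-1 disks out of the way, onto the intermediate rod.
    hanoi(n - 1, source, destination, intermediate, moves)
    # Step 2: move the largest disk to the destination rod.
    moves.append((source, destination))
    # Step 3: move the n-1 disks from the intermediate rod onto the largest disk.
    hanoi(n - 1, intermediate, source, destination, moves)

moves = []
hanoi(4, "left", "middle", "right", moves)
print(len(moves))  # 15 moves, the minimum for the 4-disk game
```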

Because of the recursive nature of the game, it is a challenge to keep track of the moves when the number of disks is large. In a 4-disk game, you play a 3-disk game twice with one move of the largest disk in between. This can be managed with ease after some minimal practice. But say you want to play the 8-disk game: you would need to play a 7-disk game twice with one move of the largest disk in between. For each of the two 7-disk games, you would need to play a 6-disk game twice with one more move in between. That would mean four 6-disk games. Then in each of the 6-disk games, you would need to play a 5-disk game twice with one more move in between. The recursion can get complicated fast! It is helpful to use diagrams to keep track of all the sub-games required in the recursive algorithm. This is discussed here in a companion blog.

We now know that adding one disk to the game of tower of Hanoi doubles the number of moves, hence doubling the time it takes to play. What about doubling the number of disks?

The 8-disk game requires a minimum of 255 moves (about 4 minutes at one second per move). The 16-disk game would require 65,535 moves, over 1,000 minutes (assuming one second per move), or over 18 hours! The following shows a 32-disk tower of Hanoi set located in a museum in Mexico.

32-disk Tower of Hanoi (Wikipedia)

A 32-disk game would require 2^{32}-1 moves, which is 4,294,967,295. Assuming one second per move, this would take over 136 years! If the workers in the museum were required to move the disks from one rod to another by following the rules of the game, that would be job security!

The game of Tower of Hanoi is a deceptively simple game, yet the effect of doubling the number of disks is very dramatic. What about doubling the number of disks to 64, twice as many as in the set shown above? The following is an interesting tale of the origin of the game of Tower of Hanoi [1].

    In the Temple of Benares, beneath the dome which marks the centre of the world, rests a brass-plate in which are fixed three diamond needles, each a cubit high and as thick as the body of a bee. On one of these needles, at the creation, God placed sixty-four discs of pure gold, the largest disc resting on the brass plate, and the others getting smaller and smaller up to the top one. This is the Tower of Bramah! Day and night unceasingly the priests transfer the discs from one diamond needle to another according to the fixed and immutable laws of Bramah, which require that the priest must not move more than one disc at a time and that he must place this disc on a needle so that there is no smaller disc below it. When the sixty-four discs shall have been thus transferred from the needle on which at the creation God placed them to one of the other needles, tower, temple and Brahmins alike will crumble into dust, and with a thunderclap the world will vanish.

The game of the tower of Hanoi was invented by the French mathematician Édouard Lucas in 1883. A year later, the writer Henri de Parville told the above tale about the origin of the tower of Hanoi.

It is not known whether Lucas, the inventor of the game, invented this legend or was inspired by it. One thing is clear: the legend accurately describes the enormity of the 64-disk game of the tower of Hanoi.

The least number of moves required to play the 64-disk game is 2^{64}-1, which is 18,446,744,073,709,551,615; at one second per move, that works out to about 585 billion years. In contrast, the age of the universe is believed to be 13.82 billion years. The age of the sun is believed to be 4.6 billion years, and the remaining lifetime of the sun is believed to be around 5 billion years. So by the time the sun dies out, the game would still not be finished!
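The time estimates above are easy to reproduce. A quick sketch, assuming one second per move and an average year of 365.25 days:

```python
# Playing time at one second per move, expressed in years.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 31.6 million seconds

for disks in (32, 64):
    moves = 2 ** disks - 1
    print(disks, moves, moves / SECONDS_PER_YEAR)
# 32 disks: about 136 years; 64 disks: about 585 billion years.
```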

Back to the question of what happens when the number of disks is doubled. For the 8-disk game, the minimum number of moves is 255; for the 16-disk game, it is 65,535. Note that the square of 255 is 65,025. So doubling the number of disks has the effect of (approximately) squaring the number of moves. This is another demonstration of exponential growth.
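The squaring effect becomes exact after adding 1 to each count, since 2^{2n} = (2^n)^2. A small check (the function name min_moves is just for illustration):

```python
def min_moves(n):
    return 2 ** n - 1

# Doubling the disks approximately squares the move count. Exactly:
# min_moves(2n) + 1 == (min_moves(n) + 1) ** 2, since 2^(2n) = (2^n)^2.
print(min_moves(8), min_moves(16))   # 255 and 65535
print((min_moves(8) + 1) ** 2 - 1)   # 65535, matching min_moves(16)
```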


  1. Hinz, Andreas M., Klavžar, Sandi, Milutinović, Uroš, Petr, Ciril, The Tower of Hanoi – Myths and Maths, Springer Basel, Heidelberg, New York, Dordrecht, London, 2013.

\copyright 2017 – Dan Ma

Gamma Function and Gamma Distribution

The gamma function crops up almost everywhere in mathematics. It has applications in many branches of mathematics including probability and statistics. This post gives a small demonstration of the importance of the gamma function through the gamma distribution, a probability distribution that naturally arises from the gamma function.

The following is a short classic book on the gamma function by Emil Artin (a copy can be found here).

The gamma function dates back to Euler (1707-1783), who, in a letter to Christian Goldbach (1690-1764) dated January 8, 1730, discussed the following integral.

    \displaystyle \int_0^1 \ \biggl[\text{ln} \biggl(\frac{1}{t} \biggr) \biggr]^{x-1} \ dt

The integral converges for all positive real numbers x. Thus the integral can be regarded as a function whose domain is the positive real numbers. Later, in 1809, Adrien-Marie Legendre (1752-1833) named the function Gamma and gave it the symbol \Gamma(\cdot). Thus

    \displaystyle \Gamma(x)=\int_0^1 \ \biggl[\text{ln} \biggl(\frac{1}{t} \biggr) \biggr]^{x-1} \ dt=\int_0^1 \ (- \text{ln} \ t)^{x-1} \ dt \ \ \ \ \ \ x>0

A simple substitution (u=-\text{ln} \ t, equivalently t=e^{-u}) results in the following useful alternative formulation.

    \displaystyle \Gamma(x)=\int_0^\infty t^{x-1} \ e^{-t} \ dt

Clearly, \Gamma(1)=1. Integration by parts yields the following functional relationship.

    \displaystyle \Gamma(x+1)=x \ \Gamma(x)

It follows from this functional relationship that \Gamma(n)=(n-1)! for all positive integers n. Thus the gamma function extends the factorial function. In fact, this functional relationship is used to extend the gamma function beyond x>0.
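Python's standard library exposes the gamma function as math.gamma, so both facts are easy to verify numerically. A small sketch:

```python
import math

# Gamma extends the factorial: Gamma(n) = (n-1)! for positive integers n.
for n in range(1, 7):
    print(n, math.gamma(n), math.factorial(n - 1))

# The functional relationship Gamma(x+1) = x * Gamma(x) at a non-integer point.
x = 2.5
print(math.gamma(x + 1), x * math.gamma(x))
```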

Gamma Distribution

One important consequence of the gamma function is that a probability distribution arises naturally from it. Taking the integral form of the gamma function (the one obtained from the simple substitution) and dividing it by the gamma function value \Gamma(\alpha) yields an integral with a value of 1.

    \displaystyle \int_0^\infty \frac{1}{\Gamma(\alpha)} \ t^{\alpha-1} \ e^{-t} \ dt=1

Then the integrand can be regarded as a probability density function (PDF).

    \displaystyle f(x)=\frac{1}{\Gamma(\alpha)} \ x^{\alpha-1} \ e^{-x} \ \ \ \ x>0

Replacing the upper limit of the integral with a variable produces the cumulative distribution function (CDF).

    \displaystyle F(x)=\int_0^x \frac{1}{\Gamma(\alpha)} \ t^{\alpha-1} \ e^{-t} \ dt

The mathematical properties of the gamma function are discussed here in a companion blog. The gamma distribution is defined in this blog post in the same companion blog.

The PDF f(x) and the CDF F(x) shown above have only one parameter \alpha, a positive constant that determines the shape of the distribution (called the shape parameter). Another parameter \theta, called the scale parameter, can be added to make this a two-parameter distribution, making it more versatile as a probability model.

    \displaystyle f(x)=\frac{1}{\Gamma(\alpha)} \ \frac{1}{\theta^\alpha} \ x^{\alpha-1} \ e^{-x/\theta} \ \ \ \ x>0

    \displaystyle F(x)=\int_0^x \frac{1}{\Gamma(\alpha)} \ \frac{1}{\theta^\alpha} \ t^{\alpha-1} \ e^{-t/\theta} \ dt \ \ \ \ x>0
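The two-parameter density can be coded directly from the formula. The following is a minimal sketch using only the standard library (the function name gamma_pdf and the crude Riemann-sum check are just for illustration):

```python
import math

def gamma_pdf(x, alpha, theta):
    """Gamma density with shape parameter alpha and scale parameter theta."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

# Crude numerical check that the density integrates to (approximately) 1.
alpha, theta = 2.0, 3.0
dx = 0.001
total = sum(gamma_pdf(i * dx, alpha, theta) * dx for i in range(1, 100_000))
print(total)  # very close to 1
```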

The gamma distribution is useful in actuarial modeling, e.g. modeling insurance losses. Due to its mathematical properties, there is considerable flexibility in the modeling process. For example, since it has two parameters (a scale parameter and a shape parameter), the gamma distribution is capable of representing a variety of distribution shapes and dispersion patterns.

The exponential distribution is a special case of the gamma distribution and it arises naturally as the waiting time between two events in a Poisson process (see here and here).

The chi-squared distribution is also a subfamily of the gamma family of distributions. Mathematically speaking, a chi-squared distribution is a gamma distribution with shape parameter k/2 and scale parameter 2, with k being a positive integer (called the degrees of freedom). Though the definition is mathematically simple, the chi-squared family plays an outsize role in statistics.

This blog post discusses the chi-squared distribution from a mathematical standpoint. The chi-squared distribution also plays important roles in inferential statistics for the population mean and the population variance of normal populations (discussed here).

The chi-squared distribution also figures prominently in inference on categorical data. The chi-squared test, based on the chi-squared distribution, is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. The chi-squared test is based on the chi-squared statistic, which has three different interpretations – goodness-of-fit test, test of homogeneity and test of independence. Further discussion of the chi-squared test is found here.

Transformed Gamma Distribution

Another set of distributions derived from the gamma family is obtained by raising a gamma distribution to a power. Raising a gamma distribution to a positive power results in a transformed gamma distribution. Raising a gamma distribution to the power -1 results in an inverse gamma distribution. Raising a gamma distribution to a negative power other than -1 results in an inverse transformed gamma distribution. These derived distributions greatly expand the tool kit for actuarial modeling. These distributions are discussed here.


\copyright 2017 – Dan Ma

Monty Hall Problem

The Monty Hall Problem is a classic problem in probability, and it is all over the Internet. Besides being a famous problem, it stirred up a huge controversy when it was posted in a column hosted by Marilyn vos Savant in Parade Magazine back in 1990. vos Savant received thousands of negative letters, many of them from readers who were mathematicians, scientists and engineers with PhD degrees! It is easy to get this problem wrong (or to go down a wrong path in reasoning).

I wrote about this problem in these two blog posts – Monty Hall Problem and More about Monty Hall Problem. The first post describes the problem using the following 5 pictures.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5
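For readers who want to check the answer numerically, a simulation settles the matter. The following is a minimal sketch (the function name monty_hall, the seed, and the number of games are just for illustration); the well-known result is that switching wins about 2/3 of the time, staying only about 1/3:

```python
import random

random.seed(1990)  # reproducible run

def monty_hall(switch, num_games=100_000):
    """Estimate the probability of winning the car under a fixed strategy."""
    wins = 0
    for _ in range(num_games):
        car = random.randrange(3)     # door hiding the car
        choice = random.randrange(3)  # contestant's initial pick
        # The host opens a door that hides a goat and is not the pick.
        opened = next(d for d in range(3) if d != choice and d != car)
        if switch:
            # Switch to the one remaining unopened door.
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == car)
    return wins / num_games

print(monty_hall(switch=True))   # close to 2/3
print(monty_hall(switch=False))  # close to 1/3
```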

\copyright 2017 – Dan Ma

The Dice Problem

Thinking in probability can be hard sometimes, and thinking about probability in a wrong way can be costly, especially at the casino. This was what happened to Chevalier de Méré (1607-1684), a French writer and apparently an avid gambler. He estimated that the odds of winning one particular game were in his favor, yet he was losing money consistently on that game. He sensed something was amiss but could not see why his reasoning was wrong. Luckily for him, he was able to enlist two leading mathematicians of the time, Blaise Pascal and Pierre de Fermat, for help. The correspondence between Pascal and Fermat laid the foundation for the modern theory of probability.

Chevalier de Méré actually asked Pascal and Fermat for help on two problems – the problem of points and the dice problem. I wrote about these two problems in two blog posts – the problem of points and the dice problem. In this post, I make further comments on the dice problem. The point is that flawed reasoning in probability can be risky and costly, first and foremost for gamblers and, to a great extent, for anyone making financial decisions with uncertain future outcomes.

For Chevalier de Méré, there are actually two dice problems.

The first game involves four rolls of a fair die. In this game, de Méré bet at even odds on rolling at least one six when a fair die is rolled four times. His reasoning was that since the chance of getting a six in one roll of a die is \frac{1}{6} (correct), the chance of getting a six in four rolls of a die would be 4 \times \frac{1}{6}=\frac{2}{3} (incorrect). With favorable odds of 67% of winning, he reasoned that betting at even odds would be a profitable proposition. Though his calculation was incorrect, he made a considerable amount of money over many years playing this game.

The second game involves twenty-four rolls of a pair of fair dice. The success of the first game emboldened de Méré to bet at even odds on rolling one or more double sixes in twenty-four rolls of a pair of dice. His reasoning was that since the chance of getting a double six in one roll of a pair of dice is \frac{1}{36} (correct), the chance of getting a double six in twenty-four rolls would be 24 \times \frac{1}{36}=\frac{2}{3} (incorrect). He again reasoned that betting at even odds would be profitable.

The problem posed to Pascal and Fermat was to explain why de Méré was able to make money on the first game but not on the second.

The correct probability calculation shows that the probability of the event “rolling at least one six” in the first game is about 0.518 (see here). Thus de Méré would on average win about 52% of the time playing the first game at even odds. In playing 100 games, he would win about 52 games. In playing 1,000 games, he would win about 518 games. The following table calculates the amount won per 1,000 games for de Méré.

Results of playing the first game 1,000 times with one French franc per bet

Outcome    # of Games    Win/Lose Amount
Win        518           +518 francs
Lose       482           -482 francs
Total      1,000         +36 francs

Per 1,000 games, de Méré won on average 36 francs. So he had a house edge of 3.6% (= 36/1,000).

The correct calculation shows that the probability of the event “at least one double six” in the second game is about 0.491 (see here). Thus de Méré could only win about 49% of the time. Per 1,000 games, de Méré would win on average 491 games, and the opposing side would win about 509 games. The following table calculates the amount won per 1,000 games for de Méré.

Results of playing the second game 1,000 times with one French franc per bet

Outcome    # of Games    Win/Lose Amount
Win        491           +491 francs
Lose       509           -509 francs
Total      1,000         -18 francs

On average, de Méré lost 18 francs per 1,000 games. So the opposing side had a house edge of 1.8% (= 18/1,000).

So de Méré was totally off base with his reasoning! He thought that the probability of winning would be 2/3 in both games. The incorrect reasoning led him to believe that betting at even odds would be a winning proposition. Or so he thought. Though his reasoning was wrong in the first game, he was lucky that the winning odds were still better than even. For the second game, he learned the hard way – through simulation with real money!

There are two issues involved here. One is obviously the flawed reasoning in probability on the part of de Méré. The second is calculation. de Méré and his contemporaries would have had a hard time making the calculation even if they had reasoned correctly. They did not have the advantage of the calculators and other electronic devices that are widely available to us. For example, the following shows the calculation of the winning probabilities for both games.

    \displaystyle P(\text{at least one six})=1 - \biggl( \frac{5}{6} \biggr)^4=0.518

    \displaystyle P(\text{at least one double six})=1 - \biggl( \frac{35}{36} \biggr)^{24}=0.491

It is possible to calculate 5/6 raised to the 4th power by hand. Raising 35/36 to the 24th power, however, would be very tedious and error prone. Anyone with a handheld calculator with a \displaystyle y^x key (raising y to x) can do it in seconds. For de Méré and his contemporaries, this calculation would probably have had to be done by experts.
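With a modern language, the two computations are one-liners. A quick sketch in Python:

```python
# The two winning probabilities that were tedious to compute by hand in the 1650s.
p_game1 = 1 - (5 / 6) ** 4       # at least one six in four rolls of one die
p_game2 = 1 - (35 / 36) ** 24    # at least one double six in 24 rolls of a pair

print(round(p_game1, 3))  # 0.518
print(round(p_game2, 3))  # 0.491
```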

The main stumbling block, of course, would be the inability to reason correctly with odds and probability. We have the benefit of the probability tools bequeathed by Pascal, Fermat and others. Learning the basic tool kit of probability is essential for anyone who deals with uncertainty.

One more comment about what Chevalier de Méré could have done (if expert mathematical help had not been available). He could have performed a simulation (the kind that does not involve real money): simply roll a pair of fair dice a number of times and count how many times he wins.

He would soon find out that he would not win 2/3 of the time. He would not even win 51% of the time. It would be more likely that he wins about 49% of the time. Simulation, if done properly, does not lie. We performed one simulation of the second game (twenty-four rolls of a pair of dice) 100,000 times (in Excel). Only 49,211 of the iterations have “at least one double six.” Without software, simulating 100,000 games may not be realistic, but Chevalier de Méré could have simulated the experiment 100 times or even 1,000 times (if he hired someone to help).
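A sketch of such a simulation, done in Python rather than Excel (the seed and the function name simulate_second_game are just for illustration):

```python
import random

random.seed(2017)  # reproducible run

def simulate_second_game(num_games, rolls_per_game=24):
    """Estimate P(at least one double six in 24 rolls of a pair of fair dice)."""
    wins = 0
    for _ in range(num_games):
        # A game is won if any of the 24 rolls of the pair shows double sixes.
        if any(random.randint(1, 6) == 6 and random.randint(1, 6) == 6
               for _ in range(rolls_per_game)):
            wins += 1
    return wins / num_games

print(simulate_second_game(100_000))  # close to the true value 0.491
```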

\copyright 2017 – Dan Ma

Benford’s law in Hollywood

The Accountant is a 2016 film starring Ben Affleck and Anna Kendrick.

What is notable for us here is that Christian Wolff, the protagonist played by Affleck, is a crime fighter who knows how to use both a gun and a spreadsheet. Though he is an expert sniper and a martial artist, his chief weapon in fighting crime is mostly statistical in nature. One tool that stands out is the so-called Benford’s law, a statistical law used by statisticians and forensic accountants to sniff out fraudulent numbers in financial documents. Of course, this being a Hollywood movie, it cannot just rely on numbers and statistics; there are plenty of action scenes.

Here’s an interview with a forensic accountant who vouched for the authenticity of the movie’s use of Benford’s law and other statistical investigative techniques.

According to Benford’s law, the first digits of the numbers in many natural data sets follow a certain distribution. For example, the first digit is 1 about 30% of the time. Any set of financial documents that has too few 1’s should raise a giant red flag. In the movie, Wolff spotted an unusual frequency of the first digit 3 in a series of financial transactions. Deviations between the actual frequencies of first digits in the documents and the frequencies predicted by Benford’s law raise suspicion. The investigator can then dig further into the numbers to look for potential fraud.
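The Benford distribution itself is a one-line formula. A minimal sketch:

```python
import math

# Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1, ..., 9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(d, round(p, 3))
# The digit 1 leads about 30.1% of the time; the digit 9 only about 4.6%.
```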

Interested in knowing more about Benford’s law? Here are some blog posts from several affiliated blogs.

\copyright 2017 – Dan Ma