On Friday, Jimmy Kimmel had people passing by the theater where his show is taped asked for their Internet passwords (see the YouTube video below). The segment is part entertainment and part public service, timed to coincide with the data breach at Equifax that was reported two days earlier.
A huge security breach at credit reporting company Equifax has exposed sensitive information, such as dates of birth, Social Security numbers, addresses and in some cases driver's license numbers, of up to 143 million Americans. The data breach is among the worst in U.S. history. The number of people affected is well over half of the adult population of the United States. According to Equifax, the breach happened between mid-May and July. The hack was discovered on July 29, but Equifax did not inform the public until September 7.
The first person on the video was asked, “We are talking to people about the cyber-security breach at Equifax, and in light of that, we’re asking people how secure their Internet passwords are. What do you use for an internet password?” Without hesitation, the young man responded, “Um, I usually stick to my last name. That’s probably not the best thing to do, but usually it’s my last name, a few digits, um, maybe like a hashtag or something.” The interviewer then asked what his last name is. The young man readily gave out the last name. The interviewer even spelled out the last name for him to confirm. The young man also confirmed, upon asking, that the digits that go with the last name are his birthday.
The video is funny. I am amazed at the laziness and carelessness of the people in the video. First of all, the same password should not be used across multiple accounts. A password certainly should not consist of the name of the person with a few digits such as date of birth or the zip code. Everyone in the video is using the same type of passwords. Of course, it could just be a “manipulated” sample (it only includes password stories that have entertainment value).
The same stunt was done previously after the data hack at Sony two years earlier (see the YouTube video below).
There are various strategies one can use to create strong passwords that are easy (or easier) to keep track of. For example, come up with a memorable phrase and build the password from the first letter of each word, keeping numbers and symbols as they are. Example: The first house I ever lived in was 613 Fake Street. Rent was $400 per month. The resulting password is
TfhIeliw613FS.Rw$400pm
(example found here). This is a 22-character password based on a memorable phrase consisting of two sentences. The beauty is that the password has uppercase and lowercase letters, numeric characters and special symbols, arranged in such a way that people not in the know cannot easily guess it. Of course, you, who know the memorable phrase, can remember it. The same password should not be reused for other accounts (don’t be lazy), so come up with a memorable phrase for each account.
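The rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `phrase_to_password` is made up here, each word is reduced to its first letter, any token containing a digit (such as 613 or $400) is kept whole, and the sentences are joined with a period, which is what reproduces the example above.

```python
def phrase_to_password(phrase):
    """Build a password from a memorable phrase (illustrative rule, see assumptions above)."""
    # Split the phrase into sentences and drop empty pieces.
    sentences = [s.strip() for s in phrase.split(".") if s.strip()]
    parts = []
    for sentence in sentences:
        pieces = []
        for token in sentence.split():
            if any(ch.isdigit() for ch in token):
                pieces.append(token)      # keep numbers and amounts whole
            else:
                pieces.append(token[0])   # keep only the first letter of a word
        parts.append("".join(pieces))
    # Join the sentences with a period, as in the example.
    return ".".join(parts)


phrase = "The first house I ever lived in was 613 Fake Street. Rent was $400 per month."
password = phrase_to_password(phrase)
print(password, len(password))
```

Since this particular phrase and password are published, they are useful only as a pattern, not as an actual password.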
There is another way to generate strong passwords. The passwords generated in this scheme are 26-character passwords in which the first character is the first letter of the English alphabet, the second character is the second letter, the third character is the third letter and so on. In fact, this much could be given away in the Jimmy Kimmel video mentioned above. Even though all the letters are known, the scheme produces over 67 million possible passwords (67,108,864 to be exact). Read this blog post to know more. Anyone who understands how this scheme works also understands the binomial distribution.
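The count 67,108,864 equals 2 raised to the 26th power. One scheme that matches this count is to choose upper case or lower case independently for each of the 26 letters. That reading is an assumption made here for illustration; the details are in the linked post. Under this assumption, the number of capital letters in a random password follows a binomial distribution with 26 trials and probability 1/2. The Python sketch below uses a made-up function name, `random_case_alphabet`.

```python
import secrets
import string
from math import comb

# Assumed scheme: each of the 26 letters is independently upper or lower case.
TOTAL = 2 ** 26

def random_case_alphabet():
    """Return the alphabet a..z with the case of each letter chosen at random."""
    return "".join(
        secrets.choice((ch.lower(), ch.upper())) for ch in string.ascii_lowercase
    )

print(TOTAL)                    # 67108864
print(random_case_alphabet())   # one of the 67,108,864 possibilities

# Binomial connection: comb(26, k) passwords have exactly k capital letters,
# and these counts add up to the total.
print(sum(comb(26, k) for k in range(27)) == TOTAL)  # True
```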
The recent winner of the Powerball jackpot is Mavis Wanczyk, a hospital worker from Chicopee, Massachusetts.
The drawing was on 8/23/2017 and the winning numbers are 6, 7, 16, 23, 26, and Powerball number 4. The size of the jackpot was $758.7 million, the largest undivided lottery jackpot in North American history. Instead of having the winnings paid out over a 30-year period (the annuity option), Wanczyk took a lump-sum payment of $480 million and took home $336 million after taxes. The win was widely reported. Here is one instance and another instance of the reporting.
We wish Ms. Wanczyk well, hoping that she will manage the unexpected windfall in ways that add to her happiness. For winners of giant jackpots, sometimes the winning is the easy part. Google “the curse of the lottery” and you will see plenty of stories of lottery winners who lost big – marriages breaking up, going bankrupt, getting robbed, being swindled and in some cases committing suicide or being murdered.
In some states, by law the lottery winners must make public appearances holding a giant publicity check in front of cameras. In the states that have no such requirement, where the public appearances are voluntary, wise winners would skip any photo ops (their identity would still be revealed) and head immediately to an undisclosed location. They know that plenty of slings and arrows (in some cases bullets) would come their way – from swindlers, fraudsters and robbers as well as from long-lost friends and relatives who want to share the wealth. As the famous line in the movie Forrest Gump goes, “Run, Forrest, run!” That would be the best advice for a winner of a giant and sudden windfall of cash. Of course, it is also important to hire a reputable and trustworthy financial adviser.
Sudden windfall cash usually does not last long. About 70 percent of the time, the cash will be gone in a few years, according to the National Endowment for Financial Education (see this piece from time.com).
The Time piece also mentions several stories of lottery winnings gone wrong. One winner mentioned is Abraham Shakespeare, who won a $30 million jackpot in Florida. He told his brother, “I’d have been better off broke.” Shakespeare (the lottery winner) has his own page in Wikipedia. His eventual fate: he was murdered by a swindler named Dee-Dee Moore 3 years after winning the big prize. The Wikipedia page of Abraham Shakespeare is more a posthumous monument to his notoriety as a murdered lottery winner than a page highlighting achievements.
The Time piece also mentions a “success” story. Richard Lustig is a 65-year-old Florida man who is a seven-time lottery game grand-prize winner. He had the wisdom of hiring a good financial planner and a good accountant. With the right mindset and the foresight of financial planning, he and his family are enjoying the good life made possible by the lottery winnings two decades earlier.
Shakespeare and Lustig are at opposite extremes of the post-lottery-winning experience. In between these two extremes, there are plenty of nightmarish stories, most of them ending in poverty and some in drug addiction (stories are here and here).
The Google search for “the curse of the lottery” turns up plenty of advice too. Here’s a piece from Forbes. Another article is a piece from Wired. The piece from Lotto Report has sad stories and other information that can shed more light on the lottery curse. Here’s the home page of the Lotto Report.
As horrendous as some of the lottery curse stories are, the chance of meeting such a fate is extremely small, because winning in the first place is so unlikely. The odds of winning the Mega Millions jackpot are 1 in over 175 million (see here for the calculation). The odds of winning the Powerball jackpot are 1 in over 292 million (see here for the calculation). The odds of being struck by lightning are 1 in 700,000 according to a piece from National Geographic (considerably more likely than 1 in a million). The odds of a lightning strike are closer to the odds of winning the jackpot in a smaller lottery, e.g. Fantasy 5 in California Lottery (1 in 575,757).
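The jackpot odds quoted above come from counting combinations. The Python sketch below reproduces them under assumed game formats that are consistent with the quoted figures: Fantasy 5 picks 5 numbers out of 39, Powerball picks 5 out of 69 plus 1 Powerball out of 26, and the Mega Millions figure corresponds to the older format of 5 out of 56 plus 1 Mega Ball out of 46. The function name `jackpot_combinations` is made up for illustration.

```python
from math import comb

def jackpot_combinations(main_pool, main_picks, bonus_pool=1):
    """Number of equally likely tickets: choose the main numbers, then the bonus ball."""
    return comb(main_pool, main_picks) * bonus_pool

fantasy5 = jackpot_combinations(39, 5)            # 575757
powerball = jackpot_combinations(69, 5, 26)       # 292201338
mega_millions = jackpot_combinations(56, 5, 46)   # 175711536 (older format)

for name, n in [("Fantasy 5", fantasy5), ("Powerball", powerball), ("Mega Millions", mega_millions)]:
    print(f"{name}: 1 in {n:,}")
```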
Of course, the longer the odds, the larger the potential jackpot. In fact, some of the most viewed articles in a companion blog called Talking about Numbers are about lotteries. The articles deal with California Lottery. But the ideas and observations would apply to other lotteries as well.
One way to calculate the odds of winning the top prize in a lottery is through math (done here for various games in California Lottery and here for Powerball). Another way is to look at data.
In this piece in Talking about Numbers, I showed that there are only 257 winning tickets with payouts of $1 million or more in the 26-year period from 1985 (the founding of California Lottery) to August 2011, averaging 10 “$1 million plus” winning tickets a year. Of these 257 winning tickets, 247 are in the first 25 years and 10 in the last year.
Naturally, I would like to update the study, but California Lottery has since made it hard to gather the data from its website. But the essential fact remains the same. There are on average about 10 winning tickets a year that pay out $1 million or more. These 257 winning tickets are out of over 9 billion purchased tickets! This means the odds of winning a “million dollar plus” prize in California Lottery are about one in 36 million (calculated here).
Of course, California Lottery will try their best to give the impression that winning is more commonplace. Lottery authorities are in the business of selling tickets. They do not want to provide a picture reflecting the true odds of winning big. The odds of 1 in 36 million are much better odds than the Powerball odds for sure. But the prizes are not as mega as Powerball (the average of 247 winning tickets from 1985 to 2010 for California Lottery is $18 million).
The tower of Hanoi is a game that works on multiple levels. It is a challenging game that tests the agility and organizational skills of the player. It is also a game mathematicians would love, since it is an excellent illustration of math concepts such as mathematical induction and exponential growth. It is also a concrete illustration of a recursive algorithm.
The game of tower of Hanoi involves moving disks from one rod into another rod. The following is a tower of Hanoi game with 8 disks and three rods.
The goal of the game is to move all the disks from the left most rod to the right most rod, one disk at a time. Only the uppermost disk on a rod can be moved. In addition, you can only place a smaller disk on top of a larger disk.
Tower of Hanoi sets such as the one shown above are available from Amazon. A home made tower of Hanoi set can also be created. For example, use paper to mark three spots (to serve as rods). Then stack books of varying sizes in one spot and proceed to move the books to another spot according to the rules described above. Kitchen plates can also be used in place of books.
Obviously, the more disks there are in the game, the more difficult it is to transfer the disks successfully. A player who is not organized or gets lost may make more moves than necessary.
A 3-disk game can be played in 7 moves and no fewer. A 4-disk game can be played in a minimum of 15 moves. A player who gets lost may end up taking more than 15 moves in a 4-disk game, but any player in the know can finish it in 15 moves. The 5-disk game can be played in 31 moves, the 6-disk game in 63 moves and the 7-disk game in 127 moves. To see these for yourself, explore the game using a homemade set or play online. The game is also discussed here in a companion blog.
Notice that whenever an additional disk is added to the game, the minimum number of moves roughly doubles (it doubles and then increases by one), e.g. from 7 moves to 15 moves (from 3 disks to 4 disks), from 15 to 31 (from 4 disks to 5 disks) and so on. In general, the $n$-disk game requires a minimum of $2^n-1$ moves. Thus the tower of Hanoi is a concrete illustration of exponential growth – increasing the size of the game by one disk essentially doubles the time required to play the game.
In general, exponential growth is a phenomenon in which increasing the input by one unit multiplies the output by a constant factor (e.g. doubling, tripling or multiplying by some other constant). In contrast, linear growth (or growing linearly) means that increasing the input by one unit increases the output by a constant additive amount.
The exponential growth is even easier to see if the moves are converted into time. Assume that it takes one second to move a disk. It would take 63 seconds to play the 6-disk game, roughly one minute. It would take 127 seconds to play the 7-disk game, roughly 2 minutes. In those two minutes, the player would need to know exactly what the moves should be. Otherwise it would be easy to make a mistake and hence take more moves than necessary. So converting the moves to seconds further illustrates the exponential growth inherent in the tower of Hanoi game.
A more subtle aspect of the tower of Hanoi game is that in order to play it successfully (i.e. in the minimum number of moves), the game must be played recursively. Take the 4-disk game as an example. Imagine that the 4 disks are at first on the left rod. The goal is to move them to the right rod (the destination rod). The rod in the middle is the intermediate rod. The strategy is to move the top 3 disks to the middle rod. Then move the 4th disk (the largest disk) from the left rod to the right rod. The remaining moves transfer the three disks on the middle rod to the right rod.
With $n$ disks, move the top $n-1$ disks to the intermediate rod (by following the rules of course). Then move the largest disk from the starting rod to the destination rod. To finish off the game, move the $n-1$ disks on the intermediate rod to the destination rod. So the $n$-disk game is executed by playing two $(n-1)$-disk games with the move of the largest disk in between. So the tower of Hanoi is a great introduction to recursive algorithms, and it would make a great computer programming exercise.
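The recursive strategy just described translates almost word for word into code. The following Python sketch is one possible rendering, not the only one; the function name `hanoi` and the rod labels are illustrative choices.

```python
def hanoi(n, source="left", target="right", spare="middle"):
    """Return the list of moves (from_rod, to_rod) that solves the n-disk game."""
    if n == 0:
        return []  # nothing to move
    return (
        hanoi(n - 1, source, spare, target)    # move the top n-1 disks out of the way
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # move the n-1 disks on top of it
    )


for disks in range(1, 9):
    print(disks, len(hanoi(disks)))  # the move count is 2**disks - 1
```

Calling `hanoi(3)` returns 7 moves and `hanoi(4)` returns 15, the minimum counts for the 3-disk and 4-disk games.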
Because of the recursive nature of the game, it is a challenge to keep track of the moves when the number of disks is large. In a 4-disk game, you would play the 3-disk game twice with one move of the largest disk in between. This can be managed with ease after some minimal practice. Say you want to play the 8-disk game. You would need to play the 7-disk game twice with one move of the largest disk in between. For each of the two 7-disk games, you would need to play the 6-disk game twice with one more move in between. That would mean four 6-disk games. Then in each of the 6-disk games, you need to play the 5-disk game twice with one more move in between. The recursion can get complicated fast! It will be helpful to use diagrams to keep track of all the sub games that are required in the recursive algorithm. This is discussed here in a companion blog.
We now know that adding one disk to the game of tower of Hanoi essentially doubles the number of moves, hence doubling the time it takes to play. What about doubling the number of disks?
The 8-disk game only requires a minimum of 255 moves (about 4 minutes with one second per move). The 16-disk game would require 65,535 moves, over 1,000 minutes (assuming one second per move) or over 18 hours! The following shows a 32-disk tower of Hanoi set, which is located in a museum in Mexico.
A 32-disk game would require $2^{32}-1$ moves, which is 4,294,967,295. Assuming one second per move, this would take over 136 years! If the workers in the museum were required to move the disks from one rod to another by following the rules of the game, that would be job security!
The game of Tower of Hanoi is a deceptively simple game. Yet the effect of doubling the number of disks is very dramatic. What about doubling the number of disks to 64, twice as many disks as the one shown above? The following is an interesting tale of the origin of the game of Tower of Hanoi [1].
In the Temple of Benares, beneath the dome which marks the centre of the world, rests a brass-plate in which are fixed three diamond needles, each a cubit high and as thick as the body of a bee. On one of these needles, at the creation, God placed sixty-four discs of pure gold, the largest disc resting on the brass plate, and the others getting smaller and smaller up to the top one. This is the Tower of Bramah! Day and night unceasingly the priests transfer the discs from one diamond needle to another according to the fixed and immutable laws of Bramah, which require that the priest must not move more than one disc at a time and that he must place this disc on a needle so that there is no smaller disc below it. When the sixty-four discs shall have been thus transferred from the needle on which at the creation God placed them to one of the other needles, tower, temple and Brahmins alike will crumble into dust, and with a thunderclap the world will vanish.
The game of the tower of Hanoi was invented by the French mathematician Édouard Lucas in 1883. A year later, an author named Henri de Parville told the above tale about the origin of the tower of Hanoi.
It is not known whether Lucas, the inventor of the game, invented this legend or was inspired by it. One thing is clear. The legend accurately describes the enormity of the 64-disk game of the tower of Hanoi.
The least number of moves required to play the 64-disk game is $2^{64}-1$, which is 18,446,744,073,709,551,615. Converted to time, that is about 585 billion years (again, assuming one second per move). In contrast, the age of the universe is believed to be 13.82 billion years. The age of the sun is believed to be 4.6 billion years. The remaining lifetime of the sun is believed to be around 5 billion years. So by the time the sun dies out, the game will still not be finished!
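The time conversions quoted in this post are easy to reproduce. The short Python sketch below assumes, as the post does, one second per move, and additionally assumes a year of 365.25 days. The function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year

def min_moves(disks):
    """Minimum number of moves in the tower of Hanoi game with the given number of disks."""
    return 2 ** disks - 1

def years_to_play(disks):
    """Playing time in years at one move per second."""
    return min_moves(disks) / SECONDS_PER_YEAR

print(min_moves(16) / 3600)            # hours for the 16-disk game, a bit over 18
print(years_to_play(32))               # a bit over 136 years
print(min_moves(64))                   # 18446744073709551615
print(years_to_play(64) / 1e9)         # roughly 585 (billion years)
```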
Back to the question of what happens when the number of disks is doubled. For the 8-disk game, the number of moves is 255. For the 16-disk game, the number of moves is 65,535. Note that the square of 255 is 65,025, which is close to 65,535. So doubling the number of disks has roughly the effect of squaring the number of moves. This is another demonstration of exponential growth.
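Why the squaring is only approximate can be seen from a short identity. Writing $m = 2^n - 1$ for the minimum number of moves in the $n$-disk game (the symbol $m$ is introduced here only for this calculation), the $2n$-disk game needs $m^2 + 2m$ moves, and the extra $2m$ is small next to $m^2$ when $n$ is large.

```latex
% Doubling the disks: from n disks (m = 2^n - 1 moves) to 2n disks
2^{2n} - 1 = (2^n - 1)(2^n + 1) = m(m + 2) = m^2 + 2m

% Check with n = 8, m = 255:
255^2 + 2 \cdot 255 = 65{,}025 + 510 = 65{,}535 = 2^{16} - 1
```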
Reference
[1] Hinz, Andreas M., Klavžar, Sandi, Milutinović, Uroš, Petr, Ciril, The Tower of Hanoi – Myths and Maths, Springer Basel, Heidelberg, New York, Dordrecht, London, 2013.
The gamma function crops up almost everywhere in mathematics. It has applications in many branches of mathematics including probability and statistics. This post gives a small demonstration of the importance of the gamma function through the gamma distribution, a probability distribution that naturally arises from the gamma function.
The following is a short classic book on Gamma function by Emil Artin (a copy can be found here).
The gamma function dates back to Euler (1707-1783), who, in a letter to Christian Goldbach (1690-1764) dated January 8, 1730, discussed the following integral.
$$\int_0^1 (-\ln t)^{\alpha-1} \, dt$$
The integral converges for all positive real numbers $\alpha$. Thus this integral can be regarded as a function with the domain of positive real numbers. Later in 1809, Adrien-Marie Legendre (1752-1833) named the function Gamma with the symbol $\Gamma$. Thus
$$\Gamma(\alpha)=\int_0^1 (-\ln t)^{\alpha-1} \, dt$$
A simple substitution of $x=-\ln t$ results in the following useful alternative formulation.
$$\Gamma(\alpha)=\int_0^\infty x^{\alpha-1} e^{-x} \, dx$$
Clearly, $\Gamma(1)=1$. Integration by parts yields the following functional relationship.
$$\Gamma(\alpha+1)=\alpha \, \Gamma(\alpha)$$
It follows from this functional relationship that $\Gamma(n)=(n-1)!$ for all positive integers $n$. Thus the gamma function extends the factorial function. In fact, this functional relationship is used to extend the gamma function beyond the positive real numbers.
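As a quick numerical sanity check, the three facts $\Gamma(1)=1$, $\Gamma(\alpha+1)=\alpha\,\Gamma(\alpha)$ and $\Gamma(n)=(n-1)!$ can be verified with the `math.gamma` function in Python's standard library. This is a sketch for illustration only; the helper names are hypothetical.

```python
import math

def satisfies_recurrence(alpha):
    """Check Gamma(alpha + 1) == alpha * Gamma(alpha) up to rounding error."""
    return math.isclose(math.gamma(alpha + 1), alpha * math.gamma(alpha), rel_tol=1e-9)

def matches_factorial(n):
    """Check Gamma(n) == (n - 1)! for a positive integer n."""
    return math.isclose(math.gamma(n), math.factorial(n - 1), rel_tol=1e-9)

print(math.gamma(1))                                                 # 1.0
print(all(satisfies_recurrence(a) for a in (0.5, 1.7, 3.2, 10.0)))   # True
print(all(matches_factorial(n) for n in range(1, 21)))               # True
```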
Gamma Distribution
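One important consequence of the gamma function is that a probability distribution arises naturally from it. Taking the integral form of the gamma function (the one that is the result of a simple substitution) and dividing it by the gamma function value $\Gamma(\alpha)$ yields an integral with a value of 1.
$$\int_0^\infty \frac{1}{\Gamma(\alpha)} \, x^{\alpha-1} e^{-x} \, dx = 1$$
Then the integrand can be regarded as a probability density function (PDF).
$$f(x)=\frac{1}{\Gamma(\alpha)} \, x^{\alpha-1} e^{-x}, \qquad x>0$$
Replacing the upper limit of the integral by a variable produces its cumulative distribution function (CDF).
$$F(x)=\int_0^x \frac{1}{\Gamma(\alpha)} \, t^{\alpha-1} e^{-t} \, dt$$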
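The claim that the density integrates to 1 can be checked numerically. The Python sketch below approximates the CDF with a plain midpoint rule, a choice made here for simplicity rather than a method taken from the post. It assumes a shape parameter of at least 1, so that the integrand stays bounded near zero, and the function names `gamma_pdf` and `gamma_cdf` are illustrative.

```python
import math

def gamma_pdf(x, alpha):
    """Density x^(alpha-1) * e^(-x) / Gamma(alpha) for x > 0."""
    return x ** (alpha - 1) * math.exp(-x) / math.gamma(alpha)

def gamma_cdf(x, alpha, steps=100_000):
    """Midpoint-rule approximation of the integral of the density from 0 to x."""
    h = x / steps
    return h * sum(gamma_pdf((i + 0.5) * h, alpha) for i in range(steps))

# Far enough to the right, the CDF is essentially 1.
print(gamma_cdf(60.0, 2.5))

# With alpha = 1 the density is e^(-x), so the CDF at 1 should be 1 - 1/e.
print(gamma_cdf(1.0, 1.0), 1 - math.exp(-1))
```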
The mathematical properties of the gamma function are discussed here in a companion blog. The gamma distribution is defined in this blog post in the same companion blog.
The PDF and the CDF shown above have only one parameter $\alpha$, which is a positive constant that determines the shape of the distribution (called the shape parameter). Another parameter $\theta$, called the scale parameter, can be added to make this a two-parameter distribution, making it more versatile as a probability model.
The gamma distribution is useful in actuarial modeling, e.g. modeling insurance losses. Due to its mathematical properties, there is considerable flexibility in the modeling process. For example, since it has two parameters (a scale parameter and a shape parameter), the gamma distribution is capable of representing a variety of distribution shapes and dispersion patterns.
The exponential distribution is a special case of the gamma distribution and it arises naturally as the waiting time between two events in a Poisson process (see here and here).
The chi-squared distribution is also a subfamily of the gamma family of distributions. Mathematically speaking, a chi-squared distribution is a gamma distribution with shape parameter $k/2$ and scale parameter 2, with $k$ being a positive integer (called the degrees of freedom). Though the definition is simple mathematically, the chi-squared family plays an outsize role in statistics.
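For concreteness, the two special cases can be written out from the two-parameter gamma density. The symbols are notational assumptions made here: $\alpha$ for the shape parameter, $\theta$ for the scale parameter and $k$ for the degrees of freedom.

```latex
% Two-parameter gamma density: shape \alpha > 0, scale \theta > 0
f(x) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}} \, x^{\alpha-1} e^{-x/\theta}, \qquad x > 0

% Exponential distribution: \alpha = 1
f(x) = \frac{1}{\theta} \, e^{-x/\theta}

% Chi-squared distribution with k degrees of freedom: \alpha = k/2, \theta = 2
f(x) = \frac{1}{\Gamma(k/2)\, 2^{k/2}} \, x^{k/2-1} e^{-x/2}
```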
This blog post discusses the chi-squared distribution from a mathematical standpoint. The chi-squared distribution also plays important roles in inferential statistics for the population mean and population variance of normal populations (discussed here).
The chi-squared distribution also figures prominently in the inference on categorical data. The chi-squared test, based on the chi-squared distribution, is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. The chi-squared test is based on the chi-squared statistic, which has three different interpretations – goodness-of-fit test, test of homogeneity and test of independence. Further discussion of the chi-squared test is found here.
Transformed Gamma Distribution
Another set of distributions is derived from the gamma family by raising a gamma distribution to a power. Raising a gamma distribution to a positive power results in a transformed gamma distribution. Raising a gamma distribution to -1 results in an inverse gamma distribution. Raising a gamma distribution to a negative power other than -1 results in an inverse transformed gamma distribution. These derived distributions greatly expand the tool kit for actuarial modeling. These distributions are discussed here.
The Monty Hall Problem is a classic problem in probability. It is all over the Internet. Besides being a famous problem, it stirred up a huge controversy when it appeared in a column written by Marilyn vos Savant in Parade Magazine back in 1990. Vos Savant received thousands of negative letters, many of them from readers who were mathematicians, scientists and engineers with PhD degrees! So it is easy to get this problem wrong (or to go down a wrong path in reasoning).
Thinking in probability can be hard sometimes. Thinking of probability in a wrong way can be costly, especially at the casino. This was what happened with Chevalier de Méré (1607-1684), who was a French writer and apparently an avid gambler. He estimated that the odds for winning in this one game were in his favor. However, he was losing money consistently on this particular game. He sensed something was amiss but could not see why his reasoning was wrong. Luckily for him, he was able to enlist two leading mathematicians at the time, Blaise Pascal and Pierre de Fermat, for help. The correspondence between Pascal and Fermat laid the foundation for the modern theory of probability.
Chevalier de Méré actually asked Pascal and Fermat for help on two problems – the problem of points and the dice problem. I wrote about these two problems in two blog posts – the problem of points and the dice problem. In this post, I make further comments on the dice problem. The point is that flawed reasoning in probability can be risky and costly, first and foremost for gamblers and to a great extent for anyone making financial decisions with uncertain future outcomes.
For Chevalier de Méré, there are actually two dice problems.
The first game involves four rolls of a fair die. In this game, de Méré made bets at even odds on rolling at least one six when a fair die is rolled four times. His reasoning was that since the chance of getting a six in one roll of a die is $1/6$ (correct), the chance of getting a six in four rolls of a die would be $4 \times 1/6 = 2/3$ (incorrect). With the favorable odds of 67% of winning, he reasoned that betting at even odds would be a profitable proposition. Though his calculation was incorrect, he made a considerable amount of money over many years playing this game.
The second game involves twenty-four rolls of a pair of fair dice. The success in the first game emboldened de Méré to make even bets on rolling one or more double sixes in twenty-four rolls of a pair of dice. His reasoning was that the chance of getting a double six in one roll of a pair of dice is $1/36$ (correct). Then the chance of getting a double six in twenty-four rolls of a pair of dice would be $24 \times 1/36 = 2/3$ (incorrect). He again reasoned that betting at even odds would be profitable.
The problem was for Pascal and Fermat to explain why de Méré was able to make money on the first and not on the second game.
The correct probability calculation would show that the probability of the event “rolling at least one six” happening in the first game is about 0.518 (see here). Thus de Méré would on average win 52% of the time in playing the first game at even odds. In playing 100 games, he would win about 52 games. In playing 1,000 games, he would win about 518 games. The following table calculates the amount of winning per 1,000 games for de Méré.
Results of playing the first game 1,000 times with one French franc per bet

| Outcome | # of Games | Win/Lose Amount |
|---|---|---|
| Win | 518 | 518 francs |
| Lose | 482 | -482 francs |
| Total | 1,000 | 36 francs |

Per 1,000 games, de Méré won on average 36 francs. So he had the house edge of 3.6% (= 36/1000).
The correct calculation would show that the probability of the event “at least one double 6” happening in the second game is about 0.491 (see here). Thus de Méré could only win about 49% of the time. Per 1,000 games, de Méré would win on average 491 games, and the opposing side would win about 509 games. The following table calculates the amount of winning per 1,000 games for de Méré.
Results of playing the second game 1,000 times with one French franc per bet

| Outcome | # of Games | Win/Lose Amount |
|---|---|---|
| Win | 491 | 491 francs |
| Lose | 509 | -509 francs |
| Total | 1,000 | -18 francs |

The winning on average for de Méré is negative 18 francs per 1,000 games. So the opposing side has a house edge of 1.8% (= 18/1000).
So de Méré was totally off base with his reasoning! He thought that the probability of winning would be 2/3 in both games. The incorrect reasoning led him to believe that betting at even odds would be a winning proposition. Though his reasoning was wrong in the first game, he was lucky that the winning odds were still better than even. For the second game, he learned the hard way – through simulation with real money!
There are two issues involved here. One is obviously the flawed reasoning in probability on the part of de Méré. The second is calculation. de Méré and his contemporaries would have had a hard time making the calculation even if they were able to reason correctly. They did not have the advantage of calculators and other electronic devices that are widely available to us. For example, the following shows the calculation of the winning probabilities for both games.
$$1-\left(\frac{5}{6}\right)^4 \approx 0.518$$
$$1-\left(\frac{35}{36}\right)^{24} \approx 0.491$$
It is possible to calculate 5/6 raised to the 4th power by hand. Raising 35/36 to the 24th power would be very tedious and error prone. Today, anyone with a handheld calculator that has a $y^x$ key (raising y to x) can do it in seconds. For de Méré and his contemporaries, this calculation would probably have had to be done by experts.
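The correct reasoning goes through the complement: the chance of at least one success is one minus the chance of no success in every roll. The Python sketch below carries out that calculation with exact fractions. The function name `prob_at_least_one` is made up for illustration.

```python
from fractions import Fraction

def prob_at_least_one(p_single, rolls):
    """P(at least one success in `rolls` independent tries), via the complement."""
    return 1 - (1 - p_single) ** rolls

game1 = prob_at_least_one(Fraction(1, 6), 4)     # at least one six in 4 rolls
game2 = prob_at_least_one(Fraction(1, 36), 24)   # at least one double six in 24 rolls

print(game1, float(game1))   # 671/1296, about 0.518
print(float(game2))          # about 0.491
```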
The main stumbling block of course would be the inability to reason correctly with odds and probability. We have the benefit of the probability tools bequeathed by Pascal, Fermat and others. Learning the basic tool kit in probability is essential for anyone who deals with uncertainty.
One more comment about what Chevalier de Méré could have done (if expert mathematical help was not available). He could have performed simulation (the kind that does not involve real money). Simply roll a pair of fair dice a number of times and count how many times he wins.
He would soon find out that he would not win 2/3 of the time. He would not even win 51% of the time. It would be more likely that he wins 49% of the time. Simulation, if done properly, does not lie. We performed one simulation of 100,000 iterations of the second game (in Excel). Only 49,211 of the iterations had “at least one double six.” Without software, simulating 100,000 times may not be realistic. But Chevalier de Méré could simulate the experiment 100 times or even 1,000 times (if he hired someone to help).
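The same kind of simulation is easy to run in software other than Excel. The following Python sketch is one way to do it, not the spreadsheet used for the figure above; the function names and the choice of seed are arbitrary, and the result will differ slightly from run to run when no seed is given.

```python
import random

def wins_second_game(rng, rolls=24):
    """Play one game: True if at least one double six appears in `rolls` throws of two dice."""
    for _ in range(rolls):
        die1, die2 = rng.randint(1, 6), rng.randint(1, 6)
        if die1 == 6 and die2 == 6:
            return True
    return False

def simulate(trials, seed=None):
    """Fraction of simulated games that contain at least one double six."""
    rng = random.Random(seed)
    wins = sum(wins_second_game(rng) for _ in range(trials))
    return wins / trials

print(simulate(100_000))  # typically close to 0.49
```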
_______________________________________________________________________________
2017 – Dan Ma
The Accountant is a 2016 film starring Ben Affleck and Anna Kendrick.
What is notable for us here at climbingmatheverest.com is that Chris Wolff, the protagonist played by Affleck, is a crime fighter who knows how to use a gun and a spreadsheet. Though he is an expert sniper and a martial artist, his chief weapon in fighting crime is mostly statistical in nature. One tool that stands out is the so-called Benford’s law, which is a statistical law used by statisticians and forensic accountants to sniff out fraudulent numbers in financial documents. Of course, this being a Hollywood movie, it cannot rely on numbers and statistics alone. There are plenty of action scenes.
Here’s an interview with a forensic accountant who vouched for the authenticity of the movie’s use of Benford’s law and other statistical investigative techniques.
According to Benford’s law, the first digits of the numbers in many natural data sets follow a certain distribution. For example, the first digit is 1 about 30% of the time. Any set of financial documents that has too few 1’s should raise a giant red flag. In the movie, Wolff spotted the unusual frequency of the first digit 3 in a series of financial transactions. Deviations between the actual frequencies of first digits in the documents and the frequencies predicted by Benford’s law raise suspicion. Then the investigator can dig further into the numbers to look for potential fraud.
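The standard statement of Benford's law, which is not spelled out in this post, is that the first digit equals $d$ with probability $\log_{10}(1+1/d)$, which gives about 30% for $d=1$. The Python sketch below compares these predicted frequencies with the observed first digits of a data set. The helper names are made up, and the sample is a stand-in (the first 1,000 powers of 2), not financial data.

```python
import math
from collections import Counter

def benford_probability(d):
    """Predicted probability that the first significant digit equals d (d = 1..9)."""
    return math.log10(1 + 1 / d)

def first_digit(x):
    """First significant digit of a nonzero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def first_digit_frequencies(values):
    """Observed share of each first digit 1..9 among the nonzero values."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

# Stand-in data set (NOT financial data): the powers of 2
sample = [2 ** n for n in range(1, 1001)]
observed = first_digit_frequencies(sample)
for d in range(1, 10):
    print(d, round(benford_probability(d), 3), round(observed[d], 3))
```

A real investigation would replace `sample` with the transaction amounts under review and look for digits whose observed share is far from the predicted one.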
Interested in knowing more about Benford’s law? Here are some blog posts from several affiliated blogs.