Probability and Statistics

Probability and Statistics
The probability blog focuses on topics that are usually covered in an undergraduate sequence on probability and statistics. Particular emphasis is placed on various discrete and continuous probability models as well as basic methods and notions in probability. The probability models covered include the Poisson, binomial, negative binomial and hypergeometric distributions as well as the exponential, gamma and lognormal distributions, among others. The basic probability methods include conditional probability, conditional distributions, the hazard rate function and order statistics, among others. Building on the discussion of order statistics, the blog also has some content on non-parametric inference.

Another major focus is on classic problems in probability. Examples include the matching problem, the occupancy problem, the birthday problem, gambler’s ruin, the problem of points, the coupon collector problem and the Monty Hall problem.

Topics in Probability

This blog covers topics more advanced than those in the general probability blog. The current focus is on stochastic processes, in particular Markov chains and random walks.

Introductory Statistics
The statistics blog is an outgrowth of courses I used to teach at a local community college. Thus the blog mirrors some of the content of those courses. For example, the blog covers both descriptive statistics and inferential statistics.

The blog is data centered. Throughout the blog, real data is used as much as possible to illustrate the concepts and methods. For example, linear regression is illustrated using close to 100 years of rainfall data from the Los Angeles area. Data discussed in the blog include both quantitative data and categorical data.

On the descriptive statistics side, there are ample examples discussing how to make sense of data using graphs (e.g. histograms and stem-and-leaf plots) as well as using numerical measures. These include measures of center such as the mean and median, measures of spread such as the standard deviation and interquartile range, and the 5-number summary.

For measures of center, the mean is arithmetic (the sum of all data points divided by the number of data points) and the median is positional (the middle data point when the data is sorted). For measures of spread, the standard deviation is arithmetic and the interquartile range is positional. Which should be used as a numerical description of a data set? A great deal of emphasis is placed on the notion of resistant measures, that is, measures that are not strongly affected by extreme values, which is central to answering this question.
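The contrast between arithmetic and positional measures can be seen in a small sketch. The data values below are made up for illustration and do not come from the blog, and the helper names `iqr` and `summarize` are hypothetical. Quartiles are computed with the default method of Python's `statistics.quantiles`; other quartile conventions can give slightly different values.

```python
import statistics


def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


def summarize(data):
    """Return arithmetic and positional measures of center and spread."""
    return {
        "mean": statistics.mean(data),      # arithmetic measure of center
        "median": statistics.median(data),  # positional measure of center
        "stdev": statistics.stdev(data),    # arithmetic measure of spread
        "iqr": iqr(data),                   # positional measure of spread
    }


# Illustrative data only: the second list replaces the largest value
# of the first list with an extreme value.
data = [2, 3, 4, 5, 6, 7, 8]
with_outlier = [2, 3, 4, 5, 6, 7, 80]

base = summarize(data)
shifted = summarize(with_outlier)

for name in base:
    print(f"{name:>6}: {base[name]:8.3f} -> {shifted[name]:8.3f}")
```

In this sketch the median and the interquartile range are the same for both lists, while the mean and the standard deviation change substantially. That is the sense in which the positional measures are resistant.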

Another area of emphasis in the blog is the normal distribution and statistical inference. The concepts of sampling distributions, hypotheses and p-values are gradually built up as tools for evaluating claims made on the basis of observed sample data.

The blog also focuses on the production of data, i.e. data that come from observational studies versus data that come from experiments. In fact, the most viewed blog post is on the design of experiments.

Other topics include regression, analyzing the independence or dependence of two categorical variables using two-way tables (one example is based on the survival data of the passengers of the Titanic) as well as a discussion of Benford’s law.

Another feature of the blog is the discussion of famous studies (examples: the Nun Study, the Collaborative Atorvastatin Diabetes Study (CARDS), the Diabetes Prevention Program (DPP)).


© 2017 – Dan Ma