The Law of Large Numbers and Convergence to Probability

Observe the following three examples.

Example:   Bernoulli trials

Consider running a Bernoulli experiment a thousand times.   In the following spreadsheet a parameter   $p$   is randomly generated (cell G1) to the nearest hundredth, within   $[.25, .75]\,$ .   A Bernoulli experiment, $\, Bernoulli(p)\,$ , is run (column A), and the data are compiled by frequency (column D) and by relative frequency (column E).   A histogram for the relative frequency data is displayed next to a plot of the density function for the Bernoulli experiment.

Click here to open a copy of this so you can experiment with it. You will need to be signed in to a Google account.

This histogram looks a great deal like the density plot.
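
If you prefer code to a spreadsheet, the same comparison can be sketched in a few lines of Python. This is only a minimal sketch: the particular value of $p$ below is an arbitrary choice from $[.25, .75]$, not the one generated in cell G1, and the histogram itself is replaced by printed relative frequencies.

```python
import random

p = 0.37     # an arbitrary value in [0.25, 0.75]; the spreadsheet picks this at random
n = 1000     # number of Bernoulli trials

# simulate n Bernoulli(p) trials: 1 with probability p, 0 otherwise
outcomes = [1 if random.random() < p else 0 for _ in range(n)]

# compare the relative frequency of each outcome with the density value
for value in (0, 1):
    rel_freq = outcomes.count(value) / n
    density = p if value == 1 else 1 - p
    print(f"outcome {value}: relative frequency {rel_freq:.3f}, density {density:.3f}")
```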

Example:   Discrete Uniform Distribution

Consider rolling an   $8$-sided die two thousand times.   The possible outcomes are   $1\,$ , $\,2\,$ , $\,\cdots\,$ , $\,8\;$ .   In the following spreadsheet we model this (column A), and the data are compiled by frequency (column D) and by relative frequency (column E).   A histogram for the relative frequency data is displayed next to a plot of the density function for the discrete uniform random variable.

Click here to open a copy of this so you can experiment with it. You will need to be signed in to a Google account.

Again, the histogram looks a great deal like the density plot.
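
The same kind of check can be sketched in Python; the two thousand rolls come from the description above, and each relative frequency is compared with the uniform density value $1/8$.

```python
import random
from collections import Counter

n = 2000                                           # number of rolls
rolls = [random.randint(1, 8) for _ in range(n)]   # fair 8-sided die

# compare relative frequencies with the uniform density 1/8
counts = Counter(rolls)
for face in range(1, 9):
    print(f"face {face}: relative frequency {counts[face] / n:.3f}, density {1/8:.3f}")
```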

Example:   Poisson Distribution

Consider running a Poisson experiment a thousand times.   The possible outcomes are   $0\,$ , $\,1\,$ , $\,2\,$ , $\,\cdots\;$ .   In the following spreadsheet a parameter   $p$   is randomly generated (cell I1) to the nearest tenth, within   $[4, 8]\,$ .   A Poisson experiment, $\, Poisson(p)\,$ , is run (column A), and the data are compiled by frequency (column D) and by relative frequency (column E).   A histogram for the relative frequency data is displayed next to a plot of the density function for the Poisson random variable.

Click here to open a copy of this so you can experiment with it. You will need to be signed in to a Google account.

Here too the histogram looks a great deal like the density plot.
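
A Python sketch of this experiment might look as follows; the parameter value $5.3$ is an arbitrary choice from $[4, 8]$, and NumPy's Poisson sampler stands in for the spreadsheet's column A.

```python
import math
import numpy as np

p = 5.3      # an arbitrary parameter value from [4, 8]
n = 1000     # number of Poisson experiments

samples = np.random.poisson(p, size=n)   # n observations of Poisson(p)

# compare relative frequencies with the Poisson density e^(-p) p^k / k!
for k in range(13):
    rel_freq = np.count_nonzero(samples == k) / n
    density = math.exp(-p) * p**k / math.factorial(k)
    print(f"k = {k:2d}: relative frequency {rel_freq:.3f}, density {density:.3f}")
```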

The similarities of these histograms to the generating densities are no coincidence.

Fact: (Borel’s Law of Large Numbers   –   given here in a very vague sense):   If the sample size is large, the relative frequency histogram will resemble the distribution, and this resemblance will become more precise as the sample size increases.

Various versions of this fact, and some of its consequences, go under related names.

The Weak Law of Large Numbers is a statement saying essentially that for any specified positive margin, no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.
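
In symbols: if   $\bar{X}_n$   denotes the average of the first   $n$   observations and   $\mu$   the expected value, then for every margin   $\varepsilon \gt 0$,
$$ \lim_{n\to\infty}\, \text{P}\!\left( \left| \bar{X}_n -\mu \right| \lt \varepsilon \right) = 1 \;\text{.} $$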

The Strong Law of Large Numbers is a strengthening of the above weak law.   This law justifies the intuitive interpretation of the expected value of a random variable, when sampled repeatedly, as the “long-term average”.
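
In symbols: with probability one the sample averages converge to the expected value,
$$ \text{P}\!\left( \lim_{n\to\infty} \bar{X}_n = \mu \right) = 1 \;\text{.} $$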

Borel’s Law of Large Numbers can be made more precise as follows.

Fact: (Borel’s Law of Large Numbers):   If an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event occurs approximately equals the probability of the event’s occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if   $A$   denotes the event in question, $\,p$ its probability of occurrence, and   $N_n(A)$   the number of times   $A$   occurs in the first   $n$   trials, then with probability one,

$\displaystyle{ \frac{N_n(A)}{n} \to p }$   as   $n\to \infty\;$ .
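
One way to watch this convergence is to track the running proportion $N_n(A)/n$ as $n$ grows. The Python sketch below uses a Bernoulli trial with $p = 0.3$ and a few checkpoint values of $n$; both choices are illustrative, not part of the statement above.

```python
import random

p = 0.3                                       # probability of the event A on a single trial
checkpoints = [100, 1_000, 10_000, 100_000]   # values of n at which to report N_n(A)/n

successes = 0
trials = 0
for n in checkpoints:
    # continue running independent trials until a total of n have been performed
    while trials < n:
        trials += 1
        if random.random() < p:
            successes += 1
    print(f"n = {trials:7d}: N_n(A)/n = {successes / trials:.4f}   (p = {p})")
```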

How to See the Weak Law of Large Numbers

This argument is due to Chebyshev, and it is the reason he developed Chebyshev’s Theorem.

Suppose we have a random variable   $X$   with mean   $\mu$   and standard deviation   $\sigma\;$ .   Take   $n$   independent copies of   $X$   (call them   $X_1\,$ , $\,X_2\,$ , $\,\cdots\,$ , $\,X_n\,$ ) and average to get
$$ \bar{X} =\frac{1}{n}\,\sum_{k=1}^n\, X_k \; \text{.} $$
Considerations from the previous section show that this random variable   $\bar{X}$   has mean   $\mu$   and standard deviation   $\displaystyle{\bar{\sigma} =\frac{\sigma}{\sqrt{n}}}\;$ .
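
In brief, linearity of expectation gives the mean, and the fact that variances of independent random variables add gives the standard deviation:
$$ \text{E}\!\left[\,\bar{X}\,\right] = \frac{1}{n}\,\sum_{k=1}^n \text{E}[X_k] = \mu \;\text{,} \qquad \text{Var}\!\left(\bar{X}\right) = \frac{1}{n^2}\,\sum_{k=1}^n \text{Var}(X_k) = \frac{\sigma^2}{n} \;\text{,} $$
so that   $\bar{\sigma} = \sqrt{\sigma^2/n} = \sigma/\sqrt{n}\;$ .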

The average over a sample of   $X$   of size   $n$   can be thought of as a single sample of the variable   $\bar{X}\;$ .   We wish to see that if   $n$   is sufficiently large then this sample of   $\bar{X}$   is, with probability close to   $1\,$ , very close to   $\mu\;$ .

To this end, let   $k$   be a large positive integer.   We will take   $k=10$   for convenience.   Chebyshev’s theorem says
$$
\text{P}\!\left( \left| \bar{X} -\mu \right| \lt k\,\bar{\sigma} \right)
= \text{P}\!\left( \left| \bar{X} -\mu \right| \lt k\,\frac{\sigma}{\sqrt{n}} \right) \ge 1-\frac{1}{k^2}
=\frac{99}{100} \;\text{.}
$$
That is, with at least   $99\%$   probability   $\bar{X}$   will be within   $\displaystyle{10\,\frac{\sigma}{\sqrt{n}}}$   of   $\mu\;$ .   Taking   $n$   large, we can make this interval   $\left( \mu -\displaystyle{10\,\frac{\sigma}{\sqrt{n}}} , \mu +\displaystyle{10\,\frac{\sigma}{\sqrt{n}}} \right)$   about   $\mu$   arbitrarily small.   In words, for large enough   $n$   we have that with at least   $99\%$   probability   $\bar{X}$   is extremely close to   $\mu\;$ .
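
As a concrete illustration (the particular numbers are chosen only for this example): to force the half-width of this $99\%$ window to be at most a prescribed $\varepsilon \gt 0$ we need $\displaystyle{10\,\frac{\sigma}{\sqrt{n}} \le \varepsilon}$, that is, $n \ge \left(10\,\sigma/\varepsilon\right)^2$. With $\sigma = 1$ and $\varepsilon = 0.01$, for instance, this requires $n \ge 1{,}000{,}000$ observations.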