Distribution of the Mean from a Non-normal Distribution

Many situations of interest are modeled not by normal random variables, but by others that are not particularly close to normal. We have seen examples where it makes sense to use Bernoulli, or geometric, or Poisson random variables. These are discrete, and their densities look nothing like those of normal random variables. But it is a remarkable fact that even in these cases, for sufficiently large values of $n$ we have that $\bar{X}_n$ is well approximated by normal distributions. Moreover, this fact does not depend on the random variable $X$ with which we start. All that matters is that $X$ have a mean $\mu$ and a variance $\sigma^2$. That is, if the random variable $X$ has a mean $\mu$ and a variance $\sigma^2$, then as $n \to \infty$ the distribution of $\bar{X}_n$ tends toward a normal distribution. This is a powerful fact, and it allows for many important applications. It turns out to be so important that it is given a special, rather high-falutin’ name.

Central Limit Theorem (CLT): Let $X$ be a random variable having mean $\mu$ and variance $\sigma^2$. Consider the random variable
$$Y_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}.$$
As $n \to \infty$, the distribution function of $Y_n$ approaches the standard normal distribution function $Z_{0,1}$. This is to be interpreted as saying that (for any $a, b \in \mathbb{R}$) the probability that $Y_n$ takes on values between $a$ and $b$ becomes ever closer to the probability that $Z_{0,1}$ takes on values between $a$ and $b$.

Here on the left is an image of the density function of $Y_{25}$ and the standard normal density function $Z_{0,1}$ in the case where $X$ has distribution $\mathrm{Bernoulli}(\tfrac{1}{3})$. On the right are the graphs of the corresponding distribution functions.

[Figure: the average of 25 independent, identical Bernoulli trials, standardized by subtracting its mean and dividing by its standard deviation, compared with the standard normal density (left) and distribution function (right).]
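The comparison pictured above can also be checked numerically. Here is a minimal simulation sketch (in Python with NumPy and SciPy, tools chosen purely for illustration and not part of the text): it draws many samples of size $n = 25$ from the $\mathrm{Bernoulli}(\tfrac{1}{3})$ distribution, forms $Y_n$, and compares the resulting probabilities with those of $Z_{0,1}$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p, n, trials = 1/3, 25, 100_000            # Bernoulli parameter, sample size, repetitions
mu, sigma = p, np.sqrt(p * (1 - p))        # mean and standard deviation of Bernoulli(p)

# Each row is one sample of size n; average across each row and standardize.
samples = rng.binomial(1, p, size=(trials, n))
Yn = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

# Compare P(a < Y_n < b) with P(a < Z_{0,1} < b) at a few points.
for a, b in [(-1.0, 1.0), (-2.0, 2.0), (0.0, 1.5)]:
    empirical = np.mean((Yn > a) & (Yn < b))
    theoretical = norm.cdf(b) - norm.cdf(a)
    print(f"P({a} < Y_n < {b}): simulated {empirical:.4f}, normal {theoretical:.4f}")
```

Even for $n = 25$ the simulated and normal probabilities agree to two or three decimal places.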

A proof of the CLT would take us on a substantial detour, and so a sketch will be available as a drop-down addendum in the future.   For now, we take the theorem as fact without justification.

Note: Experience has shown that for purposes of approximation, samples of size $n \ge 30$ will suffice to give decent accuracy when estimating distribution functions using normal distributions. The image above shows the remarkable agreement even for values of $n$ below $30$.

With the CLT in mind, and knowing that binomial densities are just sums of Bernoulli densities, we can approximate both binomial densities and binomial distribution functions. That is, if each $X_i$ has Bernoulli density $\mathrm{Bernoulli}(p)$, then $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$ is $\frac{1}{n}$ times a random variable $Y$ having density $\mathrm{binom}(n,p)$. Thus
$$\frac{\bar{X} - p}{\sqrt{\frac{p(1-p)}{n}}}$$
being approximately $Z_{0,1}$ is equivalent to
$$\frac{\frac{1}{n}Y - p}{\sqrt{\frac{p(1-p)}{n}}} \approx Z_{0,1},$$
or
$$\frac{1}{n}Y \approx Z_{p,\,\sqrt{p(1-p)/n}}.$$
This last is a statement regarding how the average of $n$ trials of $X$ is approximately normal. This can also be interpreted as
$$\frac{\frac{Y}{n} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{Y - np}{\sqrt{np(1-p)}} \approx Z_{0,1}.$$
We may see this as
$$Y \approx \sqrt{np(1-p)}\,Z_{0,1} + np = Z_{np,\,\sqrt{np(1-p)}},$$
so that binomial random variables can be approximated by members of the family of normal random variables.
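As a quick numerical check of this last approximation, here is a sketch using Python's SciPy (an assumption of convenience, not a tool used in the text): it compares a few values of the binomial distribution function with the corresponding normal distribution function. The half-unit shift in the code anticipates the correction used in the examples below for approximating a discrete variable by a continuous one.

```python
from scipy.stats import binom, norm

n, p = 25, 1/3                                 # parameters matching the figure below
mu, sd = n * p, (n * p * (1 - p)) ** 0.5       # np and sqrt(np(1-p))

for k in (5, 8, 12):                           # compare cumulative probabilities at a few values
    exact = binom.cdf(k, n, p)                 # P(Y <= k) for Y ~ binom(n, p)
    approx = norm.cdf(k + 0.5, loc=mu, scale=sd)   # normal value, with a half-unit shift
    print(f"P(Y <= {k}): exact {exact:.4f}, normal approximation {approx:.4f}")
```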

Here is the graph of the density function for the discrete random variable $\mathrm{binom}(25, \tfrac{1}{3})$ compared with that for the continuous random variable $Z_{25/3,\,5\sqrt{2}/3}$.

[Figure: a binomial density compared with a normal density having the same mean and standard deviation.]
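For readers who wish to reproduce a picture like the one above, here is a short plotting sketch (Python with matplotlib and SciPy; the setup is an illustrative assumption, not part of the original text).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, norm

n, p = 25, 1/3
mu, sd = n * p, np.sqrt(n * p * (1 - p))       # 25/3 and 5*sqrt(2)/3

k = np.arange(0, n + 1)                        # possible values of the binomial variable
x = np.linspace(0, n, 400)                     # a fine grid for the normal density

plt.bar(k, binom.pmf(k, n, p), width=1.0, alpha=0.4, label="binom(25, 1/3)")
plt.plot(x, norm.pdf(x, loc=mu, scale=sd), label="normal, same mean and std. dev.")
plt.legend()
plt.show()
```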

As an example, consider the following problem.

Example: The incidence of color blindness in a population is   10% .   From a survey of   50,000   members of this population, approximate the probability that the number of color blind survey participants is less than   4,900 .   Approximate the probability that the number of color blind participants is greater than   5,200 .

Whether or not one person from the survey is color blind is a random variable with Bernoulli density function $\mathrm{Bernoulli}(x; .1)$, where $1$ indicates color blind and $0$ indicates not. Thus the number of color blind people in a sample of size 50,000 is a random variable with density $\mathrm{binom}(k; 50000, .1)$ given by
$$\mathrm{binom}(k; 50000, .1) = \binom{50000}{k}(.9)^{50000-k}(.1)^k,$$
hardly inviting. The probability that fewer than 4,900 survey participants are color blind is
$$\sum_{k=0}^{4899}\binom{50000}{k}(.9)^{50000-k}(.1)^k,$$
again, not a convenient number to obtain.

Instead, since we know the mean and standard deviation of $\mathrm{binom}(k; 50000, .1)$, i.e.
$$\mu = 50000 \cdot 0.1 = 5000 \quad\text{and}\quad \sigma = \sqrt{50000 \cdot 0.1 \cdot (1 - 0.1)} \approx 67.1,$$
we approximate
$$\sum_{k=0}^{4899}\binom{50000}{k}(.9)^{50000-k}(.1)^k \approx P(67.1\,Z_{0,1} + 5000 < 4899.5) = P(67.1\,Z_{0,1} < -100.5) = P(Z_{0,1} < -1.4978),$$
where the cutoff $4899.5$, halfway between $4899$ and $4900$, accounts for our approximating a discrete random variable by a continuous one. We calculate this to be
$$P(Z_{0,1} < -1.4978) \approx 0.0671.$$
That is, we have determined that there is less than a seven percent chance that the number of color blind people among the survey participants will be less than 4,900. This should be compared to the actual value
$$\sum_{k=0}^{4899}\binom{50000}{k}(.9)^{50000-k}(.1)^k \approx 0.0667.$$
This took a computer algebra system over an hour to obtain.

Similarly, the probability that the number of color blind participants is greater than 5,200 is approximated by
$$\sum_{k=5201}^{50000}\binom{50000}{k}(.9)^{50000-k}(.1)^k \approx P(67.1\,Z_{0,1} + 5000 > 5200.5) = P(Z_{0,1} > 2.9881) \approx 0.0014.$$
Again, the computation of the actual value is prohibitive. Using much time with a computer algebra system we obtain the actual value
$$\sum_{k=5201}^{50000}\binom{50000}{k}(.9)^{50000-k}(.1)^k \approx 0.0015.$$
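Both computations above can be sketched in a few lines of code (Python with SciPy, again an assumption of convenience rather than anything used in the text). SciPy evaluates the binomial distribution function through the regularized incomplete beta function rather than by summing the terms one at a time, so it provides a fast cross-check of the values quoted above.

```python
from scipy.stats import binom, norm

n, p = 50_000, 0.1
mu, sd = n * p, (n * p * (1 - p)) ** 0.5       # 5000 and about 67.1

# P(number of color blind participants < 4900)
exact_low = binom.cdf(4899, n, p)              # the exact sum, about 0.0667
approx_low = norm.cdf((4899.5 - mu) / sd)      # the normal approximation, about 0.0671

# P(number of color blind participants > 5200)
exact_high = binom.sf(5200, n, p)              # sf(k) gives P(Y > k); about 0.0015
approx_high = norm.sf((5200.5 - mu) / sd)      # about 0.0014

print(exact_low, approx_low, exact_high, approx_high)
```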

Here is an example illustrating how we can use normal approximation to assess whether or not a game is fair.

Problem: A fair coin is flipped 1000 times. Approximate the probability that heads occurs at most $n$ times. What value of $n$ gives the approximation $95\%$? $99\%$? If a coin is flipped 1000 times and comes up heads 532 times, is it reasonable to think that it is a fair coin?

On flipping a fair coin 1000 times, the probability of getting heads $k$ times is
$$\mathrm{binom}(k; 1000, 0.5) = \binom{1000}{k}(.5)^{1000}.$$
It follows that the probability of getting heads at most $n$ times is
$$\sum_{k=0}^{n}\binom{1000}{k}(.5)^{1000}.$$

This can be very unpleasant to calculate explicitly. Knowing that $\mathrm{binom}(1000, .5)$ has mean $\mu = 1000 \cdot (.5) = 500$ and standard deviation $\sigma = \sqrt{1000 \cdot (.5)^2} = 5\sqrt{10}$, we instead use the approximation
$$Z_{500,\,5\sqrt{10}} = 5\sqrt{10}\,Z_{0,1} + 500,$$
and determine that the desired probability is approximately
$$P\!\left(Z_{500,\,5\sqrt{10}} < n + \tfrac{1}{2}\right) = P\!\left(Z_{0,1} < \frac{\left(n + \tfrac{1}{2}\right) - 500}{5\sqrt{10}}\right) = \int_{-\infty}^{\frac{\left(n+\frac{1}{2}\right)-500}{5\sqrt{10}}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,$$
where the $\tfrac{1}{2}$ is again the half-unit correction for approximating a discrete random variable by a continuous one. We can obtain the probability using, for example, the cumulative standard normal distribution function NORMSDIST(x) in spreadsheet applications. Thus, if $n = 520$, then the probability that $\mathrm{binom}(1000, 0.5)$ assumes a value no greater than $520$ is approximately
$$\int_{-\infty}^{\frac{\left(520+\frac{1}{2}\right)-500}{5\sqrt{10}}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \int_{-\infty}^{1.2965} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \mathrm{NORMSDIST}(1.2965) \approx 0.9026.$$
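The same calculation can be sketched numerically (Python with SciPy standing in for the spreadsheet's NORMSDIST; this choice of tool is an assumption of convenience):

```python
from scipy.stats import binom, norm

flips, p = 1000, 0.5
mu, sd = flips * p, (flips * p * (1 - p)) ** 0.5   # 500 and 5*sqrt(10)

z = (520 + 0.5 - mu) / sd                          # about 1.2965
print(norm.cdf(z))                                 # the approximation, about 0.9026
print(binom.cdf(520, flips, p))                    # the exact binomial value, for comparison
```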

The $x$ value for which
$$\int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = 0.95$$
is approximately $x = 1.6449$, so
$$\int_{-\infty}^{\frac{\left(n+\frac{1}{2}\right)-500}{5\sqrt{10}}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 0.95$$
when
$$\frac{\left(n + \tfrac{1}{2}\right) - 500}{5\sqrt{10}} = 1.6449.$$
We solve this for $n$ to find that
$$n = 525.508.$$
So getting more than $525$ heads happens less than $5\%$ of the time.

The $x$ value for which
$$\int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = 0.99$$
is approximately $x = 2.3263$, so
$$\int_{-\infty}^{\frac{\left(n+\frac{1}{2}\right)-500}{5\sqrt{10}}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 0.99$$
when
$$\frac{\left(n + \tfrac{1}{2}\right) - 500}{5\sqrt{10}} = 2.3263.$$
We solve this for $n$ to find that
$$n = 536.282.$$
So getting more than $536$ heads happens less than $1\%$ of the time.
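Both thresholds can be recovered directly from the normal quantile function, as in the following sketch (Python with SciPy; norm.ppf plays the role of inverting the standard normal distribution function):

```python
from math import sqrt
from scipy.stats import norm

mu, sd = 500, 5 * sqrt(10)

for level in (0.95, 0.99):
    z = norm.ppf(level)              # 1.6449 and 2.3263
    n = mu + z * sd - 0.5            # solve (n + 1/2 - 500) / (5*sqrt(10)) = z for n
    print(level, z, n)               # n comes out near 525.5 and 536.3
```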

When on a thousand flips a coin comes up heads 532 times, and we ask whether it is reasonable to assume that the coin is fair, we need to agree on what makes for “reasonable”. If we agree that the number of heads should fall within the range exceeded only 1 time out of 20, then $532 > 525$ and this is too many heads to be acceptable. If instead we agree that the number of heads should fall within the range exceeded only 1 time out of 100, then $532 < 536$ and this is few enough heads to be accepted.

Here is a spreadsheet which runs the following experiment: pick one of $0$ or $1$ with equal probability a thousand times (think of this as a coin flip where $1$ means heads). This is done in cells D3:D1002. Then count the number of times $1$ occurs. This is given in cell D1. Perform this experiment five hundred times (columns D through SI). The number of times that the experiment results in greater than $525$ heads is given in cell B1. Compare this number with $25$, that is, $5\%$ of the experiments. The number of times that the experiment results in greater than $536$ heads is given in cell B2. Compare this number with $5$, that is, $1\%$ of the experiments.

Click here to open a copy of this so you can experiment with it. You will need to be signed in to a Google account.
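If you prefer code to a spreadsheet, the same experiment can be sketched as follows (Python with NumPy; a hypothetical stand-in for the spreadsheet above, not the spreadsheet itself):

```python
import numpy as np

rng = np.random.default_rng()

# 500 runs of 1000 fair-coin flips; sum each run to count heads.
heads = rng.binomial(1, 0.5, size=(500, 1000)).sum(axis=1)

print(np.sum(heads > 525), "runs with more than 525 heads (compare with 25, i.e. 5% of 500)")
print(np.sum(heads > 536), "runs with more than 536 heads (compare with 5, i.e. 1% of 500)")
```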