Estimating the Mean of a Normal Variable from a Sample

On sampling a normal random variable $Z_{\mu,\sigma}$ of mean $\mu$ and standard deviation $\sigma$ randomly and independently $n$ times, the sample average $\bar{X}$ is a normal random variable with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. That is, $\bar{X}$ has the same density as $Z_{\mu,\sigma/\sqrt{n}}$. As such, we can construct a priori confidence intervals for where $\bar{X}$ will take its value.

Example: Find a $90\%$ confidence interval about the mean for $\bar{X}$, the mean of a sample of size $100$ of $Z_{17,2}$.

As $\bar{X}$ has the same density as $Z_{17,2/\sqrt{100}} = Z_{17,0.2}$, a $90\%$ confidence interval about the mean is of the form $(17-m, 17+m)$ and satisfies $P(Z_{17,0.2} \in (17-m, 17+m)) = 0.9$. But this means $P(Z_{0,0.2} \in (-m, m)) = 0.9$, or $P(Z_{0,1} \in (-\frac{m}{.2}, \frac{m}{.2})) = 0.9$. This is satisfied when $\frac{m}{.2} = 1.645$, or $m = .329$.
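The computation in this example can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `a_priori_interval` is a hypothetical helper introduced here for illustration, not something from the text.

```python
from math import sqrt
from statistics import NormalDist

def a_priori_interval(mu, sigma, n, p):
    """Central interval that contains the sample mean with probability p."""
    se = sigma / sqrt(n)                     # standard deviation of the sample mean
    z = NormalDist().inv_cdf((1 + p) / 2)    # P(-z < Z_{0,1} < z) = p
    m = z * se                               # half-width of the interval
    return mu - m, mu + m

low, high = a_priori_interval(mu=17, sigma=2, n=100, p=0.90)
print(round(low, 3), round(high, 3))         # 16.671 17.329
```

The half-width `high - 17` agrees with the value $m = .329$ found by hand.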

We now wish to turn this process around.   That is, given that we have completed a test and have data at hand, we wish to draw conclusions about the probabilistic process (presuming that there is such) generating our data.

For example, suppose we have a sample of size one hundred from a process $X$ known to be normal, with unknown mean $\mu_1$ and known standard deviation $\sigma = 3$. We seek the $\mu$ values such that $\bar{X}$ is in the $90\%$ confidence interval about the mean $\mu$ of $Z_{\mu,3/\sqrt{100}} = Z_{\mu,.3}$. This will turn out to be an interval $I$, and we will call $I$ the $90\%$ confidence interval for $\mu_1$. That is, $\mu_1 \in I$ precisely when $\bar{X}$ is in the $90\%$ confidence interval about the mean of $Z_{\mu_1,.3}$.

We now wish to see that the set of $\mu$ values such that $\bar{X}$ is in the $90\%$ confidence interval about the mean of $Z_{\mu,.3}$ is indeed an interval. To this end, the $90\%$ confidence interval about the mean of $Z_{\mu,.3}$ is given by $(\mu - .3 \cdot 1.645, \mu + .3 \cdot 1.645) \approx (\mu - .493, \mu + .493)$. Thus $\bar{X}$ is in this $90\%$ confidence interval precisely when $\bar{X} \in (\mu - .493, \mu + .493)$, or $\bar{X} - \mu \in (-.493, .493)$. This can be expressed as $\mu - \bar{X} \in (-.493, .493)$ or $\mu \in (\bar{X} - .493, \bar{X} + .493)$. This is the desired interval.

We thus say that the actual mean $\mu_1$ is in the $90\%$ confidence interval for the mean $\mu$, given $\bar{X}$, if $\mu_1 \in (\bar{X} - .493, \bar{X} + .493)$.
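One way to see what this definition buys us is to simulate it. The sketch below assumes a hypothetical true mean $\mu_1 = 10$ (the text leaves $\mu_1$ unknown) and draws the sample mean directly from $Z_{\mu_1,\sigma/\sqrt{n}}$ rather than averaging $100$ individual draws. It then counts how often $\mu_1$ lands in the interval built around $\bar{X}$. The function name, seed, and trial count are illustrative choices.

```python
import random
from math import sqrt

def coverage(mu1, sigma=3.0, n=100, z=1.645, trials=20_000, seed=0):
    """Fraction of simulated sample means whose interval contains mu1."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)              # standard deviation of the sample mean
    half_width = z * se               # .3 * 1.645 for the values in the text
    hits = 0
    for _ in range(trials):
        xbar = rng.gauss(mu1, se)     # one simulated sample mean
        if xbar - half_width < mu1 < xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage(mu1=10.0)             # mu1 = 10 is a made-up value
print(f"fraction of intervals containing mu1: {rate:.3f}")
```

The printed fraction should be close to $0.90$, and changing the assumed `mu1` should not change that.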

This process clearly generalizes to arbitrary confidence ( <100% ) and arbitrary sample size.

The General Case

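We are given the sample mean $\bar{X}$ for a sample of size $n$ of a $Z_{\mu,\sigma}$ process. The mean $\mu$ is unknown but the standard deviation $\sigma$ is known. We will say that $\mu$ is in the $100p\%$ confidence interval for the mean if $\bar{X}$ is in the $100p\%$ confidence interval about the mean of $Z_{\mu,\sigma/\sqrt{n}}$. That is, if $\bar{X} \in (\mu - \alpha, \mu + \alpha)$ where $P(Z_{\mu,\sigma/\sqrt{n}} \in (\mu - \alpha, \mu + \alpha)) = p$. Our task is to determine $\alpha$ here. This is not difficult. The condition holds precisely when $P(Z_{0,\sigma/\sqrt{n}} \in (-\alpha, \alpha)) = p$, or
$$P\left(Z_{0,1} \in \left(-\frac{\alpha\sqrt{n}}{\sigma}, \frac{\alpha\sqrt{n}}{\sigma}\right)\right) = p.$$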

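That is, for $p \in (0,1)$, we find $\alpha$ so that
$$P\left(Z_{0,1} \in \left(-\frac{\alpha\sqrt{n}}{\sigma}, \frac{\alpha\sqrt{n}}{\sigma}\right)\right) = p,$$
and know that $(\mu - \alpha, \mu + \alpha)$ is the $100p\%$ confidence interval about the mean of $Z_{\mu,\sigma/\sqrt{n}}$. Now $\bar{X} \in (\mu - \alpha, \mu + \alpha)$ precisely when $\bar{X} - \mu \in (-\alpha, \alpha)$, or $\mu \in (\bar{X} - \alpha, \bar{X} + \alpha)$.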
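The general recipe can be sketched as a small function. This is an illustration only: the name `mean_confidence_interval` and the sample mean used in the call are hypothetical, and the standard normal quantile comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, p):
    """100p% confidence interval for mu, given the sample mean and known sigma."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    # z satisfies P(-z < Z_{0,1} < z) = p, so alpha * sqrt(n) / sigma = z.
    z = NormalDist().inv_cdf((1 + p) / 2)
    alpha = z * sigma / sqrt(n)
    return xbar - alpha, xbar + alpha

# Hypothetical sample mean 10.2 with sigma = 3 and n = 100, as in the earlier discussion.
lo, hi = mean_confidence_interval(xbar=10.2, sigma=3, n=100, p=0.90)
print(round(lo, 3), round(hi, 3))
```

The central probability $p$ is converted to the quantile level $(1+p)/2$ because the remaining probability $1-p$ is split evenly between the two tails.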

Here is an example illustrating how this works.

Example:   A normal process   Zμ,σ   is known to have standard deviation   σ=5 .   A sample of size   n=250   yields sample mean   X¯=146 .   Find the   95%   confidence interval about the mean for   μ .

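According to our above prescription, we need to find $\alpha$ such that
$$P\left(Z_{0,1} \in \left(-\frac{\alpha\sqrt{250}}{5}, \frac{\alpha\sqrt{250}}{5}\right)\right) = .95.$$
But we know that $P(Z_{0,1} \in (-1.96, 1.96)) = .95$. Thus we must have
$$\frac{\alpha\sqrt{250}}{5} = 1.96,$$
or
$$\alpha = 0.62.$$
Thus our $95\%$ confidence interval for $\mu$ is $(146 - 0.62, 146 + 0.62) = (145.38, 146.62)$.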

The determination of the value   0.62   in the preceding example has been packaged into most common spreadsheet applications with a command like “=CONFIDENCE(.05, 5, 250)” , where

  1. the $.05$ represents $1 - .95$, with $.95$ our confidence level,
  2. the $5$ is the known standard deviation of the process, and
  3. the $250$ is the size of the sample.

You can experiment with this in the spreadsheet below.

It is worthwhile (as will be seen in the discussion below) to understand how the workings of this command can be managed using more basic commands. In particular, if we have a $95\%$ confidence interval about the mean of a normal distribution, then the process will fall outside that interval $5\%$ of the time. By symmetry, $2.5\%$ of the time the process will give a value below the confidence interval, and $2.5\%$ of the time a value above it. We can find the value $z$ such that $P(Z_{0,1} > z) = .025$, that is, $P(Z_{0,1} < z) = .975$, using the command “=NORMSINV(1-0.025)” or “=NORMSINV(.975)”, and then calculate our desired confidence interval with half-width $\alpha = z\sigma/\sqrt{n}$. This is also illustrated in the spreadsheet below.
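For readers working outside a spreadsheet, the same two steps can be sketched in Python. The function `confidence_half_width` is a hypothetical stand-in written to mirror the argument order of the spreadsheet command, and `NormalDist().inv_cdf` plays the role of the inverse standard normal command.

```python
from math import sqrt
from statistics import NormalDist

def confidence_half_width(tail_total, sigma, n):
    """Half-width of the confidence interval; tail_total is 1 minus the confidence."""
    # Half of tail_total sits in each tail, so we need the 1 - tail_total/2 quantile.
    z = NormalDist().inv_cdf(1 - tail_total / 2)
    return z * sigma / sqrt(n)

alpha = confidence_half_width(0.05, 5, 250)
xbar = 146
print(round(alpha, 2), (round(xbar - alpha, 2), round(xbar + alpha, 2)))
# 0.62 (145.38, 146.62)
```

This reproduces the value $0.62$ and the interval from the preceding example.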


Similar considerations allow us to define one-sided confidence intervals for the mean of a normal random variable, given the value of the mean of a sample of size   n .

One-sided Confidence Intervals for the Mean

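The sample mean $\bar{X}$ of a sample of size $n$ of a normal process $Z_{\mu,\sigma}$ sits within the $90\%$ confidence interval to the left if $\bar{X} \in (-\infty, \mu + \alpha)$, where
$$P(Z_{\mu,\sigma/\sqrt{n}} < \mu + \alpha) = .90.$$
Thus $\bar{X} - \mu < \alpha$, or $\mu > \bar{X} - \alpha$. So we say $\mu$ is in the $90\%$ left-confidence interval for a sample of size $n$ of a normal process with standard deviation $\sigma$ if $\mu > \bar{X} - \alpha$, where $\alpha$ satisfies
$$P(Z_{\mu,\sigma/\sqrt{n}} < \mu + \alpha) = .90.$$
But this is
$$P\left(Z_{0,1} < \frac{\alpha\sqrt{n}}{\sigma}\right) = .90,$$
so we have a way to determine $\alpha$ from the standard normal distribution.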

Clearly we can replace the $90\%$ confidence level by any level strictly between $0\%$ and $100\%$ that we wish.
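A minimal sketch of the one-sided computation follows. It assumes the defining condition is read as $P(Z_{0,1} < \alpha\sqrt{n}/\sigma) = p$, with $p$ the confidence level written as a fraction. The function name and the numbers in the call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def left_confidence_lower_bound(xbar, sigma, n, p):
    """Return xbar - alpha; mu is in the 100p% left-confidence interval if mu exceeds it."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(p)      # P(Z_{0,1} < z) = p, all of 1 - p in one tail
    alpha = z * sigma / sqrt(n)
    return xbar - alpha

# Made-up inputs: sample mean 50, sigma = 4, n = 64, 90% confidence.
bound = left_confidence_lower_bound(xbar=50, sigma=4, n=64, p=0.90)
print(round(bound, 3))
```

Unlike the two-sided case, the quantile is taken at level $p$ itself, since none of the excluded probability is assigned to the left tail.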

Example:   Find the   99% left-confidence interval for the true mean of a normal process with standard deviation   15, given a sample of size   400   with sample mean   88 .

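Following the prescription above, we need $\alpha$ such that $P(Z_{0,1} < \frac{\alpha\sqrt{400}}{15}) = .99$. Since $P(Z_{0,1} < 2.326) = .99$, we must have $\frac{\alpha\sqrt{400}}{15} = 2.326$, or $\alpha = 2.326 \cdot \frac{15}{20} \approx 1.74$. Thus $\mu$ is in the $99\%$ left-confidence interval when $\mu > 88 - 1.74 = 86.26$, that is, when $\mu \in (86.26, \infty)$.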

In a similar fashion we can find right-confidence intervals for a sample of a given size from a normal process with known standard deviation, given the sample mean.
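As a sketch of how that mirrored argument might run (the text does not spell it out, so the definition below is an assumption modelled on the left-sided case), take $\bar{X}$ to sit within the $100p\%$ confidence interval to the right when $\bar{X} \in (\mu - \alpha, \infty)$:

```latex
\begin{align*}
P\left(Z_{\mu,\sigma/\sqrt{n}} > \mu - \alpha\right) &= p \\
P\left(Z_{0,1} > -\frac{\alpha\sqrt{n}}{\sigma}\right) &= p \\
P\left(Z_{0,1} < \frac{\alpha\sqrt{n}}{\sigma}\right) &= p
  \qquad \text{(by symmetry of the standard normal)} \\
\bar{X} - \mu > -\alpha \quad &\Longleftrightarrow \quad \mu < \bar{X} + \alpha
\end{align*}
```

Under this reading, $\alpha$ is the same number as in the left-sided case, and $\mu$ is in the $100p\%$ right-confidence interval when $\mu \in (-\infty, \bar{X} + \alpha)$.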