Distribution of Averages for Normal Variables – Introduction to Statistics via Spreadsheets

Consider for a moment the average $\overline{X}$ of $n$ samples of a random variable $X\;$ . We already observed that if $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$ then
$$ \overline{X} = \frac{1}{n}\, \left( X_1 +X_2 + \dots +X_n \right) $$
is a normal random variable with mean $\mu$ and standard deviation $\displaystyle{ \frac{\sigma}{\sqrt{n}} }\;$ .

As such, the density of $\overline{X}$ is $\displaystyle{ Z_{\mu, \frac{\sigma}{\sqrt{n}}} }$ and
$$ \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}} $$
has standard normal density $\displaystyle{ Z_{0,1} }\;$ .

This being the case, we see that
$$
P\left( a \lt Z_{0,1} \lt b \right)
=P\left( a \lt \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}} \lt b \right)
=P\left( \mu +a\,\frac{\sigma}{\sqrt{n}} \lt \overline{X} \lt \mu +b\,\frac{\sigma}{\sqrt{n}} \right)
\,\text{,}
$$
or
$$
P\left( \alpha \lt \overline{X} \lt \beta \right)
=P\left( \frac{\alpha-\mu}{\frac{\sigma}{\sqrt{n}}} \lt \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}} \lt \frac{\beta-\mu}{\frac{\sigma}{\sqrt{n}}} \right)
=P\left( \frac{\alpha-\mu}{\frac{\sigma}{\sqrt{n}}} \lt Z_{0,1} \lt \frac{\beta-\mu}{\frac{\sigma}{\sqrt{n}}} \right)
\;\text{.}
$$
That is, we can compute the probability that $\overline{X}$ takes values in a specific interval by determining the probability that $\displaystyle{ Z_{0,1} }$ takes values in a related interval.
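The reduction above can be checked numerically. What follows is a minimal Python sketch rather than part of the spreadsheet treatment; the helper name `prob_mean_between` and the values $\mu = 23$, $\sigma = 7$, $n = 80$ are assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(alpha, beta, mu, sigma, n):
    """P(alpha < X-bar < beta) for the average of n samples of a normal(mu, sigma) variable."""
    se = sigma / sqrt(n)   # standard deviation of the average
    z = NormalDist()       # standard normal, Z_{0,1}
    # Standardize both endpoints, then use the standard normal distribution.
    return z.cdf((beta - mu) / se) - z.cdf((alpha - mu) / se)

# Illustrative (assumed) values: mu = 23, sigma = 7, n = 80.
p = prob_mean_between(22, 24, mu=23, sigma=7, n=80)
print(round(p, 4))
```

The same probability could be obtained directly from the distribution of $\overline{X}$; standardizing first simply mirrors the algebra in the text.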

Symmetric Confidence Intervals

One class of typical, useful examples is given by determining intervals in which a normal variable, or an average of $n$ samples of a normal variable, occurs with a certain high probability. We know from our earlier discussion that a standard normal variable has a $90\%$ probability of taking values between $-1.64$ and $1.64$ (or $95\%$ probability of taking values between $-1.96$ and $1.96\,$ , or $99\%$ probability of taking values between $-2.58$ and $2.58\,$ ). From this we can determine intervals symmetric about its mean $\mu$ within which $\overline{X}$ has a $90\%$ (or $95\%$ , or $99\%\,$ ) probability of taking values.

Example – simple computation

Here we will first find a $90\%$ confidence interval for the mean of a sample of size $80$ of a $\displaystyle{ Z_{23,7} }$ random variable. Then we will sketch the method for finding a symmetric $(100\cdot \pi )\%$ confidence interval for the mean of a sample of size $n$ from a $\displaystyle{ Z_{\mu,\sigma} }$ random variable.

Since the average $\bar{X}$ of a sample of size $80$ of a $\displaystyle{ Z_{23,7} }$ random variable is itself a normal random variable with mean $23$ and standard deviation $\displaystyle{ \frac{7}{\sqrt{80}} }\,$ , a symmetric $90\%$ confidence interval for $\bar{X}$ is given by
$$ P\left( \left| \frac{\bar{X} -23}{\frac{7}{\sqrt{80}}} \right| \le 1.64 \right) =.90 \;\text{.} $$
We transform this algebraically to
$$ P\left( \left| \bar{X} -23 \right| \le 1.64\cdot \frac{7}{\sqrt{80}} \right) =.90 $$
or
$$
P\left( 23 -1.64\cdot \frac{7}{\sqrt{80}} \le \bar{X} \le 23 +1.64\cdot \frac{7}{\sqrt{80}} \right) =.90 \;\text{.}
$$
That is, since
$$ 1.64\cdot \frac{7}{\sqrt{80}} \doteq 1.28 \,\text{,} $$
the interval
$$ [23-1.28, 23+1.28] =[21.72, 24.28] $$
is an interval, centered at the mean $23\,$ , in which $\bar{X}$ has a $90\%$ chance of occurring. Thus it is our symmetric $90\%$ confidence interval about the mean ( $\, 23\,$ ) for $\bar{X}\;$ .

In general, to find a symmetric $(100\cdot \pi )\%$ confidence interval about the mean for the average of a sample of size $n$ from a $\displaystyle{ Z_{\mu,\sigma} }$ random variable, we seek an interval $[\mu -a, \mu +a]$ such that
$$ P( \bar{X} \in [\mu -a, \mu +a] ) =\pi \;\text{.} $$
First we find $b$ such that
$$ P\left( \left| Z_{0,1} \right| \le b \right) =\pi \;\text{.} $$
Once we have this $b$ value (from, for example, the spreadsheet command “NORMSINV((1+ $\,\pi\, $ )/2)” ) we note that $\bar{X}$ is a normal random variable with mean $\mu$ and standard deviation $\displaystyle{ \frac{\sigma}{\sqrt{n}} }\,$ , so that
$$ \frac{\bar{X} -\mu}{\frac{\sigma}{\sqrt{n}}} =Z_{0,1} \;\text{.} $$
Thus
$$
P\left( \left| \frac{\bar{X} -\mu}{\frac{\sigma}{\sqrt{n}}} \right| \le b \right) =\pi \;\text{.} $$
This can be algebraically transformed to
$$ P\left( \left| \bar{X} -\mu \right| \le b\cdot \frac{\sigma}{\sqrt{n}} \right) =\pi $$
or
$$
P\left( \mu -b\cdot \frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu +b\cdot \frac{\sigma}{\sqrt{n}} \right) =\pi \;\text{.}
$$
That is,
$$ a =b\cdot \frac{\sigma}{\sqrt{n}} $$
and our interval is
$$
\left[ \mu -b\cdot \frac{\sigma}{\sqrt{n}}, \mu +b\cdot \frac{\sigma}{\sqrt{n}} \right] \;\text{.} $$
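The general recipe can be sketched in Python as follows. The function name `symmetric_interval` is hypothetical, and `NormalDist().inv_cdf` stands in for the spreadsheet's NORMSINV. Because it uses the unrounded cutoff (about $1.645$) rather than $1.64\,$ , the endpoints for the $Z_{23,7}$ example may differ from the hand computation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def symmetric_interval(mu, sigma, n, pi):
    """Symmetric (100*pi)% interval for the average of n samples of a normal(mu, sigma) variable."""
    b = NormalDist().inv_cdf((1 + pi) / 2)   # plays the role of NORMSINV((1+pi)/2)
    a = b * sigma / sqrt(n)                  # half-width of the interval
    return mu - a, mu + a

low, high = symmetric_interval(mu=23, sigma=7, n=80, pi=0.90)
print(round(low, 2), round(high, 2))
```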

One-Sided Confidence Intervals

Other typical, useful examples are given by determining one-sided intervals in which a normal variable, or an average of $n$ samples of a normal variable, occurs with a certain high probability. Since we know that a standard normal variable has a $90\%$ probability of taking values less than $1.28$ (or $95\%$ probability of taking values less than $1.64\,$ , or $99\%$ probability of taking values less than $2.33\,$ ), we can determine intervals unbounded in one direction within which $\overline{X}$ has a $90\%$ (or $95\%$ , or $99\%\,$ ) probability of taking values.

Example – simple computation to the left

As in the case of symmetric confidence intervals considered above, we will first find a $90\%$ confidence interval to the left for a sample of size $80$ of a $\displaystyle{ Z_{23,7} }$ random variable. Then we will sketch the method for finding a $(100\cdot \pi )\%$ confidence interval to the left for a sample of size $n$ from a $\displaystyle{ Z_{\mu,\sigma} }$ random variable.

Since the average $\bar{X}$ of a sample of size $80$ of a $\displaystyle{ Z_{23,7} }$ random variable is itself a normal random variable with mean $23$ and standard deviation $\displaystyle{ \frac{7}{\sqrt{80}} }\,$ , a $90\%$ confidence interval to the left for $\bar{X}$ is given by
$$ P\left( \frac{\bar{X} -23}{\frac{7}{\sqrt{80}}} \le 1.28 \right) =.90 \;\text{.} $$
We transform this algebraically to
$$ P\left( \bar{X} -23 \le 1.28\cdot \frac{7}{\sqrt{80}} \right) =.90 $$
or
$$ P\left( \bar{X} \le 23 +1.28\cdot \frac{7}{\sqrt{80}} \right) =.90 \;\text{.} $$
That is, since
$$ 1.28\cdot \frac{7}{\sqrt{80}} \doteq 1.00 \,\text{,} $$
the interval
$$ (-\infty, 23+1.00] =(-\infty, 24.00] $$
is an interval, infinite to the left, in which $\bar{X}$ has a $90\%$ chance of occurring. Thus it is our one-sided to the left $90\%$ confidence interval for $\bar{X}\;$ .

In general, to find a one-sided to the left $(100\cdot \pi )\%$ confidence interval for the average of a sample of size $n$ from a $\displaystyle{ Z_{\mu,\sigma} }$ random variable, we seek an interval $(-\infty, a]$ such that
$$ P( \bar{X} \in (-\infty, a] ) =\pi \;\text{.} $$
First we find $b$ such that
$$ P\left( Z_{0,1} \le b \right) =\pi \;\text{.} $$
Once we have this $b$ value (from, for example, the spreadsheet command “NORMSINV( $\,\pi\, $ )” ) we note that $\bar{X}$ is a normal random variable with mean $\mu$ and standard deviation $\displaystyle{ \frac{\sigma}{\sqrt{n}} }\,$ , so that
$$ \frac{\bar{X} -\mu}{\frac{\sigma}{\sqrt{n}}} =Z_{0,1} \;\text{.} $$
Thus
$$ P\left( \frac{\bar{X} -\mu}{\frac{\sigma}{\sqrt{n}}} \le b \right) =\pi \;\text{.} $$
This can be algebraically transformed to
$$ P\left( \bar{X} -\mu \le b\cdot \frac{\sigma}{\sqrt{n}} \right) =\pi $$
or
$$ P\left( \bar{X} \le \mu +b\cdot \frac{\sigma}{\sqrt{n}} \right) =\pi \;\text{.} $$
That is,
$$ a =\mu +b\cdot \frac{\sigma}{\sqrt{n}} $$
and our interval is
$$ \left( -\infty, \mu +b\cdot \frac{\sigma}{\sqrt{n}} \right] \;\text{.} $$
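As a rough check of the left-sided recipe, here is a short Python sketch. The name `left_interval_upper_bound` is an assumption made for illustration, and `inv_cdf` is used in place of NORMSINV.

```python
from math import sqrt
from statistics import NormalDist

def left_interval_upper_bound(mu, sigma, n, pi):
    """Endpoint a of (-inf, a] with P(X-bar <= a) = pi."""
    b = NormalDist().inv_cdf(pi)        # plays the role of NORMSINV(pi)
    return mu + b * sigma / sqrt(n)

upper = left_interval_upper_bound(mu=23, sigma=7, n=80, pi=0.90)
print(round(upper, 2))
```

For the $Z_{23,7}$ example with $n = 80$ this lands very close to the endpoint $24.00$ found by hand.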

Example – simple computation to the right

To find a $(100\cdot \pi )\%$ confidence interval to the right for a sample of size $n$ from a $\displaystyle{ Z_{\mu,\sigma} }$ random variable, we proceed in a fashion similar to that for confidence intervals symmetric about the mean or confidence intervals to the left. Since both of these have been worked through numerically above, only the general treatment is given here.

In general, to find a one-sided to the right $(100\cdot \pi )\%$ confidence interval for the average of a sample of size $n$ from a $\displaystyle{ Z_{\mu,\sigma} }$ random variable, we seek an interval $[a, +\infty)$ such that
$$ P( \bar{X} \in [a, +\infty) ) =\pi \;\text{.} $$
First we find $b$ such that
$$ P\left( Z_{0,1} \ge b \right) =\pi \;\text{.} $$
Once we have this $b$ value (from, for example, the spreadsheet command “NORMSINV(1- $\,\pi\, $ )” ) we note that $\bar{X}$ is a normal random variable with mean $\mu$ and standard deviation $\displaystyle{ \frac{\sigma}{\sqrt{n}} }\,$ , so that
$$ \frac{\bar{X} -\mu}{\frac{\sigma}{\sqrt{n}}} =Z_{0,1} \;\text{.} $$
Thus
$$ P\left( \frac{\bar{X} -\mu}{\frac{\sigma}{\sqrt{n}}} \ge b \right) =\pi \;\text{.} $$
This can be algebraically transformed to
$$ P\left( \bar{X} -\mu \ge b\cdot \frac{\sigma}{\sqrt{n}} \right) =\pi $$
or
$$ P\left( \bar{X} \ge \mu +b\cdot \frac{\sigma}{\sqrt{n}} \right) =\pi \;\text{.} $$
That is,
$$ a =\mu +b\cdot \frac{\sigma}{\sqrt{n}} $$
and our interval is
$$ \left[ \mu +b\cdot \frac{\sigma}{\sqrt{n}}, +\infty \right) \;\text{.} $$
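Since no numerical example is worked for the right-sided case, the following Python sketch applies the recipe to assumed values ( $\mu = 23$, $\sigma = 7$, $n = 80$, $\pi = 0.90$ ) and then verifies the coverage. The function name is hypothetical. Note that $b$ is negative whenever $\pi > 0.5\,$ , so the endpoint falls below the mean.

```python
from math import sqrt
from statistics import NormalDist

def right_interval_lower_bound(mu, sigma, n, pi):
    """Endpoint a of [a, +inf) with P(X-bar >= a) = pi."""
    b = NormalDist().inv_cdf(1 - pi)    # plays the role of NORMSINV(1-pi); negative for pi > 0.5
    return mu + b * sigma / sqrt(n)

mu, sigma, n, pi = 23, 7, 80, 0.90      # assumed example values
a = right_interval_lower_bound(mu, sigma, n, pi)

# Verify: the average should land at or above a with probability pi.
x_bar = NormalDist(mu, sigma / sqrt(n))
coverage = 1 - x_bar.cdf(a)
print(round(a, 2), round(coverage, 4))
```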

Most spreadsheet applications have these computations built in. For example, the spreadsheet command “CONFIDENCE( $\,\alpha\, $ , $\,\sigma\, $ , $\,n\, $ )” gives the value $a$ such that the average $\bar{X}$ of a sample of size $n$ from a $\displaystyle{ Z_{0,\sigma} }$ random variable is more than $a$ away from $0$ with probability $\alpha\;$ . That is,
$$ P\left( \big| \bar{X}\big| \gt a\right) =\alpha \;\text{.} $$
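Assuming the spreadsheet command behaves as just described, its value can be reproduced with a short Python sketch. The name `confidence_half_width` is hypothetical, and the inputs ( $\alpha = 0.10$, $\sigma = 7$, $n = 80$ ) are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def confidence_half_width(alpha, sigma, n):
    """Value a with P(|X-bar| > a) = alpha for the average of n samples of a normal(0, sigma) variable."""
    b = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided cutoff for the standard normal
    return b * sigma / sqrt(n)

a = confidence_half_width(alpha=0.10, sigma=7, n=80)

# Check the defining property by adding up the two tails.
x_bar = NormalDist(0, 7 / sqrt(80))
tail = x_bar.cdf(-a) + (1 - x_bar.cdf(a))
print(round(a, 4), round(tail, 4))
```

Note that here $\alpha$ is the probability of falling outside the interval, so a $90\%$ interval corresponds to $\alpha = 0.10\,$ .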

It is worth noting that for a fixed probability (or percentage) of confidence interval, as the size of the sample gets larger the width of the interval decreases. Thus a larger sample improves the control over where the average will take its values.

Consider, for example, the situation examined above of the symmetric $90\%$ confidence interval for the mean of a sample of size $80$ of a $\displaystyle{ Z_{23,7} }$ random variable. We saw that the desired interval was $[21.72, 24.28]\; $ . If we increase the sample size to $200\,$ , the same technique gives us the symmetric $90\%$ confidence interval $[22.19, 23.81]\; $ . We see that the larger sample size gives a tighter interval for where $\bar{X}$ is likely to occur.
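The shrinking of the interval can be tabulated with a small Python sketch. The helper `half_width` is a hypothetical name, and the sample size $800$ is an extra value added only for illustration. Because the half-width is proportional to $1/\sqrt{n}\,$ , quadrupling the sample size halves the width.

```python
from math import sqrt
from statistics import NormalDist

def half_width(sigma, n, pi):
    """Half-width of the symmetric (100*pi)% interval for the average of n samples."""
    return NormalDist().inv_cdf((1 + pi) / 2) * sigma / sqrt(n)

# Z_{23,7} variable, 90% intervals, growing sample sizes.
for n in (80, 200, 800):
    w = half_width(sigma=7, n=n, pi=0.90)
    print(n, round(23 - w, 2), round(23 + w, 2))
```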

In fact, we can explicitly control the width of our symmetric confidence intervals by controlling sample size. This has the practical consequence that if an experiment demands a certain degree of accuracy for the mean of a process, it can be determined how large of a sample is needed for the desired accuracy.

Example: Control Size of Confidence Interval by Controlling Sample Size

Suppose we wish to obtain a large enough sample of a $\displaystyle{ Z_{23,7} }$ normal random variable so that our $90\%$ confidence interval, symmetric about the mean, has width less than $0.2\;$ . That is, we ask that with $0.90$ probability the average of a sample of size $n$ is in $[22.9, 23.1]\;$ . How large of a sample do we need?

The symmetric $90\%$ confidence interval for a sample of size $n$ is
$$ \left[ 23-1.64\cdot \frac{7}{\sqrt{n}} , 23+1.64\cdot \frac{7}{\sqrt{n}} \right] \;\text{.} $$
This is of width less than $0.2$ precisely when
$$ 1.64\cdot \frac{7}{\sqrt{n}} \lt 0.1 \,\text{,} $$
or
$$ 1.64\cdot \frac{7}{0.1} \lt \sqrt{n} \;\text{.} $$
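That is, we need $13{,}179.04 \lt n\;$ . Because $1.64$ is a rounded-down value of the $90\%$ cutoff (which is closer to $1.645\,$ ), it is prudent to round up generously: a sample of size $13{,}300$ or more will guarantee that our symmetric $90\%$ confidence interval is sufficiently small.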
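The sample-size computation can be sketched in Python as below. The function name `min_sample_size` is an assumption. The sketch runs the calculation twice, once with the rounded cutoff $1.64$ used in the text and once with the unrounded cutoff, which asks for a slightly larger sample.

```python
from math import floor
from statistics import NormalDist

def min_sample_size(sigma, half_width, b):
    """Smallest integer n with b*sigma/sqrt(n) < half_width, i.e. n > (b*sigma/half_width)**2."""
    return floor((b * sigma / half_width) ** 2) + 1

n_rounded = min_sample_size(sigma=7, half_width=0.1, b=1.64)
n_exact = min_sample_size(sigma=7, half_width=0.1, b=NormalDist().inv_cdf(0.95))
print(n_rounded, n_exact)
```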

In the following several examples, we illustrate how such confidence intervals might be employed.

Example: The producer of a cable has found through repeated testing that the tensile strength of one of their products is normally distributed with mean $420$ pounds and standard deviation $60$ pounds. After a change in the manufacturing process, thirty-six samples are tested. If the new process has no effect on the tensile strength of the cable, what is the probability that the average tensile strength for the thirty-six samples is less than $400$ pounds?

Solution: The average $\bar{X}$ is normally distributed, has mean
$$\mu =420 \,\text{lb,}$$
and standard deviation
$$\frac{60}{\sqrt{36}} =10\,\text{lb.}$$
Thus its distribution is $Z_{420,10}\,$ , and
$$\frac{\bar{X}-420}{10}$$
has a standard normal distribution. The probability that the average will be less than $400$ is the probability that
$$\frac{(\text{average}-420)}{10} \lt \frac{400-420}{10} =-2 \;\text{.}$$
The probability that a standard normal variable takes a value less than $-2$ is $0.02275\;$ . Thus there is less than a three percent chance that our average tensile strength over the thirty-six samples will be less than $400$ pounds.
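This probability can be confirmed with a few lines of Python, offered as a sketch using the standard library's `NormalDist` in place of a spreadsheet lookup; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 420, 60, 36
se = sigma / sqrt(n)                 # standard deviation of the average: 10 lb

# Probability that the average of 36 samples falls below 400 lb.
p = NormalDist(mu, se).cdf(400)
print(round(p, 5))
```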

Example: If in fact the new process mentioned in the previous example changed the tensile strength of the cable, the manufacturer needs to know the new tensile strength, at least to within five pounds. Assuming that the change in process did not change the standard deviation, how large of a sample should be taken to be $95\%$ certain that the estimated mean is accurate to within five pounds?

Solution: We’ve assumed that the new process is normal with an unknown mean $\mu$ and standard deviation $\sigma =60\; $ . For the average over a sample of size $n$ to be accurate to within five pounds, we need
$$ \big| \bar{X} -\mu \big| \lt 5 \;\text{.} $$
Since a symmetric $95\%$ confidence interval for a sample of size $n$ is given by
$$
\left[ \mu -1.96 \cdot \frac{60}{\sqrt{n}} , \mu +1.96 \cdot \frac{60}{\sqrt{n}} \right] \,\text{,} $$
we will have that $\bar{X}$ is in this interval, or
$$ \big| \bar{X} -\mu \big| \le 1.96 \cdot \frac{60}{\sqrt{n}} $$
with $95\%$ certainty. Thus we will be $95\%$ certain that $\big| \bar{X} -\mu \big| \lt 5$ if
$$ 1.96 \cdot \frac{60}{\sqrt{n}} \lt 5 \;\text{.} $$
But this is
$$ 1.96 \cdot 12 \lt \sqrt{n} \,\text{,} $$
or
$$ 553.19 \lt n \;\text{.} $$
That is, if we take a sample of size at least about $560$ we will be $95\%$ certain that the average of the sample will be accurate to within five pounds.
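As a cross-check on the algebra, one can search directly for the smallest sample size whose half-width drops below five pounds. This is a brute-force Python sketch with a hypothetical helper name, using the rounded cutoff $1.96$ from the text.

```python
from math import sqrt

def smallest_n(sigma, tolerance, b):
    """Smallest n for which the half-width b*sigma/sqrt(n) is strictly below the tolerance."""
    n = 1
    while b * sigma / sqrt(n) >= tolerance:
        n += 1
    return n

n_needed = smallest_n(sigma=60, tolerance=5, b=1.96)
print(n_needed)
```

Any sample size at or above this value, including the rounder figure of $560\,$ , meets the requirement.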