Distribution of the Mean from a Normal Distribution

Here we will consider sampling a normal distribution   $n$   mutually independent times, and consider the density function for the mean of the sample.   That is, let   $X$   be a normal random variable with density function   $\displaystyle{ Z_{\mu,\sigma} }\,$ , and let   $\displaystyle{ X_1 }\, $ , $\,\displaystyle{ X_2 }\, $ , … , $\,\displaystyle{ X_n }$   be   $n$   mutually independent samples of   $X\;$ .   Then
$$ \bar{X} =\textstyle{\frac{1}{n}}\left( X_1 + \cdots + X_n \right) $$
will be the random variable denoting the mean of the sample.

For the purposes of determining the density function of   $\bar{X}\,$ , we will first determine its moment generating function.

We already know how to compute the moment generating function for a linear combination of independent random variables.   That is, we have seen that if   $\displaystyle{ X_1 }\, $ , … , $\,\displaystyle{ X_n }$   are mutually independent, then
$$
M_{a_1\, X_1 +\cdots +a_n\, X_n}(t) =M_{a_1\, X_1}(t) \cdot M_{a_2\, X_2}(t) \cdot \, \cdots\, \cdot M_{a_n\, X_n}(t)
=M_{X_1}(a_1\, t) \cdot M_{X_2}(a_2\, t)\cdot \, \cdots \, \cdot M_{X_n}(a_n\, t) \;\text{.}
$$
Thus
$$
M_{\bar{X}}(t) =M_{\frac{1}{n}X_1}(t)\cdot M_{\frac{1}{n}X_2}(t)\cdot\,\cdots\,\cdot M_{\frac{1}{n}X_n}(t)
=M_{X_1}\left(\frac{t}{n}\right)\cdot M_{X_2}\left(\frac{t}{n}\right)\cdot\,\cdots\,\cdot M_{X_n}
\left(\frac{t}{n}\right) \;\text{.}
$$
Since the random variables   $\displaystyle{ X_1 }\, $ , … , $\,\displaystyle{ X_n }$   all share the same density function, this gives
$$ M_{\bar{X}}(t) =\left[ M_{X}\left( \frac{t}{n}\right) \right]^n \;\text{.} $$

We have already determined the moment generating function for the normal random variable   $X$   with density   $\displaystyle{ Z_{\mu,\sigma} }$   to be
$$ M_X(t) ={\rm e}^{\mu\, t +\frac{1}{2}\,\sigma^2\, t^2} \;\text{.} $$
This allows us to say that
$$
M_{\bar{X}}(t) =\left( {\rm e}^{\mu\frac{t}{n} +\frac{1}{2}\,\sigma^2\left(\frac{t}{n}\right)^2 } \right)^n ={\rm e}^{\mu t +\frac{1}{2}\,\left(\frac{\sigma^2}{n}\right)\, t^2 } \;\text{.}
$$
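As a quick check on this algebra, the exponent of   $\left[ M_{X}\left(\frac{t}{n}\right)\right]^n$   can be simplified symbolically.   A minimal sketch in Python, assuming the SymPy library is available:

    import sympy as sp

    t, mu, sigma = sp.symbols('t mu sigma', real=True)
    n = sp.symbols('n', positive=True)

    # exponent of [M_X(t/n)]^n, where M_X(t) = exp(mu*t + sigma**2*t**2/2)
    exponent = sp.simplify(n * (mu*(t/n) + sp.Rational(1, 2)*sigma**2*(t/n)**2))
    print(exponent)   # mu*t + sigma**2*t**2/(2*n)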

The special thing about this is that the resulting moment generating function looks a lot like that of   $\displaystyle{ Z_{\mu,\sigma} }\;$ .   In fact, it is precisely the moment generating function of   $\displaystyle{ Z_{\mu,\frac{\sigma}{\sqrt{n}}} }\;$ .   We have already noted that the moment generating function for a random variable completely determines the probability density function of that random variable.   Consequently   $\bar{X}$   has density   $\displaystyle{ Z_{\mu,\frac{\sigma}{\sqrt{n}}} }\;$ .   That is, $\,\bar{X}$   is also a normal random variable, with the same mean as   $X\,$ , but a smaller standard deviation   –   i.e.   $\displaystyle{ \frac{\sigma}{\sqrt{n}} }$   rather than   $\sigma\;$ .
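This conclusion is easy to check empirically.   The following Monte Carlo sketch in Python (assuming NumPy is available; the values   $\mu =0\,$ ,   $\sigma =1\,$ ,   $n =25$   are illustrative) draws many samples of size   $n$   and examines their means:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, trials = 0.0, 1.0, 25, 100_000

    # each row is one sample of size n; the row means are draws of X-bar
    xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

    print(xbar.mean())   # close to mu = 0
    print(xbar.std())    # close to sigma/sqrt(n) = 0.2

The empirical standard deviation of the sample means comes out near   $\frac{\sigma}{\sqrt{n}} =0.2$   rather than   $\sigma =1\;$ .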

Here are graphs of the density functions for   $\displaystyle{ Z_{1,2} }\,$ , $\displaystyle{ Z_{0,1} }\,$ , and   $\displaystyle{ Z_{-1,\frac{1}{2}} }\;$ .   Notice how a density function with smaller standard deviation has a graph much more concentrated about the mean.

[Figure: graphs of the three density functions   $\displaystyle{ Z_{-1,\frac{1}{2}} }\,$ , $\displaystyle{ Z_{0,1} }\,$ , and   $\displaystyle{ Z_{1,2} }\;$ .   As   $\sigma$   decreases, the graphs compress horizontally and stretch vertically; an increase in standard deviation makes the graph wider and shorter.]

Maple code (requires the plots package): with(plots): display(plot([exp(-1/2*x^2)/sqrt(2*Pi), 1/(2*sqrt(2*Pi))*exp(-1/2*((x - 1)/2)^2), exp(-1/2*((x + 1)/(1/2))^2)/(1/2*sqrt(2*Pi))], x = -4 .. 4, color = [blue, red, green], thickness = 2, view = [-4 .. 4, 0 .. 1]), textplot([[-1.8, 0.7, `Z__-1,0.5`], [0.8, 0.4, `Z__0,1`], [2.5, 0.2, `Z__1,2`]]))

The above observation regarding the average of a sample from a normal random variable can be seen as a generalization of an easier fact.

Sum of Independent Normal Random Variables

Fact:   Let   $\displaystyle{ X_1 }$   and   $\displaystyle{ X_2 }$   be two independent normal random variables, with means   $\displaystyle{ \mu_1 }$   and   $\displaystyle{ \mu_2 }\,$ , and variances   $\displaystyle{ \sigma_1^2 }$   and   $\displaystyle{ \sigma_2^2 }\,$ , respectively.   Then
$$ X_1 + X_2 $$
is a normal random variable, with mean   $\displaystyle{ \mu_1 +\mu_2 }$   and variance   $\displaystyle{ \sigma_1^2 +\sigma_2^2 }\;$ .

This can be seen by following the logic above.   The moment generating functions for   $\displaystyle{ X_1 }$   and   $\displaystyle{ X_2 }$   are
$$
M_{X_1}(t) ={\rm e}^{\mu_1 t +\frac{1}{2}\sigma_1^2 t^2} \qquad \text{and} \qquad
M_{X_2}(t) ={\rm e}^{\mu_2 t +\frac{1}{2}\sigma_2^2 t^2} \,\text{,}
$$
respectively.   Thus
$$
\begin{array}{rl}
M_{X_1 +X_2}(t) & =M_{X_1}(t)\, M_{X_2}(t) \\ & \\
& ={\rm e}^{\mu_1 t +\frac{1}{2}\sigma_1^2 t^2} {\rm e}^{\mu_2 t +\frac{1}{2}\sigma_2^2 t^2} \\ & \\
& ={\rm e}^{\left(\mu_1 +\mu_2\right) t +\frac{1}{2} \left(\sigma_1^2 +\sigma_2^2\right) t^2}
\end{array}
$$
is the moment generating function for $\displaystyle{ Z_{\mu_1 +\mu_2 , \sqrt{\sigma_1^2 +\sigma_2^2}} }\;$ .   That is, $\displaystyle{ X_1 +X_2 }$   is a normal random variable with mean   $\displaystyle{ \mu_1 +\mu_2 }$   and variance   $\displaystyle{ \sigma_1^2 +\sigma_2^2 }\;$ .
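The fact is just as easy to confirm numerically.   A Monte Carlo sketch in Python (NumPy assumed; the values   $\mu_1 =1\,$ ,   $\sigma_1 =2\,$ ,   $\mu_2 =-3\,$ ,   $\sigma_2 =1$   are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(1)
    trials = 100_000

    # independent draws from N(1, 2) and N(-3, 1), added pointwise
    total = rng.normal(1.0, 2.0, trials) + rng.normal(-3.0, 1.0, trials)

    print(total.mean())   # close to mu_1 + mu_2 = -2
    print(total.var())    # close to sigma_1^2 + sigma_2^2 = 5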

Here are a couple of computational examples.

Example: A standard normal random variable, $\,X$   (with density   $\displaystyle{ Z_{0,1} }\,$ ), takes values in the interval   $[-1.96, 1.96]\,$ , symmetric about its mean, with   $95\%$   probability.   If   $\bar{X}$   is the sample mean of twenty-five samples of   $X\,$ , find the interval symmetric about its mean within which   $\bar{X}$   takes   $95\%$   of its values.

Since   $X$   and   $\bar{X}$   both have the same mean, $\,\displaystyle{ \mu_X =\mu_{\bar{X}} =0 }\,$ , $\,\bar{X}$   has density function   $\,\displaystyle{ Z_{0,\frac{1}{\sqrt{25}}} =Z_{0,0.2} }\;$ .   This takes   $95\%$   of its values in the symmetric interval   $\left[ -\frac{1.96}{5}, \frac{1.96}{5} \right] =[-.392, .392]\;$ .
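The same interval can be read off numerically; here is a minimal sketch in Python, assuming the SciPy library is available:

    from scipy.stats import norm

    n = 25
    lo, hi = norm.interval(0.95, loc=0, scale=1 / n**0.5)
    print(lo, hi)   # approximately -0.392 and 0.392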

Example: If   $X$   is a standard normal random variable, how large must a sample be to ensure that   $\bar{X}$   takes   $95\%$   of its values in   $[-.15,.15]\;$ ?

For a sample of size   $n\,$ ,   $\bar{X}$   will take   $95\%$   of its values in   $\left[ -\frac{1.96}{\sqrt{n}}, \frac{1.96}{\sqrt{n}} \right]\;$ .   Thus   $[-.15,.15]$   will hold at least   $95\%$   of its values if
$$ \frac{1.96}{\sqrt{n}} \le .15 \;\text{.} $$
That is,
$$ \frac{1.96}{.15} \le \sqrt{n} \,\text{,} $$
or
$$ \left(\frac{1.96}{.15}\right)^2 =170.74 \le n \;\text{.} $$
That is, we need a sample size of at least   $171$   to ensure that our sample mean will be in   $[-.15,.15]$   with probability at least   $.95\;$ .
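In code, the bound is a one-liner (a sketch using only Python's standard library):

    import math

    z = 1.96                      # two-sided 95% critical value for Z_{0,1}
    n = math.ceil((z / 0.15)**2)  # smallest n with 1.96/sqrt(n) <= 0.15
    print(n)                      # 171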

Here are a couple of ways that this might be used.

Example: Experience has shown that the melting point of a certain plastic is described by a normal distribution with mean   $85C$   and standard deviation   $4C\;$ .   Twenty samples from a new production process are tested and found to have an average melting point of   $82.2C\;$ .   What is the probability that a sample of this size from the original process would have an average melting point less than   $82.5C\;$ ?   Does the new process yield product as stable as that of the original process at temperatures near   $80C\;$ ?

We note that
$$
\begin{array}{rl}
P\left( Z_{85,\frac{4}{\sqrt{20}}} \lt 82.5\right) & =P\left(Z_{0,\frac{2}{\sqrt{5}}} \lt -2.5\right) \\ & \\
& =P\left( Z_{0,1} \lt -2.5\cdot \frac{\sqrt{5}}{2} \right) \\ & \\
& =P\left( Z_{0,1} \lt -2.795 \right) \approx 0.0026 \;\text{.}
\end{array}
$$
That is, there is less than a   $0.3\%$   chance that a sample of this size from the original process would exhibit an average melting point below   $82.5C$   (and the observed average of   $82.2C$   is lower still).

A reasonable conclusion is that the new process does not yield a plastic which is as stable as the old, at least when temperatures get above   $80C\;$ .
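The probability above can also be computed directly, with no standardization step, by handing the mean and standard deviation of the sample mean to a library routine.   A minimal sketch in Python, assuming SciPy is available:

    from scipy.stats import norm

    # P(mean of 20 samples from the original N(85, 4) process is below 82.5)
    p = norm.cdf(82.5, loc=85, scale=4 / 20**0.5)
    print(p)   # approximately 0.0026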

Example: In the preceding example, the mean melting point of plastics produced by the new process needs to be determined to within   $.1C$   with   $99\%$   accuracy.   Assuming that the standard deviation for the new process is   $4C\,$ , just as for the old process, how large of a sample is needed to obtain such accuracy?

We address the following mathematical question: for how large of an   $n$   will   $\bar{X}$   take   $99\%$   of its values in   $\left[ \mu -.1, \mu +.1 \right]\;$ ?   Since   $\bar{X}$   has density   $\displaystyle{ Z_{\mu,\frac{4}{\sqrt{n}}} }\,$ , this is asking how large to make   $n$   so that
$$ P\left( \mu -.1 \lt Z_{\mu,\frac{4}{\sqrt{n}}} \lt \mu +.1 \right) \ge .99 \;\text{.} $$
That is,
$$ P\left( -.1 \lt Z_{0,\frac{4}{\sqrt{n}}} \lt .1 \right) \ge .99 \,\text{,} $$
or
$$ P\left( -\frac{\sqrt{n}}{40} \lt Z_{0,1} \lt \frac{\sqrt{n}}{40} \right) \ge .99 \;\text{.} $$
Since   $ P\left( -2.58 \lt Z_{0,1} \lt 2.58 \right) \ge .99 \,$ , we need
$$ \frac{\sqrt{n}}{40} \ge 2.58 \,\text{,} $$
or
$$ n\ge ( 40\cdot 2.58)^2 =10650.24 \;\text{.} $$
Thus we need a sample size of at least   $10651$   to ensure that our sample mean will be within   $.1C$   of the true mean.
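The same computation in code (a sketch using Python's standard library; note that a more precise critical value, about   $2.5758\,$ , would give the slightly smaller answer   $10616$ ):

    import math

    z = 2.58                    # rounded two-sided 99% critical value, as above
    n = math.ceil((40 * z)**2)
    print(n)                    # 10651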

These examples should be compared with those encountered when we looked at Chebyshev’s theorem.   In both cases, a smaller standard deviation gives a tighter spread about the mean; but knowing that the density is actually normal lets us get by with far smaller samples than Chebyshev’s general-purpose bound, as the comparison below illustrates.
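To make the comparison concrete, consider again the question of how large a sample of a standard normal random variable guarantees that   $\bar{X}$   lies in   $[-.15,.15]$   with probability   $.95\;$ .   Chebyshev’s theorem uses only   $\operatorname{Var}(\bar{X}) =\frac{1}{n}$   and so demands a far larger sample than the exact normal computation.   A sketch in Python:

    import math

    eps, alpha = 0.15, 0.05

    # Chebyshev: P(|X-bar| >= eps) <= 1/(n*eps**2) <= alpha  =>  n >= 1/(alpha*eps**2)
    n_chebyshev = math.ceil(1 / (alpha * eps**2))

    # exact normal computation, as in the example above
    n_normal = math.ceil((1.96 / eps)**2)

    print(n_chebyshev, n_normal)   # 889 versus 171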