Let $\displaystyle{ V_1 }$ and $\displaystyle{ V_2 }$ be two independent $\displaystyle{ \chi^2 }$ random variables with $\displaystyle{ n_1 }$ and $\displaystyle{ n_2 }$ degrees of freedom, respectively. Consider the problem of finding the distribution of their sum $\displaystyle{ V =V_1 +V_2 }\;$ . Since $\displaystyle{ V_1 }$ and $\displaystyle{ V_2 }$ are independent, the moment generating function (MGF) of $V$ is the product of their MGFs. That is,

$$ M_V(t) =M_{V_1}(t)\cdot M_{V_2}(t) \;\text{.} $$

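Recall that a $\displaystyle{ \chi^2 }$ random variable with $n$ degrees of freedom has MGF $(1-2\,t)^{-\frac{n}{2}}$ for $t \lt \frac{1}{2}\,$ . Hence, for $t \lt \frac{1}{2}\,$ ,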

$$

M_{V_1}(t) =(1-2\, t)^{-\frac{n_1}{2}} \quad\text{and}\quad M_{V_2}(t) =(1-2\, t)^{-\frac{n_2}{2}} \;\text{.}

$$

So

$$

M_V(t) =(1-2\, t)^{-\frac{n_1}{2}} \cdot (1-2\, t)^{-\frac{n_2}{2}} =(1-2\, t)^{-\frac{(n_1 +n_2)}{2}} \;\text{.}

$$

Since an MGF that exists on an open interval around $t =0$ determines the distribution uniquely, $\, V$ is a $\displaystyle{ \chi^2 }$ random variable with $\displaystyle{ n_1 +n_2 }$ degrees of freedom.
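This result can be checked by simulation. The sketch below is a minimal illustration in Python (standard library only), not part of the original derivation. It relies on the fact that a $\chi^2$ variable with $n$ degrees of freedom is a gamma variable with shape $n/2$ and scale $2$, which `random.gammavariate(alpha, beta)` samples. The values $n_1 = 3$ and $n_2 = 5$ are hypothetical choices that make $n_1 + n_2 = 8$ even, because for an even number $2m$ of degrees of freedom the distribution function has the closed form $F(x) = 1 - e^{-x/2}\sum_{j=0}^{m-1} (x/2)^j / j!$, against which the empirical distribution of the simulated sums is compared.

```python
import math
import random


def chi2_cdf_even(x: float, df: int) -> float:
    """Exact chi-square CDF for an even number of degrees of freedom df = 2m."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, partial = 1.0, 1.0  # j = 0 term of sum_{j<m} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        partial += term
    return 1.0 - math.exp(-half) * partial


def simulate_sum(n1: int, n2: int, trials: int, seed: int = 1) -> list[float]:
    """Draw V1 + V2 with V1 ~ chi2(n1) and V2 ~ chi2(n2) independent.

    chi2(n) is Gamma(shape=n/2, scale=2), which random.gammavariate samples.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(n1 / 2, 2.0) + rng.gammavariate(n2 / 2, 2.0)
            for _ in range(trials)]


if __name__ == "__main__":
    n1, n2 = 3, 5  # hypothetical degrees of freedom
    samples = simulate_sum(n1, n2, trials=100_000)
    for x in (4.0, 8.0, 12.0):
        empirical = sum(s <= x for s in samples) / len(samples)
        exact = chi2_cdf_even(x, n1 + n2)
        print(f"x={x:5.1f}  empirical={empirical:.4f}  chi2({n1 + n2}) CDF={exact:.4f}")
```

The empirical and exact columns should agree up to Monte Carlo error of a few thousandths, even though neither summand alone has an even number of degrees of freedom.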

We use this theorem to find the distribution of the sum of squares of a set of independent standard normal random variables. To this end, let $Z$ be a standard normal variable. Then the MGF of $\displaystyle{ Z^2 }$ is

$$

\begin{array}{rl}

M_{Z^2}(t) & =\int_{-\infty}^{\infty}\, {\rm e}^{t\, x^2}\,\frac{{\rm e}^{-\left( \frac{x^2}{2} \right)}}{\sqrt{2\,\pi}}\, {\rm d}x \\

& =\frac{1}{\sqrt{2\,\pi}} \,\int_{-\infty}^{\infty}\, {\rm e}^{-\left( \frac{x^2}{2} \right)\,(1-2\, t)}\, {\rm d}x \;\text{.}

\end{array}

$$

For $t \lt \frac{1}{2}\,$ , the substitution $y =x\,\sqrt{1-2\,t}\,$ (so that ${\rm d}x ={\rm d}y\big/\sqrt{1-2\,t}\,$) reduces this integral to

$$

\begin{array}{rl}

M_{Z^2}(t) & =(1-2\, t)^{-\frac{1}{2}}\, \int_{-\infty}^{\infty}\, \frac{{\rm e}^{-\left( \frac{y^2}{2} \right)}}{\sqrt{2\,\pi}}\, {\rm d}y \\

& =(1-2\, t)^{-\frac{1}{2}} \;\text{.}

\end{array}

$$

The last step uses the fact that the standard normal density integrates to $1\,$ . The result is the MGF of a $\displaystyle{ \chi^2 }$ variable with one degree of freedom, so $\displaystyle{ Z^2 }$ is a $\displaystyle{ \chi^2 }$ variable with one degree of freedom. Applying the above theorem repeatedly, we find that the sum of squares of $n$ independent standard normal variables is a $\displaystyle{ \chi^2 }$ variable with $n$ degrees of freedom. We will now use this in considering several problems relating to the variance of a normal distribution.
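The closed form $M_{Z^2}(t) =(1-2\, t)^{-\frac{1}{2}}$ can also be checked numerically. As a sketch (Python, standard library only; the truncation half-width and step count are arbitrary choices, not from the text), the code below approximates the defining integral with the trapezoidal rule on $[-40, 40]$. This is accurate here because, for $t \lt \frac{1}{2}$, the integrand is smooth and decays very rapidly.

```python
import math


def mgf_z_squared_numeric(t: float, half_width: float = 40.0, steps: int = 80_000) -> float:
    """Trapezoidal approximation of E[exp(t Z^2)] for Z ~ N(0, 1), valid for t < 1/2."""
    if t >= 0.5:
        raise ValueError("the MGF of Z^2 exists only for t < 1/2")
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        # e^{t x^2} times the standard normal density, without the 1/sqrt(2 pi) factor
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(-0.5 * x * x * (1.0 - 2.0 * t))
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for t in (-1.0, 0.0, 0.2, 0.4):
        print(f"t={t:5.2f}  numeric={mgf_z_squared_numeric(t):.10f}  "
              f"closed form={(1.0 - 2.0 * t) ** -0.5:.10f}")
```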

## The Distribution of $\displaystyle{ \sum_{k=1}^n\, \left( X_k -\mu\right)^2 }$

If $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2\,$ , consider a random sample $X_1\,$ , … , $\,X_n$ of size $n\;$ . Then $\displaystyle{ Z_1 =\frac{X_1 -\mu}{\sigma}}$ , … , $\displaystyle{ Z_n =\frac{X_n -\mu}{\sigma}}$ are independent standard normal variables. It follows that

$$ V =\sum_{k=1}^n\, \left( \frac{X_k -\mu}{\sigma} \right)^2 =\sum_{k=1}^n\, Z_k^2 $$

is a $\displaystyle{ \chi^2 }$ variable with $n$ degrees of freedom.

We can use this to address various problems concerning the parameter $\sigma$ of a normal random variable.

Note that the quantity $\displaystyle{ \sum_{k=1}^n\, \left( X_k -\mu\right)^2\big/ n }$ resembles the sample variance, so it will be useful for estimating $\sigma^2$ when $\mu$ is known. If $\mu$ is not known, we must replace it by $\bar{X}\;$ . The resulting random variable

$$ V =\sum_{k=1}^n\, \left( \frac{X_k -\bar{X}}{\sigma} \right)^2 $$

might naturally be expected to be approximately a $\displaystyle{ \chi^2 }$ random variable with $n$ degrees of freedom, with the approximation improving as $n$ grows. In fact, $V$ is exactly a $\displaystyle{ \chi^2 }$ random variable, but with $n-1$ degrees of freedom (a fact that will be justified later). Thus we will use

$$ \sum_{k=1}^n\, \left( \frac{X_k -\mu}{\sigma} \right)^2 $$

when $\mu$ is available, and

$$ \sum_{k=1}^n\, \left( \frac{X_k -\bar{X}}{\sigma} \right)^2 $$

when it’s not.
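The difference between $n$ and $n-1$ degrees of freedom is easy to see in a simulation. The sketch below (Python, standard library only; the mean $12$, standard deviation $3$, sample size $5$, trial count, and seed are all hypothetical choices) draws many normal samples and averages both versions of $V$. Since a $\chi^2$ variable with $k$ degrees of freedom has mean $k$, the first average should be close to $n$ and the second close to $n-1$. This checks only the mean, not the full distribution.

```python
import random


def average_v(mu: float, sigma: float, n: int, trials: int, seed: int = 2) -> tuple[float, float]:
    """Monte Carlo means of sum(((X_k - mu)/sigma)^2) and sum(((X_k - Xbar)/sigma)^2)."""
    rng = random.Random(seed)
    total_known, total_estimated = 0.0, 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        total_known += sum(((x - mu) / sigma) ** 2 for x in xs)        # uses the true mean
        total_estimated += sum(((x - xbar) / sigma) ** 2 for x in xs)  # uses the sample mean
    return total_known / trials, total_estimated / trials


if __name__ == "__main__":
    n = 5  # a small n makes the gap between n and n - 1 easy to see
    known, estimated = average_v(mu=12.0, sigma=3.0, n=n, trials=50_000)
    print(f"mean of V using mu:   {known:.3f}  (chi2({n}) has mean {n})")
    print(f"mean of V using Xbar: {estimated:.3f}  (chi2({n - 1}) has mean {n - 1})")
```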

### Example 1

Suppose that $X$ is normal with mean $12$ and unknown variance $\sigma^2\;$ . Consider the problem of estimating this variance using a random sample of size $25\;$ . The quantity

$$ \sum_{k=1}^{25}\, \frac{\left( X_k -12 \right)^2}{25} $$

will be used to estimate $\sigma^2\;$ . What is the probability that this estimate will not be off by more than $10\%\;$ ?

Our accuracy requirement can be expressed as

$$

0.9\,\sigma^2 \lt \sum_{k=1}^{25}\, \frac{\left( X_k -12 \right)^2}{25} \lt 1.1\,\sigma^2 \;\text{.}

$$

Multiplying through by $25/\sigma^2\,$ , this is equivalent to

$$ 22.5 \lt \sum_{k=1}^{25}\, \left( \frac{X_k -12}{\sigma} \right)^2 \lt 27.5 \;\text{.} $$

In terms of our discussion, this is

$$ 22.5 \lt V \lt 27.5 \,\text{,} $$

where $V$ is a $\displaystyle{ \chi^2 }$ random variable with $25$ degrees of freedom. Although the $\displaystyle{ \chi^2 }$ density has an explicit formula, its distribution function generally cannot be written in terms of elementary functions, so its values must be computed numerically. Spreadsheets have such approximations built in, and we can use one to obtain the probability that $V$ lies between $22.5$ and $27.5\;$ . The command “=CHISQDIST(22.5,25,1)” gives $0.3933$ as the probability that a $\displaystyle{ \chi^2 }$ random variable with $25$ degrees of freedom is at most $22.5\,$ , and similarly “=CHISQDIST(27.5,25,1)” gives $0.6686$ as the probability of being at most $27.5\;$ . Thus the probability that such a variable takes values between $22.5$ and $27.5$ is $0.6686 -0.3933 \approx 0.275\; $ . This is the probability that our estimate of $\sigma^2$ is off by less than $10\%\;$ .
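If a spreadsheet is not at hand, the same values can be computed directly. As a sketch in Python (standard library only), and not the algorithm any particular spreadsheet uses, the code below evaluates the $\chi^2$ distribution function through the standard power series for the regularized lower incomplete gamma function: $F(x; n) = P(n/2,\, x/2)$ with $P(a, z) = \frac{z^a e^{-z}}{\Gamma(a+1)} \sum_{k\ge 0} \frac{z^k}{(a+1)(a+2)\cdots(a+k)}$. The series converges for every $z \gt 0$ but needs many terms when $z$ is large, so it is intended only for moderate arguments like the ones in these examples.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """P(V <= x) for V ~ chi-square(df), via the lower incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term, series = 1.0, 1.0  # k = 0 term of the series
    k = 0
    while term > 1e-16 * series:  # stop once terms no longer change the sum
        k += 1
        term *= z / (a + k)
        series += term
    # prefactor z^a e^{-z} / Gamma(a + 1), computed on the log scale to avoid overflow
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * series


if __name__ == "__main__":
    lower = chi2_cdf(22.5, 25)
    upper = chi2_cdf(27.5, 25)
    print(f"P(V <= 22.5)       = {lower:.4f}")
    print(f"P(V <= 27.5)       = {upper:.4f}")
    print(f"P(22.5 < V < 27.5) = {upper - lower:.4f}")
```

The printed values should match the spreadsheet figures quoted above to four decimal places. Working on the log scale for the prefactor is a design choice that keeps $z^a$ and $\Gamma(a+1)$ from overflowing individually.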

### Example 2

Suppose now that we do not know the actual mean of our normal variable, and must use the sample mean. That is, we use

$$ \sum_{k=1}^n\, \left( \frac{X_k -\bar{X}}{\sigma} \right)^2 $$

to address problems concerning $\sigma^2\;$ .

By way of illustration, suppose that a sample of size $20$ is taken from a normal variable $X\;$ . What is the probability that the sample variance will be more than $25\%$ larger than $\sigma^2\;$ ? We represent this as the probability that

$$ \sum_{k=1}^{20}\, \frac{\left( X_k -\bar{X} \right)^2}{19} \gt 1.25\,\sigma^2 \;\text{.} $$

Multiplying both sides by $19/\sigma^2\,$ , this is equivalent to

$$ \sum_{k=1}^{20}\, \frac{\left( X_k -\bar{X} \right)^2}{\sigma^2} \gt 23.75 \;\text{.} $$

As noted above, the left-hand side is a $\displaystyle{ \chi^2 }$ random variable with $19$ degrees of freedom. Again using a spreadsheet, we find the probability

$$ P\left( \sum_{k=1}^{20}\, \frac{\left( X_k -\bar{X} \right)^2}{\sigma^2} \le 23.75 \right) $$

to be $0.7941$ via the command “=CHISQDIST(23.75,19,1)” . Thus the desired probability is $1 -0.7941 =0.2059\;$ . That is, there is approximately a $21\%$ chance that our sample variance will exceed the actual variance by more than $25\%\;$ .
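As an independent check, the experiment can be simulated directly. The sketch below (Python, standard library only) draws repeated samples of size $20$, computes the sample variance with divisor $19$, and records how often it exceeds $1.25\,\sigma^2$. The population mean and standard deviation, the number of trials, and the seed are arbitrary assumptions; the answer should not depend on $\mu$ or $\sigma$. The observed frequency should be near $0.2059$, up to Monte Carlo error.

```python
import random


def prob_variance_exceeds(ratio: float, n: int, trials: int,
                          mu: float = 0.0, sigma: float = 1.0, seed: int = 3) -> float:
    """Estimate P(S^2 > ratio * sigma^2) for samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance, divisor n - 1
        if s2 > ratio * sigma ** 2:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    estimate = prob_variance_exceeds(ratio=1.25, n=20, trials=40_000, mu=7.0, sigma=2.0)
    print(f"simulated P(S^2 > 1.25 sigma^2) = {estimate:.4f}")
```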

### Example 3

As a different type of example, consider the problem of delivering supplies to a colony on Mars. What is the probability that a delivery will miss the drop-spot by more than a mile, assuming that the north-south and east-west errors are independent and normally distributed with mean $0$ and common standard deviation $0.25\,{\rm mi}\;$ ?

For convenience, we will measure in units of quarter miles. Let $N$ denote the north-south displacement of our delivery, and $W$ the east-west displacement. Our choice of units means that $N$ and $W$ are independent standard normal variables. The square of the total displacement is

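$$ N^2 +W^2 \,\text{,} $$

which, being the sum of squares of two independent standard normal variables, has a $\displaystyle{ \chi^2 }$ distribution with $2$ degrees of freedom. A miss of more than one mile is a displacement of more than $4$ quarter miles, so the desired probability is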

$$ P\left(\sqrt{ N^2 +W^2 } \gt 4\right) =P\left( N^2 +W^2 \gt 16 \right) \;\text{.} $$

We again use a spreadsheet with the command “=CHISQDIST(16,2,1)” to find that the probability that a $\displaystyle{ \chi^2 }$ distribution with $2$ degrees of freedom takes on values no greater than $16$ is $0.9997\;$ . The probability that our delivery misses its drop-spot by more than a mile is thus $1-0.9997 =0.0003\,$ , so that there is about a $0.03\%$ chance of missing the drop-spot by over a mile.
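For two degrees of freedom a spreadsheet is not strictly needed: the $\chi^2$ distribution function with $2$ degrees of freedom is $F(x) = 1 - e^{-x/2}$, so the miss probability is exactly $e^{-8} \approx 0.000335$, consistent with the spreadsheet value. The Python sketch below (standard library only; the number of trials and the seed are arbitrary) compares this with a direct simulation of the two independent displacement errors, measured in miles with standard deviation $0.25$ and mean zero, as the example's use of standard normal variables implies.

```python
import math
import random


def miss_probability_exact(radius_mi: float, sd_mi: float) -> float:
    """P(sqrt(N^2 + W^2) > r) for independent N, W ~ N(0, sd^2), via the chi2(2) tail e^{-x/2}."""
    x = (radius_mi / sd_mi) ** 2  # threshold for the chi-square(2) variable
    return math.exp(-x / 2.0)


def miss_probability_simulated(radius_mi: float, sd_mi: float, trials: int, seed: int = 4) -> float:
    """Monte Carlo estimate of the same probability by sampling both displacements."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        north = rng.gauss(0.0, sd_mi)
        west = rng.gauss(0.0, sd_mi)
        if math.hypot(north, west) > radius_mi:  # total displacement exceeds the radius
            misses += 1
    return misses / trials


if __name__ == "__main__":
    print(f"exact:     {miss_probability_exact(1.0, 0.25):.6f}")
    print(f"simulated: {miss_probability_simulated(1.0, 0.25, trials=300_000):.6f}")
```

Because the event is rare, the simulated value is noisy even with hundreds of thousands of trials, which is one reason the exact $\chi^2$ calculation is preferable here.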

There are other interesting and important applications of $\displaystyle{ \chi^2 }$ random variables, and some of these will be introduced later.