Moments and the Moment Generating Function

In practice, the computation of expected values of certain particular functions of random variables has arisen repeatedly.   These are the power functions.   Thus, for a random variable   $X$   with probability density   $p\,$ , the expected values of   $X\,$ , $\,\displaystyle{ X^2 }\,$ , $\,\displaystyle{ X^3 }\,$ , … , have routinely proven useful.   These are given special names.

Definition: The   $\displaystyle{ k^{\rm th} }$   moment about the origin of random variable   $X$   with density   $p$   is given by
$$
\mu'_k =E\left[ X^k \right] =\left\{ \begin{array}{ccc}
\sum_{i=1}^\infty \, x_i^k \, p(x_i) & \qquad & X\quad \text{discrete} \\ & & \\
\int_{-\infty}^\infty \, x^k\, p(x)\, {\rm d}x & \qquad & X\quad \text{continuous}
\end{array} \right.
$$
It is common to just refer to the density function when performing such computations, rather than to a random variable having the given density function.   Thus we might refer to the   $\displaystyle{ k^{\rm th} }$   moment of   $p\;$ .
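
For illustration, take   $X$   to be the outcome of a single roll of a fair six-sided die, so that   $p(x_i) = \tfrac{1}{6}$   for   $x_i = 1, \dots , 6\;$ .   Then
$$
\mu'_1 =\sum_{i=1}^6 \, x_i\, \tfrac{1}{6} =\frac{1+2+\cdots +6}{6} =\frac{7}{2}
\qquad\text{and}\qquad
\mu'_2 =\sum_{i=1}^6 \, x_i^2\, \tfrac{1}{6} =\frac{1+4+\cdots +36}{6} =\frac{91}{6} \;\text{.}
$$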

The first moment   $\mu'_1$   (this is   $E[X]\,$ , the mean of   $X\,$ ) arises so frequently that it is given a special symbol: $\,\mu\;$ .   Also, the moments about the mean are usually what is really at issue (and why the wording “about the origin” was used above), and we define these by

Definition: The   $\displaystyle{ k^{\rm th} }$   moment about the mean of random variable   $X$   with density   $p$   is given by
$$
\mu_k =E\left[ (X-\mu)^k \right] =\left\{ \begin{array}{ccc}
\sum_{i=1}^\infty \, (x_i-\mu)^k \, p(x_i) & \qquad & X\quad \text{discrete} \\ & & \\
\int_{-\infty}^\infty \, (x-\mu)^k\, p(x)\, {\rm d}x & \qquad & X\quad \text{continuous}
\end{array} \right.
$$

The geometric meanings of the   $\mu_k$   are the same as was discussed in our description
of data.   (See chapter 1, section 3 and chapter 1, section 4 in particular.)   They describe aspects of the
way probability is spread about the mean of a density function.   The most important of these is   $\mu_2\,$ , which also occurs often enough and in an important enough role that it is given a special name and symbol.   We call it the variance of the density, and denote it by   $\displaystyle{ \sigma^2 }\;$ .   Its positive square root, $\,\sigma\,$ , is called the standard deviation of the density function.   We use it in place of the variance when we want our measure of concentration of probability about the mean to be in the same units as our random variable.
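
Continuing the die illustration above, the mean is   $\mu = \tfrac{7}{2}\,$ , and the second moment about the mean is
$$
\mu_2 =\sigma^2 =\sum_{i=1}^6 \, \left( x_i -\tfrac{7}{2} \right)^2 \tfrac{1}{6}
=\frac{ \tfrac{25}{4} +\tfrac{9}{4} +\tfrac{1}{4} +\tfrac{1}{4} +\tfrac{9}{4} +\tfrac{25}{4} }{6}
=\frac{35}{12} \;\text{,}
$$
so the standard deviation is   $\sigma =\sqrt{35/12} \approx 1.71\;$ .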

A trick for computing   $\mu_2$

The following computational trick is often useful (and will be for us) for computing   $\mu_2\,$ , and in any case gives a simple relationship between   $\mu_2\,$ , $\mu'_2\,$ , and   $\mu$   (the computation is written for the discrete case; the continuous case is entirely analogous).
$$
\begin{array}{rl}
\mu_2 & =\sum_{i=1}^\infty \, \left(x_i -\mu\right)^2\, p\left( x_i \right) \\ & \\
& = \sum_{i=1}^\infty \, \left(x_i^2 -2\,\mu\, x_i +\mu^2\right)\, p\left( x_i \right) \\ & \\
& = \sum_{i=1}^\infty \, x_i^2\, p\left( x_i \right) -2\,\mu\,\sum_{i=1}^\infty \, x_i\, p\left( x_i \right)
+ \mu^2\,\sum_{i=1}^\infty\, p\left( x_i \right) \\ & \\
& =\mu'_2 -2\,\mu\,\mu +\mu^2 \\ & \\ & =\mu'_2 -\mu^2
\end{array}
$$
Similar tricks exist for the higher moments, but will not be worked out here.   The reader is invited to find a relationship between   $\mu_3\,$ , $\mu'_3\,$ , $\mu'_2\,$ , and   $\mu\;$ .
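
As a quick check, the die illustration gives
$$
\mu'_2 -\mu^2 =\frac{91}{6} -\left( \frac{7}{2} \right)^2 =\frac{182 -147}{12} =\frac{35}{12} \;\text{,}
$$
which agrees with the direct computation of   $\mu_2$   above.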

By way of illustration   –   and also a display of usefulness   –   consider finding the mean and variance of a linear combination of independent random variables.

Mean and Variance of a Linear Combination of Independent Random Variables

Let   $\displaystyle{ X_1 }\,$ , … , $\displaystyle{ X_n }$   be   $n$   mutually independent random variables, with means   $\displaystyle{ \mu_1 }\,$ , … , $\displaystyle{ \mu_n }$   and variances   $\displaystyle{ \sigma_1^2 }\,$ , … ,
$\displaystyle{ \sigma_n^2 }\;$ .   Let
$$ W =a_1\, X_1 + \cdots +a_n\, X_n \;\text{.} $$
From linearity we have
$$ \mu_W =E[W] =\sum_{k=1}^n\, E\left[ a_k\, X_k \right] =\sum_{k=1}^n\, a_k\, \mu_k \;\text{.} $$
Now use
$$ W-\mu_W =a_1\, \left( X_1 -\mu_1\right) + \cdots +a_n\, \left( X_n -\mu_n\right) $$
to get
$$
\left( W-\mu_W \right)^2
=\sum_{j=1}^n \sum_{k=1}^n\, a_j\, a_k\, \left( X_j -\mu_j\right) \, \left( X_k -\mu_k\right) \;\text{.}
$$
Applying our properties above, we have
$$
E\left[ \left( W-\mu_W \right)^2 \right]
=\sum_{j=1}^n \sum_{k=1}^n\, a_j\, a_k\, E\left[ \left( X_j -\mu_j\right) \, \left( X_k -\mu_k\right) \right] \;\text{.}
$$
Since the   $\displaystyle{ X_j }$   and   $\displaystyle{ X_k }$   are independent if   $j\neq k\,$ , we have
$$
E\left[ \left( X_j -\mu_j\right) \, \left( X_k -\mu_k\right) \right] =E\left[ X_j -\mu_j \right]\, E\left[ X_k -\mu_k \right]
\qquad (j\neq k) \;\text{.}
$$
But   $\displaystyle{ E\left[ X_j -\mu_j \right] =0 }\,$ , so every cross term ( $j\neq k$ ) vanishes and only the   $j=k$   terms survive:
$$
E\left[ \left( W-\mu_W \right)^2 \right]
=\sum_{k=1}^n\, a_k^2\, E\left[ \left( X_k -\mu_k\right)^2 \right] \;\text{.}
$$
Since   $\displaystyle{ E\left[ \left( X_k -\mu_k\right)^2 \right] =\sigma_k^2 }\,$ , we have
$$
\sigma_W^2 =\sum_{k=1}^n\, a_k^2\, \sigma_k^2 \;\text{.}
$$
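
For instance, take   $\displaystyle{ X_1 }$   and   $\displaystyle{ X_2 }$   to be two independent rolls of the fair die above, and let   $W =X_1 -X_2$   (so   $a_1 =1\,$ , $a_2 =-1$ ).   Then
$$
\mu_W =\tfrac{7}{2} -\tfrac{7}{2} =0
\qquad\text{and}\qquad
\sigma_W^2 =(1)^2\, \tfrac{35}{12} +(-1)^2\, \tfrac{35}{12} =\tfrac{35}{6} \;\text{;}
$$
the variances add even though the random variables are subtracted.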

While direct computation of the moments of a probability distribution may be rather simple, we present here another method, one which has enough other uses to justify our becoming familiar with it.

Consider
$$
M_X(t) =E\left[{\rm e}^{t\, X}\right]
=\left\{ \begin{array}{rlc} \sum_{i=1}^\infty \, {\rm e}^{t\, x_i}\, p(x_i) & \qquad & X\quad \text{discrete} \\ & & \\
\int_{-\infty}^\infty \, {\rm e}^{t\, x}\, p(x)\, {\rm d}x & \qquad & X\quad \text{continuous} \end{array} \right.
$$

To see how this computation yields the moments of   $X\,$ , we need the power series expansion
$$ {\rm e}^z =1 +z +\frac{z^2}{2!} +\frac{z^3}{3!} +\cdots \;\text{,} $$
so that (in the discrete case)
$$
\begin{array}{rl}
M_X(t) & =\sum_{i=1}^\infty \, \left( 1 +(t\, x_i) +\frac{(t\, x_i)^2}{2!} +\frac{(t\, x_i)^3}{3!} +\cdots \right)\, p(x_i) \\ & \\
& =\sum_{i=1}^\infty \, p(x_i) +t\, \sum_{i=1}^\infty \, x_i\, p(x_i) +\frac{t^2}{2!}\, \sum_{i=1}^\infty \, x_i^2\, p(x_i) +\cdots \\ & \\
& =1 +t\,\mu'_1 +\frac{t^2}{2!}\,\mu'_2 +\frac{t^3}{3!}\,\mu'_3 +\cdots \;\text{.}
\end{array}
$$
We can now obtain the moments about the origin by differentiating:
$$ \mu'_k =\frac{{\rm d}^k\phantom{t}}{{\rm d}t^k}\, M_X(t)\Big|_{t=0} \;\text{.} $$
The utility of this will show up soon.
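
As a simple illustration, let   $X$   be a Bernoulli random variable with   $p(1) =\theta$   and   $p(0) =1 -\theta$   (the parameter   $\theta$   is introduced only for this example).   Then
$$
M_X(t) =(1-\theta) +\theta\, {\rm e}^{t} \;\text{,}\qquad
\mu'_1 =\frac{{\rm d}\phantom{t}}{{\rm d}t}\, M_X(t)\Big|_{t=0} =\theta \;\text{,}\qquad
\mu'_2 =\frac{{\rm d}^2\phantom{t}}{{\rm d}t^2}\, M_X(t)\Big|_{t=0} =\theta \;\text{,}
$$
so that   $\sigma^2 =\mu'_2 -\mu^2 =\theta\,(1-\theta)\;$ .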

For now we call   $\displaystyle{ M_{X}(t) }$   the moment generating function of   $X\,$ , and   $\displaystyle{ M_{g(X)}(t) }$   the moment generating function of   $g(X)\;$ .   We will usually refer to the moment generating function as the MGF for short.

Two Properties of MGFs   –   How to Handle Addition of, and Multiplication by, a Constant

Theorem:   For any constant   $c$   and any function   $g(X)$   for which the MGF   $\displaystyle{ M_{g(X)} }$   exists, we have
$$
\begin{array}{rl}
M_{c\, g}(t) & = M_g(c\, t) \\ & \\ M_{g +c}(t) & = {\rm e}^{c\, t}\, M_g(t) \;\text{.}
\end{array}
$$

How to see this

Let   $c$   be a constant, and   $g(X)$   be a function for which   $\displaystyle{ M_{g(X)} }$   exists.   Writing the continuous case (the discrete case is the same with sums in place of integrals), we have
$$ M_{c\,g}(t) =\int_{-\infty}^{\infty}\, {\rm e}^{t\, c\, g(x)}\, p(x)\, {\rm d}x =\int_{-\infty}^{\infty}\, {\rm e}^{(c\, t)\, g(x)}\, p(x)\, {\rm d}x =M_{g}(c\, t) \;\text{.} $$
Secondly
$$
\begin{array}{rl}
M_{g+c}(t) & =\int_{-\infty}^{\infty}\, {\rm e}^{t\, (g(x)+c)}\, p(x)\, {\rm d}x \\ & \\
& ={\rm e}^{c\, t}\, \int_{-\infty}^{\infty}\, {\rm e}^{t\, g(x)}\, p(x)\, {\rm d}x \\ & \\
& ={\rm e}^{c\, t}\, M_{g}(t) \;\text{.}
\end{array}
$$

These two properties will be useful in later computations.   In particular, if a constant multiplies, or is added to, a function of a random variable, these formulae will allow us to easily compute moments and MGFs based on the MGF of our function.
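
As a simple application of both properties, consider the standardized variable   $Z =(X-\mu)/\sigma$   (assuming the mean and variance of   $X$   exist):
$$
M_Z(t) =M_{\frac{1}{\sigma}\,(X-\mu)}(t) =M_{X-\mu}\!\left( \tfrac{t}{\sigma} \right) ={\rm e}^{-\mu\, t/\sigma}\, M_X\!\left( \tfrac{t}{\sigma} \right) \;\text{.}
$$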

A simple fact is the following:

Theorem:   If two random variables   $X$   and   $Y$   have the same distribution, then they have the same moment generating function, i.e.   $\displaystyle{ M_X(t) = M_Y(t) }$   for all   $t\in\mathbb{R}\;$ .

The above theorem is easy to prove, but an important fact about the moment generating function is that the result also goes the other way round.   This is what makes the moment generating function such a powerful tool.   We will not give a proof here, but we will feel free to use this fact.

Theorem:   If two random variables   $X$   and   $Y$   have the same moment generating function, i.e.   $\displaystyle{ M_X(t) = M_Y(t) }$   for all   $t\in\mathbb{R}\,$ , then they have the same distribution function, i.e. $\displaystyle{ F_X(t) = F_Y(t) }$   for all   $t\in\mathbb{R}\;$ .

We will use moment generating functions, for example, to understand sample means and linear combinations of random variables   –   particularly when the variables generating our samples are independent.   Here is a formula that we will put to work when we wish to determine information about the random variable
$$ g\left( X_1 , \cdots , X_n\right) = a_1\,X_1 +\cdots +a_n\,X_n $$
when   $\displaystyle{ X_1 }\,$ , … , $\,\displaystyle{ X_n }$   are independent.

A Useful Formula: The Moment Generating Function for a Linear Combination of Independent Random Variables

Let   $\displaystyle{ W =a_1\,X_1 +\cdots +a_n\,X_n }$   with   $\displaystyle{ X_1 }\,$ , … , $\,\displaystyle{ X_n }$   independent.   The moment generating function of   $W$   is given by
$$
M_W(t) =E\left[ {\rm e}^{t\,\left( a_1\,X_1 +\cdots +a_n\,X_n \right)} \right]
=E\left[ {\rm e}^{t\,a_1\,X_1} \cdots {\rm e}^{t\,a_n\,X_n} \right] \;\text{.}
$$
The independence of the   $\displaystyle{ X_k }\,$ , and hence of these exponential functions, implies that
$$
M_W(t) =E\left[ {\rm e}^{t\,a_1\,X_1} \right] \cdots E\left[ {\rm e}^{t\,a_n\,X_n} \right]
=M_{a_1\,X_1}(t) \cdots M_{a_n\,X_n}(t) \;\text{.}
$$
That is
$$
M_{a_1\,X_1 +\cdots +a_n\,X_n}(t) =M_{a_1\,X_1}(t) \cdots M_{a_n\,X_n}(t) \;\text{.}
$$
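
Combined with the first property above, this also gives   $\displaystyle{ M_W(t) =M_{X_1}(a_1\, t) \cdots M_{X_n}(a_n\, t) }\;$ .   As an illustration, let   $\displaystyle{ X_1 }\,$ , … , $\,\displaystyle{ X_n }$   be independent Bernoulli random variables, each with   $p(1) =\theta$   and   $p(0) =1 -\theta$   as in the earlier example, and take all   $a_k =1\;$ .   Then
$$
M_{X_1 +\cdots +X_n}(t) =\left[ (1-\theta) +\theta\, {\rm e}^{t} \right]^n \;\text{,}
$$
which is the moment generating function of a binomial random variable with parameters   $n$   and   $\theta\,$ ; by the theorem above that the MGF determines the distribution, the sum is therefore binomially distributed.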