Further Properties of Expectation and Moment Generating Functions

Recall that expectation   $E$   is linear.   That is, if   $\displaystyle{ g_1 }$   and   $\displaystyle{ g_2 }$   are functions of random variables for which expected values exist, and   $\displaystyle{ c_1 }$   and   $\displaystyle{ c_2 }$   are arbitrary constants, then
$$ E\left[ c_1\, g_1 +c_2\, g_2\right] =c_1\,E\left[ g_1\right] +c_2\,E\left[ g_2\right] \;\text{.} $$
It turns out that if   $\displaystyle{ g_1 }$   and $\displaystyle{ g_2 }$   are independent (say,   $\displaystyle{ g_1 }$   depends only on   $\displaystyle{ X_1 }$   and   $\displaystyle{ g_2 }$   only on   $\displaystyle{ X_2 }\,$ , where   $\displaystyle{ X_1 }$   and   $\displaystyle{ X_2 }$   are independent), so that the joint density function factors into the product of the two individual density functions, then
$$ E\left[g_1\, g_2\right] =E\left[ g_1\right]\, E\left[ g_2\right] \;\text{.} $$

This new property is more subtle than linearity, and can be seen   –   in the case of continuous random variables   –   as follows.
$$
\begin{array}{rl}
E\left[g_1\, g_2\right]
& =\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\, g_1(x_1)\, g_2(x_2)\, f_{X_1,X_2}(x_1,x_2)\; {\rm d}x_1\, {\rm d}x_2 \\
& \\
& =\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\, g_1(x_1)\, g_2(x_2)\, f_{X_1}(x_1)\, f_{X_2}(x_2)\; {\rm d}x_1\, {\rm d}x_2 \\
& \\
& =\displaystyle\int_{-\infty}^{\infty}\, g_1(x_1)\, f_{X_1}(x_1)\; {\rm d}x_1 \cdot \int_{-\infty}^{\infty}\, g_2(x_2)\, f_{X_2}(x_2)\; {\rm d}x_2 \\
& \\
& =E\left[ g_1\right] \, E\left[ g_2\right] \;\text{.}
\end{array}
$$

These observations extend to arbitrary finite collections of random variables.

Conclusion:   The expected value of a linear combination of random variables is the linear combination of their expected values, whether or not the variables are independent.   But the expected value of a product may or may not be the product of the expected values.   We only claim that this is true if the variables are independent.
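As a quick numerical check of these two properties (a sketch only, assuming the NumPy library is available; the particular distributions are purely illustrative), the following simulation estimates   $E\left[ g_1\, g_2\right]$   and   $E\left[ g_1\right]\, E\left[ g_2\right]$   first for a pair of independent variables and then for a pair that is clearly not independent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent case: X1 and X2 are drawn separately.
x1 = rng.normal(loc=2.0, scale=1.0, size=n)          # E[X1] = 2
x2 = rng.exponential(scale=3.0, size=n)              # E[X2] = 3
print(np.mean(x1 * x2), np.mean(x1) * np.mean(x2))   # both close to 6

# Dependent case: g1 = X and g2 = X for the very same draws.
x = rng.normal(loc=2.0, scale=1.0, size=n)
print(np.mean(x * x), np.mean(x) ** 2)               # about 5 versus about 4: not equal
```

In the dependent case   $E\left[ X^2\right] =\mu^2 +\sigma^2 =5$   while   $\left( E[X]\right)^2 =4\,$ , so the product rule visibly fails without independence.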

By way of illustration   –   and also to display the usefulness of these properties   –   consider finding the mean and variance of a linear combination of a set of independent random variables.

Let   $\displaystyle{ X_1 }\,$ , … , $\displaystyle{ X_n }$   be   $n$   mutually independent random variables, with means   $\displaystyle{ \mu_1 }\,$ , … , $\displaystyle{ \mu_n }$   and variances   $\displaystyle{ \sigma_1^2 }\,$ , … , $\displaystyle{ \sigma_n^2 }\;$ .   Let
$$ W =a_1\, X_1 + \cdots +a_n\, X_n \;\text{.} $$
From linearity we have
$$ \mu_W =E[W] =\sum_{k=1}^n\, E\left[ a_k\, X_k \right] =\sum_{k=1}^n\, a_k\, \mu_k \;\text{.} $$
Now use
$$ W-\mu_W =a_1\, \left( X_1 -\mu_1\right) + \cdots +a_n\, \left( X_n -\mu_n\right) $$
to get
$$
\left( W-\mu_W \right)^2
=\sum_{j=1}^n \sum_{k=1}^n\, a_j\, a_k\, \left( X_j -\mu_j\right) \, \left( X_k -\mu_k\right) \;\text{.}
$$
Applying our properties above, we have
$$
E\left[ \left( W-\mu_W \right)^2 \right]
=\sum_{j=1}^n \sum_{k=1}^n\, a_j\, a_k\, E\left[ \left( X_j -\mu_j\right) \, \left( X_k -\mu_k\right) \right] \;\text{.}
$$
Since the   $\displaystyle{ X_j }$   and   $\displaystyle{ X_k }$   are independent if   $j\neq k\,$ , we have
$$
E\left[ \left( X_j -\mu_j\right) \, \left( X_k -\mu_k\right) \right] =E\left[ X_j -\mu_j \right]\, E\left[ X_k -\mu_k \right]
\qquad (j\neq k) \;\text{.}
$$
But   $\displaystyle{ E\left[ X_j -\mu_j \right] =0 }\,$ , so every cross term with   $j\neq k$   vanishes, leaving only the   $j=k$   terms:
$$
E\left[ \left( W-\mu_W \right)^2 \right]
=\sum_{k=1}^n\, a_k^2\, E\left[ \left( X_k -\mu_k\right)^2 \right] \;\text{.}
$$
Since   $\displaystyle{ E\left[ \left( X_k -\mu_k\right)^2 \right] =\sigma_k^2 }\,$ , we have
$$
\sigma_W^2 =\sum_{k=1}^n\, a_k^2\, \sigma_k^2 \;\text{.}
$$
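Before putting these formulae to work, here is a small simulation sketch (assuming NumPy; the weights, means, and standard deviations below are arbitrary illustrative choices) that checks   $\mu_W =\sum a_k\,\mu_k$   and   $\sigma_W^2 =\sum a_k^2\,\sigma_k^2$   for a linear combination of independent normal variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

a = np.array([0.5, -1.0, 2.0])       # weights a_k
mu = np.array([1.0, 3.0, -2.0])      # means mu_k
sigma = np.array([2.0, 0.5, 1.5])    # standard deviations sigma_k

# Independent samples, one column per variable X_k.
X = rng.normal(loc=mu, scale=sigma, size=(n, 3))
W = X @ a                            # W = a_1 X_1 + a_2 X_2 + a_3 X_3

print(W.mean(), a @ mu)              # sample mu_W      vs  sum of a_k mu_k
print(W.var(), a**2 @ sigma**2)      # sample sigma_W^2 vs  sum of a_k^2 sigma_k^2
```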

We will use these formulae now to solve a problem, and we will also use them routinely in later discussions.

Optimal Weighting

Suppose we have several different experiments of varying accuracy measuring a specific quantity.   The question we wish to consider here is how we might combine the data from these different experiments to obtain a good estimate of the quantity being measured.

A simple method would be to average the sample values, obtaining a single value for the quantity.   But another, perhaps more natural, idea would be to somehow give the more accurate methods more importance, more weight.   What we present here is a use of multivariable calculus to determine how to average several measurements of varying accuracy in a way that maximizes the accuracy of the result.

To this end, let   $X_1\,$ , $\, X_2\,$ , $\,\cdots\,$ , $\,X_n$   be measurements of our specific quantity.   Since the quantity has a specific value, we will presume that the means of the random variables   $X_1\,$ , $\, X_2\,$ , $\,\cdots\,$ , $\,X_n$   are all equal, and that the random variables are independent.   If the variances of these random variables are   $\sigma^2_1\,$ , $\, \sigma^2_2\,$ , $\,\cdots\,$ , $\,\sigma^2_n\,$ , then the simple average of the   $n$   measurements is
$$ \bar{X} =\frac{1}{n}\, X_1 +\frac{1}{n}\, X_2 + \cdots +\frac{1}{n}\, X_n \,\text{,} $$
which (by virtue of the preceding discussion) has variance
$$ \bar{\sigma}^2 =\frac{1}{n^2}\,\sum_{k=1}^n\, \sigma^2_k \;\text{.} $$

What we wish to examine is whether this variance can be decreased by taking a different, weighted average.   That is, consider
$$ W =a_1\, X_1 +a_2\, X_2 + \cdots +a_n\, X_n $$
with all of the weights   $a_k$   positive, and
$$ \sum_{k=1}^n\, a_k =1 \,\text{,}$$
as a general weighted average of the   $X_k\;$ .   How can we pick these weights to minimize
the variance of   $W\;$ ?   This minimum will necessarily be no greater than that of  
$\bar{X}$   above.

Again, by our assumption of independence, we have
$$ \sigma^2_W =\sum_{k=1}^n\, a_k^2\, \sigma^2_k \;\text{.} $$
With
$$ a_n =1-\sum_{k=1}^{n-1}\, a_k \,\text{,} $$
we rewrite
$$ \sigma^2_W =\sum_{k=1}^{n-1}\, a_k^2\, \sigma^2_k +\left( 1-\sum_{k=1}^{n-1}\, a_k \right)^2\,\sigma^2_n \;\text{.} $$
It is this quantity we wish to minimize, and this can be managed using the standard tools of multivariable calculus.   For each   $k =1\, , \, 2\, , \, \cdots\, , \, n-1\,$ ,
$$
\begin{array}{rl}
\frac{\partial\sigma^2_W}{\partial a_k} & = 2\, a_k\, \sigma^2_k -2\, \left( 1-\sum_{j=1}^{n-1}\, a_j \right)\,\sigma^2_n \\
& \\
& =2\, a_k\, \sigma^2_k -2\, a_n\, \sigma^2_n \;\text{.}
\end{array}
$$
For a minimum, all of these partial derivatives must equal   $0\;$ .   So, setting each to   $0$   and solving, we get
$$ a_k\, \sigma^2_k = a_n\, \sigma^2_n $$
for each   $k =1\, , \, 2\, , \, \cdots\, , \, n\;$ .   That is, each of the products   $a_k\, \sigma^2_k$   takes on the same value (call it   $c\,$ ), and thus for this constant   $c\,$ , we have
$$ a_k =\frac{c}{\sigma^2_k} \;\text{.} $$

The value of   $c$   can be obtained from
$$ \sum_{k=1}^n\, a_k =1 \,\text{,}$$
but we will not concern ourselves with it here; it depends on the particular   $\sigma^2_k$   values.   And it is easy to check that these values for the   $a_k$   do indeed produce a minimum for   $\sigma^2_W\;$ .   What we wish to notice, in particular, is that our intuition is supported: the weights should be chosen proportional to the inverses of the respective variances.   Thus, if one measuring device has a variance three times as large as another, it should be weighted only one-third as much as the more accurate device.
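The following sketch (assuming NumPy, with three hypothetical devices whose variances are   $1\,$ , $\,4\,$ , and   $9$ ) computes the inverse-variance weights, normalized to sum to   $1$   (which determines the constant   $c$   implicitly), and compares the resulting variance with that of the simple average.

```python
import numpy as np

sigma2 = np.array([1.0, 4.0, 9.0])           # variances of three hypothetical devices
n = len(sigma2)

# Simple average: every weight is 1/n.
var_simple = np.sum((1.0 / n) ** 2 * sigma2)

# Inverse-variance weights a_k = c / sigma_k^2, scaled so that they sum to 1.
a = (1.0 / sigma2) / np.sum(1.0 / sigma2)
var_weighted = np.sum(a**2 * sigma2)

print(a)                                     # weights proportional to 1 / sigma_k^2
print(var_simple, var_weighted)              # roughly 1.56 versus roughly 0.73
```

As expected, the weighted average has a strictly smaller variance than the simple average whenever the variances differ.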

Expectation for Sums and Means

If we have a sum of random variables
$$ W =X_1 +X_2 +\cdots +X_n \,\text{,} $$
then our formula for expected value of a linear combination says that the mean of the sum is the sum of the individual means.   Also, if the   $X_k$   are independent, then the variance of the sum is equal to the sum of the variances. It is important to remember that this last   –   that the variance of the sum is the sum of the variances   –   is guaranteed only if the   $X_k$   are independent.

We also see from our formulae that

$\displaystyle{ \mu_{cX} =c\mu_X }$   and   $\displaystyle{ \sigma^2_{cX} =c^2\sigma^2_X }\;$ .

Thus multiplying a random variable by a constant   $c$   yields a random variable whose mean is   $c$   times the original mean, and whose variance is   $c^2$   times the original variance.

Putting these two together, we obtain the mean and variance of the average of a random sample of size   $n$   of a random variable   $X\,$ , given that   $\displaystyle{ \mu_X =\mu }$   and   $\displaystyle{ \sigma^2_X =\sigma^2 }\;$ .   To this end,
$$ \bar{X} =\frac{1}{n} W =\frac{1}{n} \left( X_1 +X_2 +\cdots +X_n\right) \,\text{,} $$
where

$\displaystyle{ \mu_W =n \mu }$   and   $\displaystyle{ \sigma^2_W =n \sigma^2 }\;$ .

Thus

$\displaystyle{ \mu_{\bar{X}} =\frac{1}{n} \mu_W =\mu }$   and   $\displaystyle{ \sigma^2_{\bar{X}} =\frac{1}{n^2} \sigma_W^2 =\frac{\sigma^2}{n} }\;$ .

We see that the mean of the average is precisely the mean of   $X$   itself, while the variance of the average is smaller than the variance of   $X$   by a factor of   $\frac{1}{n}\;$ .
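For instance, with   $\sigma^2 =9$   and a sample of size   $n =100\,$ , we get   $\sigma^2_{\bar{X}} =\frac{9}{100} =0.09\,$ , so the standard deviation of the average is   $0.3\,$ , compared with a standard deviation of   $3$   for a single measurement.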

Some Comments about Moment Generating Functions

We will use moment generating functions, for example, to understand sample means and linear combinations of random variables   –   particularly when the variables generating our samples are independent.   Here is a formula that we will put to work when we wish to determine information about the random variable
$$ g\left( X_1 , \cdots , X_n\right) = a_1\,X_1 +\cdots +a_n\,X_n $$
when   $\displaystyle{ X_1 }\,$ , … , $\,\displaystyle{ X_n }$   are independent.

A Useful Formula: The Moment Generating Function for a Linear Combination of Independent Random Variables

Let   $\displaystyle{ W =a_1\,X_1 +\cdots +a_n\,X_n }$   with   $\displaystyle{ X_1 }\,$ , … , $\,\displaystyle{ X_n }$   independent.   The moment generating function of   $W$   is given by
$$
M_W(t) =E\left[ {\rm e}^{t\,\left( a_1\,X_1 +\cdots +a_n\,X_n \right)} \right]
=E\left[ {\rm e}^{t\,a_1\,X_1} \cdots {\rm e}^{t\,a_n\,X_n} \right] \;\text{.}
$$
The independence of the   $\displaystyle{ X_k }\,$ , and hence of these exponential functions, implies that
$$
M_W(t) =E\left[ {\rm e}^{t\,a_1\,X_1} \right] \cdots E\left[ {\rm e}^{t\,a_n\,X_n} \right]
=M_{a_1\,X_1}(t) \cdots M_{a_n\,X_n}(t) \;\text{.}
$$
That is,
$$
M_{a_1\,X_1 +\cdots +a_n\,X_n}(t) =M_{a_1\,X_1}(t) \cdots M_{a_n\,X_n}(t) \;\text{.}
$$
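As a final sketch (again assuming NumPy; the distributions, weights, and the value of   $t$   below are illustrative choices), the following simulation estimates   $M_W(t)$   directly and compares it with the product of the estimated factors   $M_{a_k\,X_k}(t)\,$ .

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
t = 0.3                                      # a fixed value of t

a1, a2 = 0.5, 2.0                            # weights a_1, a_2
x1 = rng.exponential(scale=1.0, size=n)      # X_1 ~ Exponential(1)
x2 = rng.normal(loc=0.0, scale=1.0, size=n)  # X_2 ~ N(0, 1), independent of X_1

w = a1 * x1 + a2 * x2                        # W = a_1 X_1 + a_2 X_2

mgf_w = np.mean(np.exp(t * w))               # estimate of M_W(t)
mgf_product = np.mean(np.exp(t * a1 * x1)) * np.mean(np.exp(t * a2 * x2))
print(mgf_w, mgf_product)                    # the two estimates should agree closely
```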