Moments and the Moment Generating Function

In practice, the computation of expected values of certain functions of random variables has arisen repeatedly. These are the power functions. Thus, for a random variable $X$ with probability density $p$, the expected values of $X$, $X^2$, $X^3$, …, have routinely proven useful. These are given special names.

Definition: The $k$th moment about the origin of a random variable $X$ with density $p$ is given by
$$\mu_k' = E[X^k] = \begin{cases} \displaystyle\sum_{i=1}^{\infty} x_i^k\, p(x_i) & X \text{ discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} x^k\, p(x)\, dx & X \text{ continuous.} \end{cases}$$
It is common to just refer to the density function when performing such computations, rather than to a random variable having the given density function. Thus we might refer to the $k$th moment of $p$.
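As a quick numerical illustration of the discrete formula, the following sketch computes the first few moments about the origin of a hypothetical density of our own choosing, a fair six-sided die with $p(x_i) = 1/6$; this example is not from the text, just a sanity check of the definition.

```python
# A minimal sketch: moments about the origin of a hypothetical fair-die density.
import numpy as np

x = np.arange(1, 7)      # the support x_1, ..., x_6
p = np.full(6, 1 / 6)    # p(x_i) = 1/6 for each face

def raw_moment(k):
    """k-th moment about the origin: E[X^k] = sum_i x_i^k p(x_i)."""
    return np.sum(x**k * p)

for k in (1, 2, 3):
    print(k, raw_moment(k))   # 3.5, 15.1666..., 73.5
```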

The first moment $\mu_1'$ (this is $E[X]$, the mean of $X$) arises so frequently that it is given a special symbol: $\mu$. Also, it is usually the moments about the mean that are really at issue (which is why the wording "about the origin" was used above), and we define these by

Definition: The $k$th moment about the mean of a random variable $X$ with density $p$ is given by
$$\mu_k = E[(X - \mu)^k] = \begin{cases} \displaystyle\sum_{i=1}^{\infty} (x_i - \mu)^k\, p(x_i) & X \text{ discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} (x - \mu)^k\, p(x)\, dx & X \text{ continuous.} \end{cases}$$

The geometric meanings of the $\mu_k$ are the same as those discussed in our description of data. (See chapter 1, section 3 and chapter 1, section 4 in particular.) They describe aspects of the way probability is spread about the mean of a density function. The most important of these is $\mu_2$, which occurs often enough, and in an important enough role, that it is given a special name and symbol. We call it the variance of the density, and denote it by $\sigma^2$. Its positive square root, $\sigma$, is called the standard deviation of the density function. We use it in place of the variance when we want our measure of the concentration of probability about the mean to be in the same units as our random variable.
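As a small worked example of the continuous formula (ours, not the text's), the sketch below computes $\mu$, $\mu_2 = \sigma^2$, and $\sigma$ for the uniform density $p(x) = 1$ on $[0, 1]$.

```python
# Sketch: mean, variance, and standard deviation of the uniform density on [0, 1].
import sympy as sp

x = sp.symbols('x')
p = sp.Integer(1)                                  # p(x) = 1 on [0, 1], 0 elsewhere

mu    = sp.integrate(x * p, (x, 0, 1))             # mean mu = 1/2
var   = sp.integrate((x - mu)**2 * p, (x, 0, 1))   # mu_2 = sigma^2 = 1/12
sigma = sp.sqrt(var)                               # standard deviation

print(mu, var, sigma)   # 1/2  1/12  sqrt(3)/6
```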

A trick for computing $\mu_2$

The following computational trick is often useful (and will be for us) for computing $\mu_2$, and in any case gives a simple relationship between $\mu_2$, $\mu_2'$, and $\mu$:
$$\begin{aligned} \mu_2 &= \sum_{i=1}^{\infty} (x_i - \mu)^2\, p(x_i) = \sum_{i=1}^{\infty} \left(x_i^2 - 2\mu x_i + \mu^2\right) p(x_i) \\ &= \sum_{i=1}^{\infty} x_i^2\, p(x_i) - 2\mu \sum_{i=1}^{\infty} x_i\, p(x_i) + \mu^2 \sum_{i=1}^{\infty} p(x_i) \\ &= \mu_2' - 2\mu \cdot \mu + \mu^2 = \mu_2' - \mu^2. \end{aligned}$$
Similar tricks exist for the higher moments, but they will not be worked out here. The reader is invited to find a relationship between $\mu_3$, $\mu_3'$, $\mu_2'$, and $\mu$.
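Here is a quick numerical check of the trick, using the hypothetical fair-die density from the earlier sketch (the $\mu_3$ relationship is left to the reader, as above).

```python
# Sketch: verify mu_2 = mu_2' - mu^2 for the fair-die density.
import numpy as np

x = np.arange(1, 7)
p = np.full(6, 1 / 6)

mu       = np.sum(x * p)             # first moment about the origin (the mean)
mu2_raw  = np.sum(x**2 * p)          # second moment about the origin, mu_2'
mu2_mean = np.sum((x - mu)**2 * p)   # second moment about the mean, mu_2

print(np.isclose(mu2_mean, mu2_raw - mu**2))   # True
```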

By way of illustration   –   and also a display of usefulness   –   consider finding the mean and variance of a linear combination of independent random variables.

Mean and Variance of a Linear Combination of Independent Random Variables

Let $X_1, \ldots, X_n$ be $n$ mutually independent random variables, with means $\mu_1, \ldots, \mu_n$ and variances $\sigma_1^2, \ldots, \sigma_n^2$. Let
$$W = a_1 X_1 + \cdots + a_n X_n.$$
From linearity we have
$$\mu_W = E[W] = \sum_{k=1}^{n} E[a_k X_k] = \sum_{k=1}^{n} a_k \mu_k.$$
Now use
$$W - \mu_W = a_1 (X_1 - \mu_1) + \cdots + a_n (X_n - \mu_n)$$
to get
$$(W - \mu_W)^2 = \sum_{j=1}^{n} \sum_{k=1}^{n} a_j a_k (X_j - \mu_j)(X_k - \mu_k).$$
Applying our properties above, we have
$$E\left[(W - \mu_W)^2\right] = \sum_{j=1}^{n} \sum_{k=1}^{n} a_j a_k\, E\left[(X_j - \mu_j)(X_k - \mu_k)\right].$$
Since $X_j$ and $X_k$ are independent when $j \neq k$, we have
$$E\left[(X_j - \mu_j)(X_k - \mu_k)\right] = E[X_j - \mu_j]\, E[X_k - \mu_k] \qquad (j \neq k).$$
But $E[X_j - \mu_j] = 0$, so
$$E\left[(W - \mu_W)^2\right] = \sum_{k=1}^{n} a_k^2\, E\left[(X_k - \mu_k)^2\right].$$
Since $E\left[(X_k - \mu_k)^2\right] = \sigma_k^2$, we have
$$\sigma_W^2 = \sum_{k=1}^{n} a_k^2 \sigma_k^2.$$
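The sketch below gives a Monte Carlo check of these two formulas; the particular coefficients and the choice of normal distributions are illustrative assumptions, not part of the derivation.

```python
# Sketch: simulate W = a1*X1 + a2*X2 + a3*X3 for independent X_k and compare the
# sample mean and variance of W with sum(a_k mu_k) and sum(a_k^2 sigma_k^2).
import numpy as np

rng  = np.random.default_rng(0)
a    = np.array([2.0, -1.0, 0.5])
mus  = np.array([1.0, 4.0, -2.0])
sigs = np.array([1.0, 0.5, 3.0])

X = rng.normal(mus, sigs, size=(1_000_000, 3))   # independent columns X_1, X_2, X_3
W = X @ a

print(W.mean(), a @ mus)                  # both approximately -3.0
print(W.var(),  np.sum(a**2 * sigs**2))   # both approximately 6.5
```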

While direct computation of the moments of a probability distribution may be rather simple, we present here another method, one which has enough other uses to justify our becoming familiar with it.

Consider
$$M_X(t) = E\left[e^{tX}\right] = \begin{cases} \displaystyle\sum_{i=1}^{\infty} e^{t x_i}\, f(x_i) & X \text{ discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{t x}\, f(x)\, dx & X \text{ continuous.} \end{cases}$$

To see how this computation yields the moments of the distribution of $X$, we need to know that, as a power series,
$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots$$
So that
$$\begin{aligned} M_X(t) &= \sum_{i=1}^{\infty} \left(1 + (t x_i) + \frac{(t x_i)^2}{2!} + \frac{(t x_i)^3}{3!} + \cdots\right) f(x_i) \\ &= \sum_{i=1}^{\infty} f(x_i) + t \sum_{i=1}^{\infty} x_i\, f(x_i) + \frac{t^2}{2!} \sum_{i=1}^{\infty} x_i^2\, f(x_i) + \cdots \\ &= 1 + t\, \mu_1' + \frac{t^2}{2!}\, \mu_2' + \frac{t^3}{3!}\, \mu_3' + \cdots. \end{aligned}$$
We can now obtain the moments about the origin by
$$\mu_k' = \frac{d^k}{dt^k} M_X(t) \bigg|_{t=0}.$$
The utility of this will show up soon.
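As a small symbolic illustration (again using the hypothetical fair die from the earlier sketches), the following builds $M_X(t)$ directly from the definition and checks that its derivatives at $t = 0$ reproduce the moments about the origin.

```python
# Sketch: the k-th derivative of M_X(t) at t = 0 gives the k-th moment about the origin.
import sympy as sp

t = sp.symbols('t')
M = sp.Rational(1, 6) * sum(sp.exp(t * i) for i in range(1, 7))   # M_X(t) = E[e^{tX}]

for k in (1, 2, 3):
    print(k, sp.diff(M, t, k).subs(t, 0))   # 7/2, 91/6, 147/2
```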

For now we call $M_X(t)$ the moment generating function of $X$, and $M_{g(X)}(t)$ the moment generating function of $g(X)$. We will usually refer to this as the MGF for short.

Two Properties of MGFs   –   How to Handle Addition of, and Multiplication by, a Constant

Theorem: For any constant $c$ and any function $g(X)$ for which the MGF $M_{g(X)}$ exists, we have
$$M_{cg}(t) = M_g(ct) \qquad \text{and} \qquad M_{g+c}(t) = e^{ct} M_g(t).$$

How to see this

Let $c$ be a constant, and $g(X)$ be a function for which $M_{g(X)}$ exists. Then
$$M_{cg}(t) = \int_{-\infty}^{\infty} e^{t c\, g(x)}\, f(x)\, dx = \int_{-\infty}^{\infty} e^{(ct)\, g(x)}\, f(x)\, dx = M_g(ct).$$
Secondly
$$M_{g+c}(t) = \int_{-\infty}^{\infty} e^{t(g(x)+c)}\, f(x)\, dx = e^{ct} \int_{-\infty}^{\infty} e^{t\, g(x)}\, f(x)\, dx = e^{ct} M_g(t).$$

These two properties will be useful in later computations.   In particular, if a constant multiplies, or is added to, a function of a random variable, these formulae will allow us to easily compute moments and MGFs based on the MGF of our function.
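A symbolic check of both properties is sketched below, taking $g(X) = X$ for the hypothetical fair die and an arbitrary constant $c = 3$; these particular choices are ours, purely for illustration.

```python
# Sketch: verify M_{cX}(t) = M_X(ct) and M_{X+c}(t) = e^{ct} M_X(t) for a fair die, c = 3.
import sympy as sp

t = sp.symbols('t')
c = sp.Integer(3)
p = sp.Rational(1, 6)

M_X  = sum(p * sp.exp(t * x)       for x in range(1, 7))   # MGF of X
M_cX = sum(p * sp.exp(t * c * x)   for x in range(1, 7))   # MGF of cX
M_Xc = sum(p * sp.exp(t * (x + c)) for x in range(1, 7))   # MGF of X + c

print(sp.simplify(M_cX - M_X.subs(t, c * t)))    # 0
print(sp.simplify(M_Xc - sp.exp(c * t) * M_X))   # 0
```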

A simple fact is the following:

Theorem: If two random variables $X$ and $Y$ have the same distribution, then they have the same moment generating function, i.e. $M_X(t) = M_Y(t)$ for all $t \in \mathbb{R}$.

The above theorem is easy to prove, but an important fact about the moment generating function is that the result also goes the other way round, and this is what makes the moment generating function such a powerful tool. We will not give a proof here, but will feel free to use this fact.

Theorem: If two random variables $X$ and $Y$ have the same moment generating function, i.e. $M_X(t) = M_Y(t)$ for all $t \in \mathbb{R}$, then they have the same distribution function, i.e. $F_X(t) = F_Y(t)$ for all $t \in \mathbb{R}$.

We will use moment generating functions, for example, to understand sample means and linear combinations of random variables – particularly when the variables generating our samples are independent. Here is a formula that we will put to work when we wish to determine information about the random variable
$$g(X_1, \ldots, X_n) = a_1 X_1 + \cdots + a_n X_n$$
when $X_1, \ldots, X_n$ are independent.

A Useful Formula: The Moment Generating Function for a Linear Combination of Independent Random Variables

Let $W = a_1 X_1 + \cdots + a_n X_n$ with $X_1, \ldots, X_n$ independent. The moment generating function of $W$ is given by
$$M_W(t) = E\left[e^{t(a_1 X_1 + \cdots + a_n X_n)}\right] = E\left[e^{t a_1 X_1} \cdots e^{t a_n X_n}\right].$$
The independence of the $X_k$, and hence of these exponential functions, implies that
$$M_W(t) = E\left[e^{t a_1 X_1}\right] \cdots E\left[e^{t a_n X_n}\right] = M_{a_1 X_1}(t) \cdots M_{a_n X_n}(t).$$
That is
$$M_{a_1 X_1 + \cdots + a_n X_n}(t) = M_{a_1 X_1}(t) \cdots M_{a_n X_n}(t).$$
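Finally, here is a direct symbolic check of this factorization for two independent copies of the hypothetical die, with assumed coefficients $a_1 = 2$ and $a_2 = -1$: the MGF of $W$ computed over the joint density agrees with the product of the individual MGFs.

```python
# Sketch: M_{a1 X1 + a2 X2}(t) = M_{a1 X1}(t) * M_{a2 X2}(t) for two independent dice.
import sympy as sp

t = sp.symbols('t')
p = sp.Rational(1, 6)
a1, a2 = 2, -1

# MGF of W = a1*X1 + a2*X2, summing e^{t(a1 x1 + a2 x2)} over the joint density p*p
M_W = sum(p * p * sp.exp(t * (a1 * x1 + a2 * x2))
          for x1 in range(1, 7) for x2 in range(1, 7))

M_a1X1 = sum(p * sp.exp(t * a1 * x) for x in range(1, 7))
M_a2X2 = sum(p * sp.exp(t * a2 * x) for x in range(1, 7))

print(sp.simplify(sp.expand(M_W - M_a1X1 * M_a2X2)))   # 0
```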