Joint Density Functions and Independent Random Variables

Suppose we have two discrete random variables, $X$ and $Y$. We denote by $df_{X,Y}$ the joint density function
$$df_{X,Y}(a,b) = P(X = a \cap Y = b).$$
That is, $df_{X,Y}(a,b)$ records the probability that $X = a$ and $Y = b$ simultaneously.

If we have two continuous random variables, $X$ and $Y$, this definition needs to be handled a bit more delicately. We denote by $df_{X,Y}$ the joint density function
$$df_{X,Y}(a,b) = \lim_{h \to 0} \frac{P(X \in [a-h,\, a+h] \cap Y \in [b-h,\, b+h])}{4h^2}.$$
That is, $df_{X,Y}(a,b)$ records the probability that $X$ and $Y$ take on values in a small square centered at $(a,b)$, scaled by the area $4h^2$ of the square.

Here are a couple of simple examples.

Discrete Example

Given six people, two of whom are left-handed and one of whom is ginger (and also happens to be a leftie), we pick one person at random, with each equally likely to be picked. Letting $L$ be the event that a leftie is picked, and $G$ the event that the ginger is picked, determine the joint density function.

To be more precise, let $L$ be the Bernoulli random variable assuming the value $0$ if a right-hander is picked, and $1$ if a leftie is picked. Then
$$m_L(x) = \begin{cases} \frac{4}{6} = \frac{2}{3} & x = 0 \\[2pt] \frac{2}{6} = \frac{1}{3} & x = 1. \end{cases}$$
Similarly, we let $G$ be the Bernoulli random variable assuming the value $0$ if the ginger is not picked, and $1$ if the ginger is picked. We have
$$m_G(y) = \begin{cases} \frac{5}{6} & y = 0 \\[2pt] \frac{1}{6} & y = 1. \end{cases}$$

The given information tells us that
$$m_{L,G}(x,y) = \begin{cases} \frac{4}{6} = \frac{2}{3} & x = 0,\ y = 0 \\[2pt] \frac{1}{6} & x = 1,\ y = 0 \\[2pt] 0 & x = 0,\ y = 1 \\[2pt] \frac{1}{6} & x = 1,\ y = 1. \end{cases}$$
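These values can also be recovered by brute-force counting. Here is a minimal Python sketch, with the six people encoded as $(L, G)$ pairs (an encoding of our own choosing):

```python
from collections import Counter
from fractions import Fraction

# Each person as an (L, G) pair: four right-handed non-gingers,
# one left-handed non-ginger, and one left-handed ginger.
people = [(0, 0), (0, 0), (0, 0), (0, 0), (1, 0), (1, 1)]

counts = Counter(people)
for lg in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    # m_{L,G}(x, y) = (number of people with that (L, G) value) / 6
    print(lg, Fraction(counts[lg], len(people)))
# (0, 0) 2/3   (1, 0) 1/6   (0, 1) 0   (1, 1) 1/6
```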

Continuous Example

We will assume that when throwing a dart at a target (dart board) on a wall, the probability that the dart hits in a certain region does not depend on the rotational angle about the center of the board. The probability density for the dart hitting at a point a distance $r$ from the center of the board is proportional to $\frac{1}{(1+r^2)^2}$. Determine the joint density function.

To be more precise, put a coordinate system on the wall, with origin at the center of the dart board. We will let $x$ (resp. $y$) be the left-right (resp. up-down) position, with respect to the center of the dart board, of a point on the wall. Then, if $X$ (resp. $Y$) denotes the random variable describing the left-right (resp. up-down) position of the hitting point of the dart, we suppose that the probability density function $df_{X,Y}$ describing the hitting point of the dart is proportional to $\frac{1}{(1+x^2+y^2)^2}$. We are to determine this constant of proportionality.

Our compatibility condition is
$$\int_{\mathbb{R}^2} df_{X,Y} = \int_{\mathbb{R}^2} \frac{c}{(1+x^2+y^2)^2}\,dx\,dy = c \int_0^{2\pi}\!\int_0^{+\infty} \frac{r}{(1+r^2)^2}\,dr\,d\theta = 1.$$
But
$$\int_0^{+\infty} \frac{r}{(1+r^2)^2}\,dr = \frac{1}{2},$$
so
$$c \int_0^{2\pi} \frac{1}{2}\,d\theta = 1.$$
Thus $c = \frac{1}{\pi}$ and the probability density is given by
$$df_{X,Y}(x,y) = \frac{1}{\pi} \cdot \frac{1}{(1+x^2+y^2)^2}.$$
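As a quick numerical sanity check, one can confirm that $c = \frac{1}{\pi}$ normalizes the density; the sketch below (using scipy, and relying on the polar-coordinate reduction above) computes the radial integral and the resulting total mass:

```python
from math import pi, inf
from scipy.integrate import quad

c = 1 / pi

# Radial integral from the polar-coordinate computation above.
radial, _ = quad(lambda r: r / (1 + r**2) ** 2, 0, inf)
print(radial)               # 0.5, matching the hand computation

# Total mass: c * (2*pi) * (1/2) should be 1.
print(c * 2 * pi * radial)  # 1.0
```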

In order to examine the underlying probabilistic phenomena driving the creation of our data (if indeed there are such), we will usually be taking many measurements, making many observations, or performing repeated experiments. Given this, we will need to understand how to discuss probability when there are multiple samples.

We will start with a somewhat general discussion and then specialize it to the case mentioned in the previous section, where we are independently sampling a probability distribution $n$ times.

Suppose we have a process that generates $n$ values. If the process is discrete, we can ask for the probability that a specific collection of values $x_1, x_2, \ldots, x_n$ occurs. Thus, if we flip a fair coin and roll a fair die with two sides numbered "1" and four sides numbered "2", the collection of possible outcomes (with probabilities indicated) is
$$(H,1),\ P = \tfrac{1}{6} \qquad (H,2),\ P = \tfrac{1}{3} \qquad (T,1),\ P = \tfrac{1}{6} \qquad (T,2),\ P = \tfrac{1}{3}.$$
This probability is then a function $df$ of $x_1, x_2, \ldots, x_n$. If the process is continuous, the description is a bit more subtle, and we specify a density function which is integrated to obtain the probability that a collection of values will occur in a specific set. Thus we will have a function $df(x_1, x_2, \ldots, x_n)$ such that the probability that an outcome satisfies $a_1 \le x_1 \le b_1$, ..., $a_n \le x_n \le b_n$ is given by
$$\int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} df(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n.$$
This is called a multivariate density function, as it depends on several variables.

Such a density function $p$ is characterized by the properties that

  1. $p(x_1, x_2, \ldots, x_n) \ge 0$, and
  2. $$\begin{cases} \displaystyle\sum_{(x_1, \ldots, x_n) \text{ occurs}} p(x_1, \ldots, x_n) = 1 & \text{discrete case} \\[10pt] \displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} p(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = 1 & \text{continuous case.} \end{cases}$$
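For the discrete coin-and-die process above, both properties are easy to check mechanically; here is a minimal Python sketch, with the four outcome probabilities simply tabulated from the list given earlier:

```python
from fractions import Fraction

# p(x1, x2), tabulated from the list of outcomes: x1 is the coin face,
# x2 the number showing on the weighted die.
p = {("H", 1): Fraction(1, 6), ("H", 2): Fraction(1, 3),
     ("T", 1): Fraction(1, 6), ("T", 2): Fraction(1, 3)}

assert all(prob >= 0 for prob in p.values())  # property 1
assert sum(p.values()) == 1                   # property 2, discrete case
```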

We compute the expected value of a function $g(x_1, x_2, \ldots, x_n)$ by
$$E[g] = \int \cdots \int g(x_1, x_2, \ldots, x_n)\, p(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n$$
(with the integral replaced by a sum in the discrete case), and the corresponding moment generating function for $g$ by
$$M_g(t) = E[e^{tg}] = \int \cdots \int e^{t g(x_1, x_2, \ldots, x_n)}\, p(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n.$$
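Here is a minimal Python sketch of the discrete versions of these formulas, with the coin-and-die probabilities tabulated as before and $g$ chosen (arbitrarily, for illustration) to be the die value plus one for a head:

```python
from math import exp
from fractions import Fraction

p = {("H", 1): Fraction(1, 6), ("H", 2): Fraction(1, 3),
     ("T", 1): Fraction(1, 6), ("T", 2): Fraction(1, 3)}

def g(coin, die):
    # Die value, plus one if the coin came up heads.
    return die + (1 if coin == "H" else 0)

# E[g]: the integral becomes a sum over the four outcomes.
E_g = sum(g(*outcome) * prob for outcome, prob in p.items())
print(E_g)  # 13/6

# M_g(t) = E[e^{t g}], again as a finite sum.
def M_g(t):
    return sum(exp(t * g(*outcome)) * float(prob) for outcome, prob in p.items())

print(M_g(0.0))  # 1.0, since M_g(0) = E[1] = 1
```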
As properties of moments will be used later in this treatment, several useful properties of the $E$ operator will be discussed in the next section.

Independent Random Variables

We will say that discrete random variables $X$ and $Y$ are independent if, for any $x_0, y_0 \in \mathbb{R}$, we have
$$P(X = x_0 \cap Y = y_0) = P(X = x_0) \cdot P(Y = y_0).$$
More generally, so as to include continuous random variables, we take any $x_1, y_1 \in \mathbb{R}$ satisfying $x_1 > x_0$ and $y_1 > y_0$, and demand that
$$P(X \in [x_0, x_1] \cap Y \in [y_0, y_1]) = P(X \in [x_0, x_1]) \cdot P(Y \in [y_0, y_1]).$$

Example

Consider the experiment of flipping a coin and then rolling a die. Let $X$ denote the number of heads that occur ($0$ or $1$), and $Y$ denote the number from the die roll ($1$ through $6$). Here is the table of outcomes with their probabilities.
$$\begin{array}{c|cccccc}
 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
X = 0 & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} \\
X = 1 & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12}
\end{array}$$
It is easy to see that $P(X = 1) = \frac{1}{2}$ and $P(Y = 2) = \frac{1}{6}$. Thus
$$P(X = 1 \cap Y = 2) = \frac{1}{12} = \frac{1}{2} \cdot \frac{1}{6} = P(X = 1) \cdot P(Y = 2).$$
The analogous computation holds for any pair of values that $X$ and $Y$ might assume.
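A brute-force verification, sketched below in Python, recovers the marginals by summing the rows and columns of the joint table and compares every cell with the corresponding product:

```python
from fractions import Fraction

# Joint table: every (x, y) cell has probability 1/12.
joint = {(x, y): Fraction(1, 12) for x in (0, 1) for y in range(1, 7)}

# Marginals: sum across rows and columns of the table.
pX = {x: sum(joint[(x, y)] for y in range(1, 7)) for x in (0, 1)}
pY = {y: sum(joint[(x, y)] for x in (0, 1)) for y in range(1, 7)}

# The product rule holds in every cell, so X and Y are independent.
assert all(joint[(x, y)] == pX[x] * pY[y]
           for x in (0, 1) for y in range(1, 7))
print(pX[1], pY[2])  # 1/2 1/6
```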

One way to think about this is that the die is completely uninfluenced by the result of the flip.

Usually the determination of the density function of a multivariate process is rather onerous. But this can be greatly simplified if the variables $x_1, x_2, \ldots, x_n$ are independent. In this case, we have
$$p(x_1, x_2, \ldots, x_n) = p_1(x_1)\, p_2(x_2) \cdots p_n(x_n).$$
That is, the density function can be written as a product of the density functions of the individual variables.

Computation of the expected value of special functions of the form $g(x_1, x_2, \ldots, x_n) = g_1(x_1)\, g_2(x_2) \cdots g_n(x_n)$ is simplified by
$$\begin{aligned} E[g_1 \cdots g_n] &= \int \cdots \int g_1(x_1) \cdots g_n(x_n)\, p_1(x_1) \cdots p_n(x_n)\,dx_1 \cdots dx_n \\ &= \int g_1(x_1)\, p_1(x_1)\,dx_1 \cdots \int g_n(x_n)\, p_n(x_n)\,dx_n \\ &= E[g_1] \cdots E[g_n]. \end{aligned}$$
In this case, we have that the moment-generating function for $g(x_1, x_2, \ldots, x_n) = g_1(x_1) + \cdots + g_n(x_n)$ simplifies to
$$\begin{aligned} M_{g_1 + \cdots + g_n}(t) = E\!\left[e^{t(g_1 + \cdots + g_n)}\right] &= \int e^{t g_1(x_1)}\, p_1(x_1)\,dx_1 \cdots \int e^{t g_n(x_n)}\, p_n(x_n)\,dx_n \\ &= M_{g_1}(t) \cdots M_{g_n}(t). \end{aligned}$$
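Both factorizations are easy to verify numerically for a small discrete example. The sketch below uses the head count and the weighted die value from the earlier coin-and-die process, with $g_1$ and $g_2$ taken (for illustration) to be the identity function on each variable:

```python
from math import exp, isclose
from fractions import Fraction

p1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # head count
p2 = {1: Fraction(1, 3), 2: Fraction(2, 3)}  # weighted-die value

# E[g1 g2] = E[g1] E[g2] for independent variables.
E1 = sum(x * q for x, q in p1.items())
E2 = sum(y * q for y, q in p2.items())
E12 = sum(x * y * p1[x] * p2[y] for x in p1 for y in p2)
assert E12 == E1 * E2

# M_{g1+g2}(t) = M_{g1}(t) M_{g2}(t), checked at one value of t.
def M(p, t):
    return sum(exp(t * x) * float(q) for x, q in p.items())

t = 0.7
M_sum = sum(exp(t * (x + y)) * float(p1[x] * p2[y]) for x in p1 for y in p2)
assert isclose(M_sum, M(p1, t) * M(p2, t))
```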

The Case of Random Sampling

As was mentioned in the last section, we will think of random sampling as repeated independent values from a given probability function. In this case, the above discussion of independence tells us that
$$p(x_1, x_2, \ldots, x_n) = p_1(x_1)\, p_2(x_2) \cdots p_n(x_n) = p_1(x_1)\, p_1(x_2) \cdots p_1(x_n).$$
That is, the individual density functions are identical.
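For instance, taking the common density $p_1$ to be the standard normal (an arbitrary choice, purely for illustration), the joint density of a sample of size $n = 5$ is just the product of the one-variable densities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sample = rng.standard_normal(5)  # x_1, ..., x_5, drawn independently

# Joint density of the sample: p1(x_1) * p1(x_2) * ... * p1(x_5).
joint_density = np.prod(norm.pdf(sample))
print(joint_density)
```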