Joint Density Functions and Independent Random Variables

Suppose we have two discrete random variables, $X$ and $Y$. We denote by $df_{X,Y}$ the joint density function
$$df_{X,Y}(a,b) = P(X = a \cap Y = b).$$
That is, $df_{X,Y}(a,b)$ records the probability that $X = a$ and $Y = b$ simultaneously.

If we have two continuous random variables, $X$ and $Y$, this definition needs to be handled a bit more delicately. We denote by $df_{X,Y}$ the joint density function
$$df_{X,Y}(a,b) = \lim_{h \to 0} \frac{P(X \in [a-h,\, a+h] \cap Y \in [b-h,\, b+h])}{4h^2}.$$
That is, $df_{X,Y}(a,b)$ records the probability that $X$ and $Y$ take on values in a small square centered at $(a,b)$, scaled by the area $4h^2$ of the square.

Here are a couple of simple examples.

Discrete Example

Given six people, two of whom are left-handed and one of whom is ginger (and also happens to be a leftie), we pick one person at random, with each equally likely to be picked. Letting $L$ be the event that a leftie is picked, and $G$ the event that the ginger is picked, determine the joint density function.

To be more precise, let $L$ be the Bernoulli random variable assuming the value $0$ if a right-hander is picked, and $1$ if a leftie is picked. Then
$$m_L(x) = \begin{cases} \frac{4}{6} = \frac{2}{3} & x = 0 \\[2pt] \frac{2}{6} = \frac{1}{3} & x = 1. \end{cases}$$
Similarly, we let $G$ be the Bernoulli random variable assuming the value $0$ if the ginger is not picked, and $1$ if the ginger is picked. We have
$$m_G(y) = \begin{cases} \frac{5}{6} & y = 0 \\[2pt] \frac{1}{6} & y = 1. \end{cases}$$

The given information tells us that
$$m_{L,G}(x,y) = \begin{cases} \frac{4}{6} = \frac{2}{3} & x = 0,\ y = 0 \\[2pt] \frac{1}{6} & x = 1,\ y = 0 \\[2pt] 0 & x = 0,\ y = 1 \\[2pt] \frac{1}{6} & x = 1,\ y = 1. \end{cases}$$
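These values can also be recovered by brute-force counting. Here is a minimal Python sketch, with the six people encoded as $(L, G)$ pairs (an encoding of our own choosing):

```python
from collections import Counter
from fractions import Fraction

# Each person as an (L, G) pair: four right-handed non-gingers,
# one left-handed non-ginger, and one left-handed ginger.
people = [(0, 0), (0, 0), (0, 0), (0, 0), (1, 0), (1, 1)]

counts = Counter(people)
for lg in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    # m_{L,G}(x, y) = (number of people with that (L, G) value) / 6
    print(lg, Fraction(counts[lg], len(people)))
# (0, 0) 2/3   (1, 0) 1/6   (0, 1) 0   (1, 1) 1/6
```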

Continuous Example

We will assume that when throwing a dart at a target (dart board) on a wall, the probability that the dart hits in a certain region does not depend on the rotational angle about the center of the board. The probability density for the dart hitting at a point a distance $r$ from the center of the board is proportional to $\frac{1}{(1+r^2)^2}$. Determine the joint density function.

To be more precise, put a coordinate system on the wall, with origin at the center of the dart board. We will let $x$ (resp. $y$) be the left-right (resp. up-down) position, with respect to the center of the dart board, of a point on the wall. Then, if $X$ (resp. $Y$) denotes the random variable describing the left-right (resp. up-down) position of the hitting point of the dart, we suppose that the probability density function $df_{X,Y}$ describing the hitting point of the dart is proportional to $\frac{1}{(1+x^2+y^2)^2}$. We are to determine this constant of proportionality.

Our compatibility condition is
$$\int_{\mathbb{R}^2} df_{X,Y} = \int_{\mathbb{R}^2} \frac{c}{(1+x^2+y^2)^2}\,dx\,dy = c \int_0^{2\pi}\!\int_0^{+\infty} \frac{r}{(1+r^2)^2}\,dr\,d\theta = 1.$$
But
$$\int_0^{+\infty} \frac{r}{(1+r^2)^2}\,dr = \frac{1}{2},$$
so
$$c \int_0^{2\pi} \frac{1}{2}\,d\theta = 1.$$
Thus $c = \frac{1}{\pi}$ and the probability density is given by
$$df_{X,Y}(x,y) = \frac{1}{\pi} \cdot \frac{1}{(1+x^2+y^2)^2}.$$
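As a quick numerical sanity check, one can confirm that $c = \frac{1}{\pi}$ normalizes the density; the sketch below (using scipy, and relying on the polar-coordinate reduction above) computes the radial integral and the resulting total mass:

```python
from math import pi, inf
from scipy.integrate import quad

c = 1 / pi

# Radial integral from the polar-coordinate computation above.
radial, _ = quad(lambda r: r / (1 + r**2) ** 2, 0, inf)
print(radial)               # 0.5, matching the hand computation

# Total mass: c * (2*pi) * (1/2) should be 1.
print(c * 2 * pi * radial)  # 1.0
```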

In order to examine the underlying probabilistic phenomena driving the creation of our data (if indeed there are such), we will usually be taking many measurements, making many observations, or performing repeated experiments. Given this, we will need to understand how to discuss probability when there are multiple samples.

We will start with a somewhat general discussion and then specialize it to the case mentioned in the previous section, where we are independently sampling a probability distribution $n$ times.

Suppose we have a process that generates $n$ values. If the process is discrete, we can ask for the probability that a specific collection of values $x_1, x_2, \ldots, x_n$ occurs. Thus, if we flip a fair coin and roll a fair die with two sides numbered "1" and four sides numbered "2", the collection of possible outcomes (with probabilities indicated) is
$$(H,1),\ P = \tfrac{1}{6} \qquad (H,2),\ P = \tfrac{1}{3} \qquad (T,1),\ P = \tfrac{1}{6} \qquad (T,2),\ P = \tfrac{1}{3}.$$
This probability is then a function $df$ of $x_1, x_2, \ldots, x_n$. If the process is continuous, the description is a bit more subtle, and we specify a density function which is integrated to obtain the probability that a collection of values will occur in a specific set. Thus we will have a function $df(x_1, x_2, \ldots, x_n)$ such that the probability that an outcome satisfies $a_1 \le x_1 \le b_1$, ..., $a_n \le x_n \le b_n$ is given by
$$\int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} df(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n.$$
This is called a multivariate density function, as it depends on several variables.

Such a density function $p$ is characterized by the properties that

  1. $p(x_1, x_2, \ldots, x_n) \ge 0$, and
  2. $$\begin{cases} \displaystyle\sum_{(x_1, \ldots, x_n) \text{ occurs}} p(x_1, \ldots, x_n) = 1 & \text{discrete case} \\[10pt] \displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} p(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = 1 & \text{continuous case.} \end{cases}$$
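For the discrete coin-and-die process above, both properties are easy to check mechanically; here is a minimal Python sketch, with the four outcome probabilities simply tabulated from the list given earlier:

```python
from fractions import Fraction

# p(x1, x2), tabulated from the list of outcomes: x1 is the coin face,
# x2 the number showing on the weighted die.
p = {("H", 1): Fraction(1, 6), ("H", 2): Fraction(1, 3),
     ("T", 1): Fraction(1, 6), ("T", 2): Fraction(1, 3)}

assert all(prob >= 0 for prob in p.values())  # property 1
assert sum(p.values()) == 1                   # property 2, discrete case
```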

We compute the expected value of a function $g(x_1, x_2, \ldots, x_n)$ by
$$E[g] = \int \cdots \int g(x_1, x_2, \ldots, x_n)\, p(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n$$
(with the integral replaced by a sum in the discrete case), and the corresponding moment generating function for $g$ by
$$M_g(t) = E[e^{tg}] = \int \cdots \int e^{t g(x_1, x_2, \ldots, x_n)}\, p(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n.$$
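Here is a minimal Python sketch of the discrete versions of these formulas, with the coin-and-die probabilities tabulated as before and $g$ chosen (arbitrarily, for illustration) to be the die value plus one for a head:

```python
from math import exp
from fractions import Fraction

p = {("H", 1): Fraction(1, 6), ("H", 2): Fraction(1, 3),
     ("T", 1): Fraction(1, 6), ("T", 2): Fraction(1, 3)}

def g(coin, die):
    # Die value, plus one if the coin came up heads.
    return die + (1 if coin == "H" else 0)

# E[g]: the integral becomes a sum over the four outcomes.
E_g = sum(g(*outcome) * prob for outcome, prob in p.items())
print(E_g)  # 13/6

# M_g(t) = E[e^{t g}], again as a finite sum.
def M_g(t):
    return sum(exp(t * g(*outcome)) * float(prob) for outcome, prob in p.items())

print(M_g(0.0))  # 1.0, since M_g(0) = E[1] = 1
```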
As properties of moments will be used later in this treatment, several useful properties of the $E$ operator will be discussed in the next section.

Independent Random Variables

We will say that discrete random variables $X$ and $Y$ are independent if, for any $x_0, y_0 \in \mathbb{R}$, we have
$$P(X = x_0 \cap Y = y_0) = P(X = x_0) \cdot P(Y = y_0).$$
More generally, so as to include continuous random variables, we take any $x_1, y_1 \in \mathbb{R}$ satisfying $x_1 > x_0$ and $y_1 > y_0$, and demand that
$$P(X \in [x_0, x_1] \cap Y \in [y_0, y_1]) = P(X \in [x_0, x_1]) \cdot P(Y \in [y_0, y_1]).$$

Example

Consider the experiment of flipping a coin and then rolling a die. Let $X$ denote the number of heads that occur ($0$ or $1$), and $Y$ denote the number from the die roll ($1$ through $6$). Here is the table of outcomes with their probabilities.
$$\begin{array}{c|cccccc}
 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
X = 0 & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} \\
X = 1 & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12}
\end{array}$$
It is easy to see that $P(X = 1) = \frac{1}{2}$ and $P(Y = 2) = \frac{1}{6}$. Thus
$$P(X = 1 \cap Y = 2) = \frac{1}{12} = \frac{1}{2} \cdot \frac{1}{6} = P(X = 1) \cdot P(Y = 2).$$
The analogous computation holds for any pair of values that $X$ and $Y$ might assume.
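A brute-force verification, sketched below in Python, recovers the marginals by summing the rows and columns of the joint table and compares every cell with the corresponding product:

```python
from fractions import Fraction

# Joint table: every (x, y) cell has probability 1/12.
joint = {(x, y): Fraction(1, 12) for x in (0, 1) for y in range(1, 7)}

# Marginals: sum across rows and columns of the table.
pX = {x: sum(joint[(x, y)] for y in range(1, 7)) for x in (0, 1)}
pY = {y: sum(joint[(x, y)] for x in (0, 1)) for y in range(1, 7)}

# The product rule holds in every cell, so X and Y are independent.
assert all(joint[(x, y)] == pX[x] * pY[y]
           for x in (0, 1) for y in range(1, 7))
print(pX[1], pY[2])  # 1/2 1/6
```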

One way to think about this is that the die is completely uninfluenced by the result of the flip.

Usually the determination of the density function of a multivariate process is rather onerous. But this can be greatly simplified if the variables $x_1, x_2, \ldots, x_n$ are independent. In this case, we have
$$p(x_1, x_2, \ldots, x_n) = p_1(x_1)\, p_2(x_2) \cdots p_n(x_n).$$
That is, the density function can be written as a product of the density functions of the individual variables.

Computation of the expected value of special functions of the form $g(x_1, x_2, \ldots, x_n) = g_1(x_1)\, g_2(x_2) \cdots g_n(x_n)$ is simplified by
$$\begin{aligned} E[g_1 \cdots g_n] &= \int \cdots \int g_1(x_1) \cdots g_n(x_n)\, p_1(x_1) \cdots p_n(x_n)\,dx_1 \cdots dx_n \\ &= \int g_1(x_1)\, p_1(x_1)\,dx_1 \cdots \int g_n(x_n)\, p_n(x_n)\,dx_n \\ &= E[g_1] \cdots E[g_n]. \end{aligned}$$
In this case, we have that the moment-generating function for $g(x_1, x_2, \ldots, x_n) = g_1(x_1) + \cdots + g_n(x_n)$ simplifies to
$$\begin{aligned} M_{g_1 + \cdots + g_n}(t) = E\!\left[e^{t(g_1 + \cdots + g_n)}\right] &= \int e^{t g_1(x_1)}\, p_1(x_1)\,dx_1 \cdots \int e^{t g_n(x_n)}\, p_n(x_n)\,dx_n \\ &= M_{g_1}(t) \cdots M_{g_n}(t). \end{aligned}$$
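Both factorizations are easy to verify numerically for a small discrete example. The sketch below uses the head count and the weighted die value from the earlier coin-and-die process, with $g_1$ and $g_2$ taken (for illustration) to be the identity function on each variable:

```python
from math import exp, isclose
from fractions import Fraction

p1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # head count
p2 = {1: Fraction(1, 3), 2: Fraction(2, 3)}  # weighted-die value

# E[g1 g2] = E[g1] E[g2] for independent variables.
E1 = sum(x * q for x, q in p1.items())
E2 = sum(y * q for y, q in p2.items())
E12 = sum(x * y * p1[x] * p2[y] for x in p1 for y in p2)
assert E12 == E1 * E2

# M_{g1+g2}(t) = M_{g1}(t) M_{g2}(t), checked at one value of t.
def M(p, t):
    return sum(exp(t * x) * float(q) for x, q in p.items())

t = 0.7
M_sum = sum(exp(t * (x + y)) * float(p1[x] * p2[y]) for x in p1 for y in p2)
assert isclose(M_sum, M(p1, t) * M(p2, t))
```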

The Case of Random Sampling

As was mentioned in the last section, we will think of random sampling as repeated independent values from a given probability function. In this case, the above discussion of independence tells us that
$$p(x_1, x_2, \ldots, x_n) = p_1(x_1)\, p_2(x_2) \cdots p_n(x_n) = p_1(x_1)\, p_1(x_2) \cdots p_1(x_n).$$
That is, the individual density functions are identical.
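For instance, taking the common density $p_1$ to be the standard normal (an arbitrary choice, purely for illustration), the joint density of a sample of size $n = 5$ is just the product of the one-variable densities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sample = rng.standard_normal(5)  # x_1, ..., x_5, drawn independently

# Joint density of the sample: p1(x_1) * p1(x_2) * ... * p1(x_5).
joint_density = np.prod(norm.pdf(sample))
print(joint_density)
```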