Basic Models: Discrete Random Variables

In this section and the next, we introduce some basic probability density functions used to describe common phenomena.   These will be presented in two groups: discrete density functions, generally used in situations where we can count off the possible outcomes of an experiment, are discussed in this section.   Continuous distributions, generally used in situations where there is a continuum of possible outcomes (a collection of possible outcomes requiring at least an interval of   $\mathbb{R}$   to parameterize), are described in the next section.

Probabilistic processes with discrete density functions have the feature that the probability of an event can be determined by just adding up the probabilities for the outcomes that comprise the event.

There are several discrete distributions which occur routinely, with which anyone wishing to use mathematical statistics should be familiar.

Uniform Densities

When there are a finite number of possible outcomes to an experiment, and each outcome is equally likely, the probability distribution is said to be uniform.   Thus, for example, on rolling a standard die the probability of each outcome   $1\,$ , $\,2\,$ , $\,\dots\,$ , $\,6$   is   $\displaystyle{ \frac{1}{6} }\;$ .

Similarly, consider a standard well-shuffled deck (fifty-two cards, $\,2\diamondsuit\,$ , $\,\dots\,$ , $\,A\spadesuit\,$ ).   We will assume the conditions ‘standard well-shuffled’ henceforth when discussing cards unless otherwise specified.   On drawing a card from such a deck the probability of drawing any particular card is   $\displaystyle{ \frac{1}{52} }\;$ .

These are uniform distributions, where in each case we have a finite number (call it   $n\,$ ) of possible outcomes, each with probability   $\displaystyle{ \frac{1}{n} }$   of occurring.

In the special case where there are   $n+1$   values spaced by step-size   $1\,$ , i.e.   $\,a\,$ , $\,a+1\,$ , $\,\dots\,$ , $\,a+n\,$ , the mean and standard deviation of such a density are easily calculated.   The mean is
$$ \mu =\frac{a +(a+1) +\cdots +(a+n)}{n+1} =a+\frac{n}{2} $$
(i.e. the average of the minimum and maximum data values).   To compute the variance we first find that
$$ \mu'_2 =\frac{a^2 +(a+1)^2 +\cdots +(a+n)^2}{n+1} =a^2 +n\, a +\frac{n\, (2\,n+1)}{6} \,\text{,} $$
so that the variance is
$$
\sigma^2 =\mu_2 =\mu'_2 -\mu^2 =a^2 +n\, a +\frac{n\, (2\,n+1)}{6} -\left(a +\frac{n}{2} \right)^2
=\frac{n\, (n+2)}{12} \;\text{.}
$$
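
As a quick numerical check, the following is a minimal Python sketch (standard library only; the choices   $a=1$   and   $n=5\,$ , modeling a standard die, are illustrative assumptions) comparing these formulas with a direct computation over the values.

```python
# Check the uniform-density formulas mu = a + n/2 and sigma^2 = n(n+2)/12
# against direct computation over the values a, a+1, ..., a+n.
a, n = 1, 5                                            # a standard die: the values 1, ..., 6
values = [a + k for k in range(n + 1)]

mu = sum(values) / (n + 1)                             # direct mean
var = sum(v ** 2 for v in values) / (n + 1) - mu ** 2  # direct variance (mu'_2 - mu^2)

print(mu, a + n / 2)                # 3.5  3.5
print(var, n * (n + 2) / 12)        # 2.9166...  2.9166...
```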

Example: A fair twenty-sided die has faces numbered from   $1$   through   $20\;$ .   What is the probability that a roll will yield a value greater than   $15\;$ ?

The die being fair means that each of the possible values is equally likely.   If   $X$   denotes the experiment of rolling the die, the probability that a roll gives one of the five values   $16$   through   $20$   is
$$
P(X\gt 15) =\frac{5}{20} =\frac{1}{4} \;\text{.}
$$

Bernoulli Densities

Any yes-no experiment, with two mutually exclusive possible outcomes, is called a Bernoulli experiment.   Assigning the values   $0$   and
  $1$   to the two outcomes of such an experiment, the probability of obtaining   $1$   will be denoted by   $p\,$ , so that the probability of obtaining   $0$   is then   $1-p\;$ .   If we denote our random variable by   $X\,$ , we thus have
$$
X=\left\{
\begin{array}{cc} 0 & \text{with probability } 1-p \\ & \\ 1 & \text{with probability } p \end{array}
\right. \;\text{.}
$$

The mean of such a random variable   $X$   is
$$ \mu =E[X] =0\cdot (1-p) +1\cdot p =p \,\text{,} $$
and the variance is
$$
\mu_2 =\mu'_2 -\mu^2 =E\left[ X^2\right] -\mu^2 =0^2 \cdot (1-p) +1^2 \cdot p -p^2
= p-p^2 =p\cdot (1-p) \;\text{.}
$$

The moment-generating function for Bernoulli densities can be explicitly computed as follows.

Moment Generating Function

If   $X$   is a Bernoulli random variable with density   ${\rm Bernoulli}(p)\,$ , we compute
$$ M_X(t) ={\rm e}^{t\cdot 0}\, (1-p) +{\rm e}^{t\cdot 1}\, p =\left( 1-p + p\, {\rm e}^t \right) \;\text{.} $$
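
Before turning to examples, here is a minimal Python sketch (a Monte Carlo illustration, not part of the derivation; the values   $p=1/6$   and   $t=0.3$   are arbitrary assumptions) checking that simulated Bernoulli data has mean, variance, and moment generating function close to   $p\,$ ,   $p\,(1-p)\,$ , and   $1-p+p\,{\rm e}^t\,$ .

```python
import math
import random

p, t = 1 / 6, 0.3                 # illustrative Bernoulli parameter and MGF argument
random.seed(0)
sample = [1 if random.random() < p else 0 for _ in range(200_000)]

mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)
mgf = sum(math.exp(t * x) for x in sample) / len(sample)

print(mean, p)                         # close to 1/6
print(var, p * (1 - p))                # close to 5/36
print(mgf, 1 - p + p * math.exp(t))    # close to 1 - p + p e^t
```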

An example of such a Bernoulli experiment is a coin-flip.   Let us call this experiment   $X\;$ .   Assigning the value   $1$   to heads and   $0$   to tails, if the coin is fair then   $\displaystyle{ P(X=1) =\frac{1}{2} =P(X=0) }\;$ .

An example where the two probabilities are not the same is the roll of a standard die, where success is getting the value   $4\;$ .   This happens with probability   $\displaystyle{ \frac{1}{6} }\;$ .   If we denote this experiment by   $X$   and assign the value   $1$   when the die shows a   $4$   and   $0$   when it shows anything else, we have
$$
X=\left\{ \begin{array}{cc}
0 & \text{with probability } \frac{5}{6} \\ & \\ 1 & \text{with probability } \frac{1}{6}
\end{array} \right. \;\text{.}
$$

Any lottery is at its core a Bernoulli experiment, where one either wins or loses.   If we think of left-right handedness as a genetic lottery denoted by   $X\,$ , with left-handedness being assigned the value   $1$   and right-handedness   $0\,$ , then as roughly   $10\%$   of people are lefties we approximate this lottery by
$$
X=\left\{ \begin{array}{cc}
0 & \text{with probability } .9 \\ & \\ 1 & \text{with probability } .1
\end{array} \right. \;\text{.}
$$

Examples of Bernoulli Densities

  • As of 2020, $\, 33\%$   of registered voters identified as Democrat, while   $29\%$   identified as Republican.   Thus   $62\%$   of registered voters identified as either Democrat or Republican.
  • Approximately   $10\%$   of humans are left-handed.   Approximately   $2\%$   of Americans identify as homosexual. Approximately   $2\%$   of humans are red-heads, but in Scotland the incidence of red hair is roughly   $13\%\;$ .

Binomial Densities

Consider the new experiment of performing a Bernoulli experiment a bunch of times and counting the number of ‘successes’ (the number of times   $1$   occurs).

For example, consider the experiment of flipping a coin twice and letting   $X$   denote the number of times heads occurs.   There are four possible outcomes for flipping a coin twice: in one of these heads does not occur at all, in two of these heads occurs once, and in one heads occurs twice.   These are illustrated below.
$$
\begin{array}{c}
\begin{array}{c|cccccc}
_{\#2}\big\backslash^{\#1} & H & T \\ \hline H & 2 & 1 \\
T & 1 & 0 \\
\end{array}
\\
\small{\text{number of heads}}
\end{array}
$$

If we know that each of the four possible outcomes on flipping a coin twice is equally likely, the probability distribution for this experiment is
$$
X=\left\{ \begin{array}{cc}
0 & \text{with probability } \frac{1}{4} \\ & \\ 1 & \text{with probability } \frac{2}{4} =\frac{1}{2} \\
& \\ 2 & \text{with probability } \frac{1}{4}
\end{array} \right. \;\text{.}
$$

To be precise, if we know that the second flip is independent of the first then, letting   $\displaystyle{ X_1 }$   represent the first flip and   $\displaystyle{ X_2 }$   the second, we have
$$ P\left( X_2 =1 \big| X_1 =1 \right) =P\left( X_2 =1 \right) =\frac{1}{2} \;\text{.} $$
Thus
$$
P\left( X_1 =1 \cap X_2 =1 \right) =P\left( X_1 =1 \right)\cdot P\left( X_2 =1 \right)
=\frac{1}{2}\cdot \frac{1}{2} =\frac{1}{4} \;\text{.}
$$

This technique can be used to show that
$$
\begin{array}{rl}
P\left( X_1 =1 \cap X_2 =0 \right) & =\frac{1}{4} \\ & \\
P\left( X_1 =0 \cap X_2 =1 \right) & =\frac{1}{4} \\ & \\
P\left( X_1 =0 \cap X_2 =0 \right) & =\frac{1}{4}
\end{array} \;\text{.}
$$

To extract the essential features of the above experiment and put them to work, we ask that the Bernoulli trials be mutually independent and that each have the same probability of yielding a   $1\;$ .   The official words for this are “mutually independent and identically distributed”.   On running   $n$   mutually independent and identically distributed Bernoulli experiments and counting the number of times   $1$   occurs, the total is an experiment where the outcome can be   $0\,$ , $\,1\,$ , $\,\dots\,$ , $\,n\;$ .   Such an experiment is called
Binomial.   We seek the probability distribution for this experiment.

To this end, consider the following special case.

Special Case:   Determination of the probability density for the sum of three Bernoulli trials

We now let   $X$   denote a Bernoulli experiment with   $P(X=1) =p\,$ , and   $\displaystyle{X_1}\,$ , $\,\displaystyle{X_2}\,$ , and   $\displaystyle{X_3}\,$   denote three independent trials of   $X\;$ .   Suppose we ask for the probability   $\displaystyle{ P\left( X_1 =1 \cap X_2 =1 \cap X_3 =0 \right) }\;$ .   We use mutual independence to obtain
$$
\begin{array}{rl}
P\left( X_1 =1 \cap X_2 =1 \cap X_3 =0 \right) &
=P\left( \left( X_2 =1 \cap X_3 =0 \right) \big| X_1 =1 \right)\cdot P\left( X_1 =1 \right) \\ & \\
& =P\left( X_2 =1 \cap X_3 =0 \right)\cdot P\left( X_1 =1 \right) \\ & \\
& =P\left( X_3 =0 \big| X_2 =1 \right)\cdot P\left( X_2 =1 \right)\cdot P\left( X_1 =1 \right) \\ & \\
& =P\left( X_3 =0 \right)\cdot P\left( X_2 =1 \right)\cdot P\left( X_1 =1 \right) \\ & \\
& =(1-p)\cdot p\cdot p \\ & \\
& =(1-p)\cdot p^2 \;\text{.}
\end{array}
$$

More generally, if we ask for the probability that   $\displaystyle{ X_1 +X_2 +X_3 =2 }$   then there are three possible ways this can occur, and the probability for each is computed as above to be   $\displaystyle{ (1-p)\, p^2 }\;$ .   These outcomes are non-intersecting, so the probability that   $\displaystyle{ X_1 +X_2 +X_3 =2 }$   is precisely the sum of the three probabilities   $\displaystyle{ P\left( X_1 =1 \cap X_2 =1 \cap X_3 =0 \right) }\,$ , $\,\displaystyle{ P\left( X_1 =1 \cap X_2 =0 \cap X_3 =1 \right) }\,$ , and   $\displaystyle{ P\left( X_1 =0 \cap X_2 =1 \cap X_3 =1 \right) }\;$ .   This sum is   $\displaystyle{ (1-p)\, p^2 +(1-p)\, p^2 +(1-p)\, p^2 =3\, (1-p)\, p^2 }\;$ .

We use this to find the full probability density for   $\displaystyle{ Y =X_1 +X_2 +X_3 }$ .   To this end, $Y=0$   can occur in only one way ( $\,\displaystyle{ X_1 =X_2 =X_3 =0 }\,$ ) with probability   $\displaystyle{ P(Y=0) =(1-p)^3 }\;$ .   Event   $Y=1$   can occur in three ways, each with probability   $\displaystyle{ (1-p)^2\, p }\,$ , so that   $\displaystyle{ P(Y=1) =3\, (1-p)^2\, p }\;$ .   Event   $Y=2$   can occur in three ways as previously observed, each with probability   $\displaystyle{ (1-p)\, p^2 }\,$ , so that   $\displaystyle{ P(Y=2) =3\, (1-p)\, p^2 }\;$ .   And finally   $Y=3$   can occur in only one way ( $\,\displaystyle{ X_1 =X_2 =X_3 =1 }\,$ ) with probability   $\displaystyle{ P(Y=3) =p^3 }\;$ .

This is summarized (for   $k =0, 1, 2, 3\,$ ) as
$$
P(Y=k) \, =\, \left\{ \begin{array}{cc} (1-p)^3 & k=0 \\ 3\, (1-p)^2\, p & k=1 \\
3\, (1-p)\, p^2 & k=2 \\ p^3 & k=3 \end{array} \right. \,
=\,\left( \begin{array}{c} 3 \\ k \end{array}\right)\, (1-p)^{3-k}\, p^k \;\text{.}
$$
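
The counting argument above can be verified by brute force: the following minimal Python sketch (standard library only; the value   $p=1/6$   is an arbitrary illustrative assumption) enumerates all   $2^3$   outcomes of three independent Bernoulli trials and compares the summed probabilities with   $\left(\begin{array}{c} 3 \\ k \end{array}\right)\, (1-p)^{3-k}\, p^k\,$ .

```python
from itertools import product
from math import comb, prod

p = 1 / 6   # illustrative success probability

# For each k, sum the probabilities of the outcomes (x1, x2, x3) with exactly k ones,
# and compare with the closed form C(3, k) (1-p)^(3-k) p^k.
for k in range(4):
    brute_force = sum(
        prod(p if x == 1 else 1 - p for x in outcome)
        for outcome in product([0, 1], repeat=3)
        if sum(outcome) == k
    )
    closed_form = comb(3, k) * (1 - p) ** (3 - k) * p ** k
    print(k, round(brute_force, 10), round(closed_form, 10))   # the two columns agree
```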

We can extract the salient features of the above example to determine the probability density function for an experiment   $Y$   counting the number of occurrences of   $1$   in   $n$   trials of a Bernoulli experiment   $X$   with probability   $p$   of   $1\;$ .   We note that   $Y$   is exactly   $\displaystyle{ X_1 +X_2 +\cdots +X_n }\;$ .

There are   $\displaystyle{ \left( \begin{array}{c} n \\ k \end{array}\right) }$   ways for   $Y$   to assume the value   $k$   ( $\,k =0, 1, \cdots, n\,$ ), each with probability   $\displaystyle{ (1-p)^{n-k}\, p^k }\;$ .   Thus the probability density function for   $Y$   is
$$ P(Y=k) =\left( \begin{array}{c} n \\ k \end{array}\right)\, (1-p)^{n-k}\, p^k \;\text{.} $$
These values are the terms in the binomial expansion of
$$
1 =((1-p) +p)^n
=\sum_{k=0}^n\, \left( \begin{array}{c} n \\ k \end{array}\right)\, (1-p)^{n-k}\, p^k \,\text{,}
$$
and as such give the density function its name.   We denote this by   ${\rm Binom}(k;p,n)\,$ , so that
$$ {\rm Binom}(k;p,n) =\left( \begin{array}{c} n \\ k \end{array}\right)\, (1-p)^{n-k}\, p^k \;\text{.} $$

The mean and variance of this density are determined as follows.   The mean is
$$
\begin{array}{rl}
\mu & =0\cdot {\rm Binom}(0;p,n) +1\cdot {\rm Binom}(1;p,n) +\cdots +n\cdot {\rm Binom}(n;p,n) \\ & \\
& =0\cdot \left( \begin{array}{c} n \\ 0 \end{array}\right)\, (1-p)^n +
1\cdot \left( \begin{array}{c} n \\ 1 \end{array}\right)\, (1-p)^{n-1}\, p + \cdots
n\cdot \left( \begin{array}{c} n \\ n \end{array}\right)\, p^n \\ & \\
& =\sum_{k=0}^n\, k\cdot \left( \begin{array}{c} n \\ k \end{array}\right)\, (1-p)^{n-k}\, p^k \\ & \\
& =\sum_{k=1}^n\, \frac{k\, n!}{(n-k)!\, k!} (1-p)^{n-k}\, p^k \\ & \\
& =n\,p\, \sum_{k=1}^{n}\, \frac{(n-1)!}{(n-k)!\, (k-1)!}\, (1-p)^{n-k}\, p^{k-1} \\ & \\
& =n\,p\, \left( (1-p) +p \right)^{n-1} \\
& \\ & =n\, p
\end{array} \;\text{.}
$$
The variance is similarly computed using   $\displaystyle{ \sigma^2 =\mu_2 =\mu'_2 -\mu^2 }\,$ , after determining  
$$
\begin{array}{rl}
\mu'_2 & =0^2\cdot {\rm Binom}(0;p,n) +1^2\cdot {\rm Binom}(1;p,n) +\cdots +n^2\cdot {\rm Binom}(n;p,n) \\
& \\ & =0^2\cdot \left( \begin{array}{c} n \\ 0 \end{array}\right)\, (1-p)^n +
1^2\cdot \left( \begin{array}{c} n \\ 1 \end{array}\right)\, (1-p)^{n-1}\, p + \cdots +
n^2\cdot \left( \begin{array}{c} n \\ n \end{array}\right)\, p^n \\ & \\
& =\sum_{k=0}^n\, k^2\cdot \left( \begin{array}{c} n \\ k \end{array}\right)\, (1-p)^{n-k}\, p^k \\ & \\
& =\sum_{k=0}^n\, k\, (k-1)\cdot \left( \begin{array}{c} n \\ k \end{array}\right)\, (1-p)^{n-k}\, p^k
+\sum_{k=0}^n\, k\cdot \left( \begin{array}{c} n \\ k \end{array}\right)\, (1-p)^{n-k}\, p^k \\ & \\
& =\sum_{k=2}^n\, \frac{k\, (k-1)\, n!}{(n-k)!\, k!} (1-p)^{n-k}\, p^k +n\, p \\ & \\
& =n\,(n-1)\, p^2\,
\sum_{k=2}^{n}\, \frac{(n-2)!}{(n-k)!\, (k-2)!}\, (1-p)^{n-k}\, p^{k-2} +n\, p \\ & \\
& =n\,(n-1)\, p^2\, \left( (1-p) +p \right)^{n-2} +n\, p \\ & \\
& =n\, (n-1)\, p^2 +n\, p
\end{array} \;\text{.}
$$
Thus
$$ \sigma^2 =\mu_2 =n\, (n-1)\, p^2 +n\, p -( n\, p)^2 =n\, p\, (1-p) \;\text{.} $$
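
The density, mean, and variance can also be checked numerically; the following is a minimal Python sketch (standard library only, with   $n=10$   and   $p=0.3$   as arbitrary illustrative assumptions).

```python
from math import comb

def binom_pmf(k, p, n):
    """Binom(k; p, n) = C(n, k) (1-p)^(n-k) p^k."""
    return comb(n, k) * (1 - p) ** (n - k) * p ** k

n, p = 10, 0.3                     # illustrative parameters
pmf = [binom_pmf(k, p, n) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))
var = sum(k ** 2 * pmf[k] for k in range(n + 1)) - mean ** 2

print(sum(pmf))                    # 1.0 (up to rounding)
print(mean, n * p)                 # both 3.0
print(var, n * p * (1 - p))        # both 2.1
```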

The moment-generating function for binomial densities can be explicitly computed, and used to verify the above determinations of   $\mu$   and   $\displaystyle{ \sigma^2 }\;$ .

Moment Generating Function

$$
\begin{array}{rl}
M(t) & =\sum_{k=0}^n\, {\rm e}^{t\, k}\, \frac{n!}{k!\, (n-k)!}\, p^k\, (1-p)^{n-k} \\ & \\
& =\sum_{k=0}^n\, \frac{n!}{k!\, (n-k)!}\, \left( p\, {\rm e}^t \right)^k\, (1-p)^{n-k} \\ & \\
& =\left( 1-p + p\, {\rm e}^t \right)^n
\end{array}
$$
Note that this is the   $\displaystyle{ n^{\rm th} }$   power of the moment generating function for the underlying Bernoulli density.   This is a simple illustration of the fact that the moment generating function for a sum of independent random variables is the product of their moment generating functions.

From this moment generating function we have
$$
\begin{array}{rl}
\frac{{\rm d}\phantom{t}}{{\rm d}t}\,M(t) & =n\, \left( 1-p + p\, {\rm e}^t \right)^{n-1}\, p\, {\rm e}^t \\ & \\
\frac{{\rm d}^2\phantom{t}}{{\rm d}t^2}\,M(t) &
=n\, \left( 1-p + p\, {\rm e}^t \right)^{n-2}\, p\, {\rm e}^t \, \left( 1-p + n\, p\, {\rm e}^t \right) \\
\end{array}
$$
so that   $\mu$   and   $\displaystyle{ \mu'_2 }$   are given by
$$
\begin{array}{rl}
\left. \frac{{\rm d}\phantom{t}}{{\rm d}t}\,M(t) \right|_{t=0} & =n\, \left( 1-p + p \right)^{n-1}\, p \\ & \\
& =n\, p \\
\end{array}
$$
and
$$
\begin{array}{rl}
\left. \frac{{\rm d}^2\phantom{t}}{{\rm d}t^2}\,M(t) \right|_{t=0} & = n\, \left( 1-p + p \right)^{n-2}\, p \, \left( 1-p + n\, p \right)
\\ & \\ & = n\, p\, (1-p+n\, p) \\
\end{array} \,\text{,}
$$
respectively.   These are immediately seen to agree with the computations above.
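
If a computer algebra system is available, these derivative computations can be checked symbolically.   The following is a minimal sketch assuming the Python library sympy is available (an assumption of this illustration; the text itself does not rely on it).

```python
import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
M = (1 - p + p * sp.exp(t)) ** n          # binomial moment generating function

mu = sp.simplify(sp.diff(M, t).subs(t, 0))        # first derivative at t = 0
mu2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))    # second derivative at t = 0

print(mu)                            # n*p
print(sp.simplify(mu2 - mu ** 2))    # n*p*(1 - p), possibly written as -n*p*(p - 1)
```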

Example: Studies show that the probability that a newborn is male is   $0.512\;$ .   If seventeen babies are born at a hospital on a specific day, what is the probability that ten are boys?   What is the probability that eleven are girls?

Let   $Y$   denote the experiment of seventeen mutually independent (no twins, for example), identically distributed births with underlying probability that any given birth is a boy being   $0.512\;$ .   The probability that ten of seventeen births are boys is precisely
$$
P(Y=10) =\left( \begin{array}{cc} 17 \\ 10 \end{array} \right)\, (1-.512)^7\, 0.512^{10}
\doteq 0.159 \;\text{.}
$$
The probability that eleven are girls is precisely the probability that six are boys.   This is
$$
P(Y=6) =\left( \begin{array}{cc} 17 \\ 6 \end{array} \right)\, (1-.512)^{11}\, 0.512^6
\doteq 0.083 \;\text{.}
$$
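
Both probabilities in this example can be reproduced directly; here is a minimal Python sketch (standard library only).

```python
from math import comb

n, p = 17, 0.512

def boys_pmf(k):
    # Probability that exactly k of the seventeen births are boys.
    return comb(n, k) * (1 - p) ** (n - k) * p ** k

print(round(boys_pmf(10), 3))   # ten boys: about 0.159
print(round(boys_pmf(6), 3))    # eleven girls, i.e. six boys: about 0.083
```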

Binomial Densities with Spreadsheets

Binomial densities occur often enough that they are built into spreadsheet applications.   In the following, we see how to use a spreadsheet to obtain probabilities   $P({\rm Binom}(p;n) =k) = \left( \begin{array}{c} n \\ k \end{array} \right) \cdot p^k \cdot (1-p)^{n-k}\,$ , and distribution function values like   $DF({\rm Binom}(p;n); k) =P({\rm Binom}(p;n) \le k) = \sum_{j=0}^k \left( \begin{array}{c} n \\ j \end{array} \right) \cdot p^j \cdot (1-p)^{n-j} \,$ .   The command for   $P({\rm Binom}(p;n) =k)$   is a variation on   “BINOM.DIST(k,n,p,0)”.   This is illustrated in B6.   We compare this with direct computation in B7, where   “COMBIN(n,k)”   means   $\left( \begin{array}{c} n \\ k \end{array} \right)\,$ , and the equality of the two is highlighted.   The command for   $P({\rm Binom}(p;n) \le k)$   is a variation on   “BINOM.DIST(k,n,p,1)”.   This is illustrated in B10.   The difference in the fourth function entry, from 0 to 1, indicates the difference between computing for a specific   $k$   value or computing the sum over all values up to and including   $k\,$ .


Geometric Densities

If we run a sequence of mutually independent, identically distributed Bernoulli trials until we get a   $1\,$ , the experiment is called geometric.   For example, if we roll a die repeatedly until we get a   $6\,$, or if we flip a coin until we get tails, or if we ask people on the street their handedness until we find a lefty, we have geometric experiments.

The density function of such an experiment is not difficult to determine.   Suppose that the underlying Bernoulli experiment (call it   $X\,$ ) has probability of success   $p\;$ .   Let   $Y$   denote our geometric experiment.   Then   $Y=n$   precisely when none of the first   $n-1$   trials is a success and the   $n$-th trial is a success.   If   $\displaystyle{ X_k }$   denotes the   $k$-th Bernoulli trial, then
$$
Y=n \qquad \Longleftrightarrow \qquad
\left( X_1 =0 \right) \cap \left( X_2 =0 \right) \cap \cdots
\cap \left( X_{n-1} =0 \right) \cap \left( X_n =1 \right) \;\text{.}
$$
Thus
$$
\begin{array}{rl} P(Y=n) & =P\left( X_1 =0 \right)\cdot P\left( X_2 =0 \right)\cdot\,\cdots\,\cdot
P\left( X_{n-1} =0 \right)\cdot P\left( X_n =1 \right) \\ & \\ & =(1-p)^{n-1}\,p \end{array} \;\text{.}
$$

Thus we have the density function,
$$ {\rm Geom}(n;p) =(1-p)^{n-1}\,p $$
for   $n=1, 2, \cdots\;$ .
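
As a check, the following minimal Python sketch (standard library only; the success probability   $p=1/6$   and the truncation point   $N=500$   are arbitrary illustrative assumptions) verifies numerically that these probabilities sum to essentially   $1$   and that the mean and variance agree with the formulas derived below.

```python
def geom_pmf(n, p):
    """Geom(n; p) = (1-p)^(n-1) p, for n = 1, 2, ..."""
    return (1 - p) ** (n - 1) * p

p, N = 1 / 6, 500        # illustrative success probability; N truncates the infinite sums
pmf = {n: geom_pmf(n, p) for n in range(1, N + 1)}

mean = sum(n * q for n, q in pmf.items())
var = sum(n ** 2 * q for n, q in pmf.items()) - mean ** 2

print(sum(pmf.values()))        # essentially 1
print(mean, 1 / p)              # both essentially 6
print(var, (1 - p) / p ** 2)    # both essentially 30
```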

The mean and variance of this density are determined as follows.   The mean is
$$
\begin{array}{rl}
\mu & =1\cdot {\rm Geom}(1;p) +2\cdot {\rm Geom}(2;p) +\cdots \\ & \\
& =1\cdot p +2\cdot (1-p)\, p +\cdots \\ & \\
& =\sum_{k=1}^\infty\, k\cdot (1-p)^{k-1}\, p \\ & \\
& =p\, \sum_{k=1}^\infty\, k\cdot (1-p)^{k-1} \\ & \\
& =p\cdot \frac{1}{p^2} \\ & \\
& =\frac{1}{p}
\end{array} \,\text{,}
$$
so that when   $p$   is small (not much chance of success), the first success happens, on average, after a long time.   The variance is similarly computed using   $\displaystyle{ \sigma^2 =\mu_2 =\mu'_2 -\mu^2 }\,$ , after determining  
$$
\begin{array}{rl}
\mu'_2 & =1^2\cdot {\rm Geom}(1;p) +2^2\cdot {\rm Geom}(2;p) +\cdots \\ & \\
& =1^2\cdot p +2^2\cdot (1-p)\, p +\cdots \\ & \\
& =\sum_{k=1}^\infty\, k^2\cdot (1-p)^{k-1}\, p \\ & \\
& =p\, \sum_{k=1}^\infty\, k^2\cdot (1-p)^{k-1} \\ & \\
& =p\, \left( \frac{2}{p^3} -\frac{1}{p^2} \right)
\end{array} \;\text{.}
$$
Thus
$$ \sigma^2 =\mu_2 =p\, \left( \frac{2}{p^3} -\frac{1}{p^2} \right) -\frac{1}{p^2} =\frac{1-p}{p^2} \;\text{.} $$

Example: Hitting Home-Runs

A star baseball player hits a home-run about once in every twenty-five at-bats.   A baseball-playing child idolizes this star, and the parents take the child to watch the star play.   What is the probability that the star will hit a home-run in his first at-bat?   If the star is at bat four times during the game, what is the probability that the star will hit a home-run during the game?

The probability that the star hits his first home-run in his   $k$-th at bat is
$$ {\rm Geom}(k;0.04) =(1-0.04)^{k-1}\cdot (0.04) \;\text{.} $$
Thus the probability that the star hits his first home-run in his first at bat is
$$ {\rm Geom}(1;0.04) =0.04 \;\text{.} $$
Hitting a home-run during the game means precisely that he hit his first home-run in one of his first four at-bats.   Thus the probability that he hits a home-run during the game is
$$ \sum_{k=1}^4\, {\rm Geom}(k;0.04) =0.04 +(0.96)(0.04) +(0.96)^2(0.04) +(0.96)^3(0.04) \doteq 0.151 \;\text{.} $$
That is, there is about a   $15\%$   chance that the child gets to see the star hit a home-run.
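
The game probability can also be computed via the complement   $1 -(0.96)^4$   (this is the distribution-function formula given in the next subsection); here is a minimal Python check of both routes.

```python
p = 0.04   # one home-run per twenty-five at-bats

# Partial sum of Geom(k; p) for k = 1, ..., 4, and the complement of "no success in 4 tries".
partial_sum = sum((1 - p) ** (k - 1) * p for k in range(1, 5))
complement = 1 - (1 - p) ** 4

print(round(partial_sum, 3))    # about 0.151
print(round(complement, 3))     # the same value
```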

Geometric Densities with Spreadsheets

Geometric densities are part of a larger family of densities called Negative Binomial, and are built into spreadsheet applications.   Here we see how to use a spreadsheet to obtain probabilities   $P({\rm Geom}(p) =k) = (1-p)^{k-1} \cdot p\,$ .   The command is   “NEGBINOM.DIST(k-1,1,p)”   for   $P({\rm Geom}(p) = k) \,$ .   Here the “1” refers to the first success (so “3” in the second place would refer to the third success), “k-1” refers to the number of failures before success number “1”, and “p” refers to the probability of success on each try.   Thus, for example, “NEGBINOM.DIST(5,3,0.4)” would return the probability of having five failures before the third success, when the probability of success on each try is   $0.4\,$ . That the command properly returns the result of the defining computation is highlighted by the equality of B5 (command) and B6 (direct computation).   The distribution function   $DF({\rm Geom}(p); k) =P({\rm Geom}(p) \le k)$   is not included in the command here.   It is necessary to compute it directly, knowing that   $P({\rm Geom}(p) \le k) =\sum_{j=0}^{k-1}\, (1-p)^j \cdot p = 1-(1-p)^k\,$ , as shown in B9.   In some spreadsheet apps this is included by a fourth option in the function, with something like   “NEGBINOM.DIST(k-1,1,p,0)”   for the density function and   “NEGBINOM.DIST(k-1,1,p,1)”   for the distribution function.


Poisson Densities

Poisson densities describe the number of events occurring in a fixed interval of time, if these events occur at a known average rate, and the occurrence of an event does not influence the waiting time for subsequent events.   Poisson densities can similarly be used for other specified intervals or regions, such as distance, area or volume.   Thus we might use a Poisson density to understand the number of calls received by a service center in an hour, or auto accidents on a stretch of road in a month, or hawks landing on a field in a day.

It is a bit beyond what we are after here to derive an exact formula for a Poisson density, given a mathematically precise formulation of the above description.   Instead, we simply give the density function and properties, and follow with an example of its use.

A Poisson density function with parameter   $\lambda$   is the density function which assigns probability
$$ {\rm Poisson}(n; \lambda) ={\rm e}^{-\lambda}\,\frac{\lambda^n}{n!} $$
to integer   $n\ge 0\;$ .
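
A minimal Python sketch of this density (standard library only; the parameter   $\lambda =15$   and the truncation point   $N=100$   are arbitrary illustrative assumptions) checks that the probabilities sum to essentially   $1$   and that the mean and variance both come out to   $\lambda\,$ , matching the derivations below.

```python
from math import exp, factorial

def poisson_pmf(n, lam):
    """Poisson(n; lambda) = e^(-lambda) lambda^n / n!"""
    return exp(-lam) * lam ** n / factorial(n)

lam, N = 15, 100          # illustrative parameter; N truncates the infinite sums
pmf = [poisson_pmf(n, lam) for n in range(N + 1)]

mean = sum(n * q for n, q in enumerate(pmf))
var = sum(n ** 2 * q for n, q in enumerate(pmf)) - mean ** 2

print(sum(pmf))     # essentially 1
print(mean)         # essentially 15  (= lambda)
print(var)          # essentially 15  (= lambda)
```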

The mean of   ${\rm Poisson}(n; \lambda)$   is
$$
\begin{array}{rl}
\mu & =0\cdot {\rm Poisson}(0; \lambda) +1\cdot {\rm Poisson}(1; \lambda) + \cdots \\ & \\
& =\sum_0^\infty\, n\cdot {\rm Poisson}(n; \lambda) \\ & \\
& =\sum_0^\infty\, n\cdot {\rm e}^{-\lambda}\,\frac{\lambda^n}{n!} \\ & \\
& ={\rm e}^{-\lambda}\, \sum_1^\infty\,\frac{\lambda^n}{(n-1)!} \\ & \\
& ={\rm e}^{-\lambda}\,\lambda\, \sum_1^\infty\,\frac{\lambda^{n-1}}{(n-1)!} \\ & \\
& ={\rm e}^{-\lambda}\,\lambda\, \sum_0^\infty\,\frac{\lambda^n}{n!} \\ & \\
& ={\rm e}^{-\lambda}\,\lambda\, {\rm e}^{\lambda} \\ & \\
& =\lambda
\end{array} \;\text{.}
$$

Given this, Poisson density functions are often written as
$$ {\rm Poisson}(n; \mu) ={\rm e}^{-\mu}\,\frac{\mu^n}{n!} \,\text{,} $$
(using   $\mu$   in place of our parameter   $\lambda\,$ ), allowing one to simply read off the mean immediately.

The variance of a Poisson density function is similarly determined.   We use  
$\displaystyle{ \sigma^2 =\mu_2 =\mu'_2 -\mu^2 }\,$ , after determining  
$$
\begin{array}{rl}
\mu'_2 & =0^2\cdot {\rm Poisson}(0; \lambda) +1^2\cdot {\rm Poisson}(1; \lambda) + \cdots \\ & \\
& =\sum_0^\infty\, n^2\cdot {\rm Poisson}(n; \lambda) \\ & \\
& =\sum_0^\infty\, n^2\cdot {\rm e}^{-\lambda}\,\frac{\lambda^n}{n!} \\ & \\
& ={\rm e}^{-\lambda}\, \sum_1^\infty\,\frac{n\, \lambda^n}{(n-1)!} \\ & \\
& ={\rm e}^{-\lambda}\,\lambda\, \sum_1^\infty\,\frac{n\, \lambda^{n-1}}{(n-1)!} \\ & \\
& ={\rm e}^{-\lambda}\,\lambda\, \sum_0^\infty\,\frac{(n+1)\, \lambda^n}{n!} \\ & \\
& ={\rm e}^{-\lambda}\,\lambda\,
\left( \sum_0^\infty\,\frac{n\, \lambda^n}{n!} +\sum_0^\infty\,\frac{\lambda^n}{n!} \right) \\ & \\
& ={\rm e}^{-\lambda}\,\lambda\,
\left( \sum_1^\infty\,\frac{\lambda^n}{(n-1)!} +{\rm e}^{\lambda} \right) \\ & \\
& ={\rm e}^{-\lambda}\,\lambda\,
\left( \lambda\, {\rm e}^{\lambda} +{\rm e}^{\lambda} \right) \\ & \\
& =\lambda\, (\lambda +1)
\end{array} \;\text{.}
$$
Thus we have
$$ \sigma^2 =\mu_2 =\lambda\, (\lambda +1) -\lambda^2 =\lambda \;\text{.} $$

The moment generating function for a Poisson random variable can be explicitly computed.

Moment Generating Function

Let   $X$   have density function   ${\rm Poisson}(\lambda)$   for some   $\lambda\gt 0\;$ .   Take some   $t\in\mathbb{R}\,$ , so that
$$
\begin{array}{rl}
M_X(t) & =E\left[ {\rm e}^{t\, X} \right] \\ & \\
& = \sum_{k=0}^\infty \, {\rm e}^{t\, k}\, {\rm e}^{-\lambda}\, \frac{\lambda^k}{k!} \\ & \\
& = {\rm e}^{-\lambda}\,\sum_{k=0}^\infty\, \frac{\left( \lambda\, {\rm e}^t \right)^k}{k!} \\ & \\
& = {\rm e}^{-\lambda}\, {\rm e}^{\lambda\,{\rm e}^t} \\ & \\
& = {\rm e}^{\lambda\,\left( {\rm e}^t -1 \right)} \;\text{.}
\end{array}
$$

Poisson Densities with Spreadsheets

Poisson densities also occur often enough that they are built into spreadsheet applications.   A spreadsheet can be used to obtain probabilities   $P({\rm Poisson}(\lambda) =k) = {\rm e}^{-\lambda}\, \frac{\lambda^k}{k!}\,$ , and distribution function values like   $DF({\rm Poisson}(\lambda); k) =P({\rm Poisson}(\lambda) \le k) = \sum_{j=0}^k\, {\rm e}^{-\lambda}\, \frac{\lambda^j}{j!}\,$ .   The command for   $P({\rm Poisson}(\lambda) =k)$   is a variation on   “POISSON.DIST(k,λ,0)”, and the command for   $P({\rm Poisson}(\lambda) \le k)$   is a variation on   “POISSON.DIST(k,λ,1)”.   As with the binomial commands, the final entry, 0 or 1, indicates the difference between computing for a specific   $k$   value or computing the sum over all values up to and including   $k\,$ .


Example: If the number of customers entering a shop in its first hour of business each day is modeled by   $ {\rm Poisson}(n; 15) \,$ , what is the probability that on a given day fewer than   $10$   customers will enter in the first hour?   What is the probability that at least   $18$   customers will enter in the first hour?

The probability that fewer than   $10$   customers will enter is precisely the probability that either   $0$   or   $1$   or   …   or   $8$   or   $9$   will enter.   That is (with each term to the nearest   $0.0000001\,$ )
$$
\begin{array}{rl}
P(n\lt 10) & =P(n=0 \quad \text{or} \quad n=1 \quad \text{or} \quad \cdots \quad \text{or} \quad n=9) \\
& =P(n=0) +P(n=1) +\cdots +P(n=9) \\ & \\ & = \sum_{n=0}^9\, {\rm Poisson}(n; 15) \\
& =\sum_{n=0}^9\, {\rm e}^{-15}\,\frac{15^n}{n!} \\
& =0.0000003 +0.0000046 +0.0000344 +0.0001721 +0.0006453 \\
& \qquad +0.0019358 +0.0048395 +0.0103703 +0.0194443 +0.0324072 \\
& =0.0698537 \;\text{.}
\end{array}
$$
That is, there is about a   $7\%$   chance that on a given day fewer than   $10$   customers will enter in the first hour.

The probability that at least   $18$   customers will enter is precisely the probability that either   $18$   or   $19$   or   …   (ad infinitum) will enter.   That is
$$
\begin{array}{rl}
P(n\ge 18) & =P(n=18 \quad \text{or} \quad n=19 \quad \text{or} \quad \cdots ) \\
& \\ & =P(n=18) +P(n=19) +\cdots \\ & \\ & = \sum_{n=18}^\infty\, {\rm Poisson}(n; 15) \\ & \\
& =\sum_{n=18}^\infty\, {\rm e}^{-15}\,\frac{15^n}{n!} \;\text{.}
\end{array}
$$
This is an infinite series, and it is not convenient to sum it term by term.   Instead we will use a computational trick which applies in many circumstances and is worth knowing.   The idea is that the condition that at least   $18$   customers enter is precisely the complement of the condition that fewer than   $18$   enter.   Thus
$$
\begin{array}{rl}
P(n\ge 18) & =1-P(n\lt 18) \\
& =1 -\sum_{n=0}^{17}\, {\rm e}^{-15}\,\frac{15^n}{n!} \;\text{.} \\
\end{array}
$$
Following the preceding paragraph, we have
$$ P(n\ge 18) =1 -P(n\lt 18) =1-0.7488588 =0.2511412 \;\text{.} $$
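
Both numbers in this example can be reproduced with a few lines; here is a minimal Python sketch of the complement trick (standard library only).

```python
from math import exp, factorial

def poisson_pmf(n, lam):
    return exp(-lam) * lam ** n / factorial(n)

lam = 15
p_lt_10 = sum(poisson_pmf(n, lam) for n in range(10))       # P(n < 10)
p_ge_18 = 1 - sum(poisson_pmf(n, lam) for n in range(18))   # P(n >= 18) = 1 - P(n <= 17)

print(round(p_lt_10, 7))    # about 0.0698537
print(round(p_ge_18, 7))    # about 0.2511412
```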

The following subsection describes a use of the Poisson density to approximate the binomial density under certain reasonable conditions.   Because the formula for the Poisson density is simpler than that of the binomial density, it can be used to get good estimates when working by hand or with a calculator.   When using a spreadsheet, statistical software, or a computer algebra system, the approximation is rarely needed, as computers have trivialized the determination of the binomial density.

Poisson Approximation of Binomial

If   $n$   is very large and   $p$   is very small, let   $\mu =n\cdot p\;$ .   For relatively small values of   $k\,$ , $\,{\rm Poisson}(k;\mu)$   is a good approximation to   ${\rm Binom(k;p,n)}\;$ .   (The notions of “very large”, “very small”, “relatively small”, and “good” can all be made precise).

Thus, for example, if   $1.7\%$   of a population carry a genetic marker, and a group of three hundred subjects are tested, the probability that exactly five subjects will carry the marker is
$$ {\rm Binom}(5;.017,300) =\left(\begin{array}{c} 300 \\ 5 \end{array} \right)\, (.017)^5(.983)^{295} \;\text{.} $$
This is not so simple to estimate by hand.   We obtain   $0.1768$   using a calculator or a spreadsheet.   But with   $\mu =300\cdot 0.017 =5.1\,$ , we get the simple approximation
$$ {\rm Poisson}(5; 5.1) ={\rm e}^{-5.1}\, \frac{5.1^5}{5!} =.1753 \;\text{.} $$
This is off by less than   $1\%\;$ .
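
A quick comparison of the exact binomial value with its Poisson approximation; a minimal Python sketch (standard library only).

```python
from math import comb, exp, factorial

n, p, k = 300, 0.017, 5
mu = n * p                                          # 5.1

exact = comb(n, k) * p ** k * (1 - p) ** (n - k)    # Binom(5; .017, 300)
approx = exp(-mu) * mu ** k / factorial(k)          # Poisson(5; 5.1)

print(round(exact, 4))      # about 0.1768
print(round(approx, 4))     # about 0.1753
```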