Estimation – Introduction to Statistics via Spreadsheets

Many problems in statistics concern estimation of parameters of a probability density function. For example, a call center might wish to estimate the average length of calls in order to understand how best to staff the center. If the length of calls is assumed to be generated by an exponential random variable, the call center would be interested in the parameter $\mu$ of the underlying density function. Or if the life-span of a canned fruit is approximated by a normal random variable, the canning company would be concerned with the parameters $\mu$ and $\sigma$ of the associated density function.

Two kinds of parameter estimates are commonly considered. The first is a point estimate, where a number obtained from computations with accumulated data is taken as an approximation to the appropriate parameter. For example, the fraction of times heads occurs in the flipping of a coin is a point estimate for the coin's true probability of heads. The second is an interval estimate, which requires two computations with accumulated data; the interval determined by the two values is expected to contain the true value of the underlying parameter. Interval estimates will be introduced in the next chapter. For now we will consider point estimates.

What we will use here for estimation is called the method of moments. The idea is to use the empirical moments of a data set as approximations of the moments of the underlying probability density function. Thus if we believe that a data set is generated by a probability density function with mean $\mu$, we use the sample mean $\bar{x}$ as an approximation to $\mu$. Similarly, we use the sample variance $\displaystyle{ s^2 }$ as an approximation to $\displaystyle{ \sigma^2 }$.

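This approach is generally effective when the parameters that determine our density function can be expressed in terms of its moments. For example, a Poisson random variable is determined by its mean, and a normal random variable is determined by its mean and variance.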
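For readers who want to try the method of moments outside a spreadsheet, here is a minimal Python sketch. It assumes that $s^2$ denotes the usual sample variance with divisor $n-1$; the function name `moment_estimates` and the data values are made up for illustration only.

```python
from statistics import mean, variance


def moment_estimates(data):
    """Return (x_bar, s_squared) as estimates of the mean and variance.

    Assumption: s_squared is the sample variance with divisor n - 1,
    which is what statistics.variance computes.
    """
    x_bar = mean(data)
    s_squared = variance(data)
    return x_bar, s_squared


# Hypothetical observations, used only to show the call.
sample = [4.0, 7.0, 5.0, 8.0, 6.0]
mu_hat, sigma2_hat = moment_estimates(sample)
print("estimate of mu:", mu_hat)
print("estimate of sigma^2:", sigma2_hat)
```

In a spreadsheet the same two numbers would typically come from the average and sample variance functions applied to the data column.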

Example: One hundred people are observed at the Department of Motor Vehicles, and identified by gender. With $0$ denoting man and $1$ denoting woman, estimate the proportion of women using the DMV.

We model the gender of a person at the DMV by a Bernoulli random variable. The following data set represents the men and women observed. The average (in cell M1) is the proportion of women observed. We use this as our estimate.

Click here to open a copy of this so you can experiment with it. You will need to be signed in to a Google account.
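The same computation can be sketched in Python. Since the mean of a Bernoulli random variable is the probability of a $1$, the average of the $0$/$1$ observations serves as the estimate of the proportion. The short list below is hypothetical and stands in for the one hundred observations; the function name `estimate_proportion` is our own.

```python
def estimate_proportion(observations):
    """Estimate the Bernoulli parameter as the mean of 0/1 observations."""
    if not observations:
        raise ValueError("at least one observation is required")
    if any(x not in (0, 1) for x in observations):
        raise ValueError("observations must be coded as 0 or 1")
    return sum(observations) / len(observations)


# Hypothetical coding: 0 denotes man, 1 denotes woman (not the DMV data).
observations = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
p_hat = estimate_proportion(observations)
print("estimated proportion of women:", p_hat)
```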

Example: The number of shoppers entering a retail store in its first hour of business is recorded daily for seventy days. Assuming that the number of shoppers is Poisson, estimate the probability of having at least twenty shoppers in the first hour of business and compare with the observed fraction of days with at least twenty shoppers in the first hour of business.

The following data set presents the number of shoppers observed over the seventy days. Since the mean determines a Poisson random variable ( $\,\displaystyle{ {\rm Poisson}_\mu }\,$ ), we use $\bar{x}$ as an approximation to $\mu$. This is given in cell D1. Using this, we approximate (in cell D2) the probability that at least twenty shoppers enter in the first hour: $\displaystyle{ P\left( {\rm Poisson}_\mu \ge 20 \right) }$. Because the number of shoppers is a whole number, this probability equals $1 - P\left( {\rm Poisson}_\mu \le 19 \right)$, so it is obtained using the command “=1-POISSONCDF(D1;19)”. Finally, we compare this with the observed fraction of days with at least twenty shoppers in the first hour of business (in cell D3).

Click here to open a copy of this so you can experiment with it. You will need to be signed in to a Google account.
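As a cross-check on the spreadsheet, the three quantities in this example (the estimate of $\mu$, the estimated probability of at least twenty shoppers, and the observed fraction) can be sketched in Python. The helper names `poisson_cdf` and `poisson_tail` are our own, and the ten counts below are hypothetical; they are not the seventy observed values.

```python
import math


def poisson_cdf(k, mu):
    """Return P(X <= k) for a Poisson random variable X with mean mu."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))


def poisson_tail(k, mu):
    """Return P(X >= k), computed as 1 - P(X <= k - 1)."""
    return 1.0 - poisson_cdf(k - 1, mu)


# Hypothetical first-hour shopper counts (not the data from the spreadsheet).
counts = [18, 22, 15, 20, 25, 17, 19, 21, 16, 23]

mu_hat = sum(counts) / len(counts)       # method-of-moments estimate of mu
estimated = poisson_tail(20, mu_hat)     # plays the role of 1 - CDF at 19
observed = sum(1 for c in counts if c >= 20) / len(counts)

print("estimate of mu:", mu_hat)
print("estimated P(at least 20):", estimated)
print("observed fraction with at least 20:", observed)
```

The direct sum is adequate for means of this size. For very large means a library routine would be a safer choice, since the individual terms can overflow or lose precision.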