The simple answer to the question in the title of this section is that data (singular = datum) are the measurements from experiments or observations. As such, they can be of a variety of types (which will be discussed) and of various qualities. Data become information when they are used to enhance (inform!) our world views.
We will use data in essentially three ways.
- descriptive statistics: clarification of what is by organizing data
- inferential statistics: drawing conclusions about what is from analysis of data
- predictive statistics: anticipating what will be by examining patterns in data
The last two of these are substantially more mathematical in nature and will be the emphasis of this course. They will depend implicitly on the quality of the data accumulated.
Types of Data
We classify data according to how well they lend themselves to the analytical techniques we will be developing.
- Nominal – dealing strictly with qualitative data. For example, what type of dog, if any, is owned by people in a town, or what flavors of ice-cream are purchased at a shop. This is simply a categorization. The categories can be numerical, as in area codes, but such numbers are strictly for categorization, not ranking or computation.
- Ordinal – dealing with categorization allowing for rank and order. For example, we can record the brightness of stars or list American cities by distance from New York. The second example shows that what is being ordered need not be numerical.
- Interval – refers to numerical data which can be added and subtracted. Such data have no true zero, so multiplication and division carry no meaning. Consider, for example, temperature on the Celsius scale. The value $0^\circ$ denotes the temperature at which water freezes, the value $100^\circ$ denotes the temperature at which water boils, and we can talk about a five-degree increase in temperature. But $10^\circ$ is not twice as warm as $5^\circ$. That is, doubling a datum value does not correspond to doubling that which is being measured.
- Ratio – the type of data with which the power of mathematics becomes apparent. We can compute the difference between measurements as with interval data, and this carries meaning in terms of what is being measured. The added feature is that multiplication also has meaning here. For example, it makes sense to say one object has double the speed of another: it covers twice the distance that the second object does in a given amount of time.
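The Celsius example can be checked numerically. The following is a minimal sketch in Python; the helper name `celsius_to_kelvin` and the sample readings are illustrative assumptions, not part of the course material. It re-expresses the two temperatures on the Kelvin scale, which does have a true zero, to show that differences are preserved while the apparent "doubling" disappears.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading to kelvins (a scale with a true zero)."""
    return c + 273.15


# Illustrative readings taken from the Celsius example in the text.
low_c, high_c = 5.0, 10.0
low_k, high_k = celsius_to_kelvin(low_c), celsius_to_kelvin(high_c)

# Differences are meaningful for interval data: both scales agree.
diff_c = high_c - low_c
diff_k = high_k - low_k

# Ratios are not: the result depends on where the scale places its zero.
ratio_c = high_c / low_c
ratio_k = high_k / low_k

print(diff_c, round(diff_k, 6))
print(ratio_c, round(ratio_k, 3))
```

The Celsius ratio is 2, but the same two temperatures give a ratio of roughly 1.018 in kelvins, so "twice as warm" is an artifact of the scale rather than a property of what is being measured.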
It is important that we understand the nature of our data, and which arithmetic operations are meaningful for it. The table below summarizes how the type of data determines its usefulness.
| provides: | nominal | ordinal | interval | ratio |
|---|---|---|---|---|
| “Counts” or “Frequency of Occurrence” | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| Mode, Median | | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| “order” of values | | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| quantifiable difference between values | | | $\checkmark$ | $\checkmark$ |
| can add or subtract values | | | $\checkmark$ | $\checkmark$ |
| can multiply or divide values | | | | $\checkmark$ |
| has true “zero” | | | | $\checkmark$ |
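One way to keep these rules at hand when working with a data set is to encode them as a lookup. The sketch below is only an illustration based on the definitions of the four types given above; the names `LEVELS`, `MINIMUM_LEVEL`, and `supports` are hypothetical and do not come from any library.

```python
# Levels of measurement, ordered from least to most structure.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level at which each operation becomes meaningful
# (hypothetical operation names chosen for this sketch).
MINIMUM_LEVEL = {
    "counts": "nominal",
    "order": "ordinal",
    "median": "ordinal",
    "difference": "interval",
    "add_subtract": "interval",
    "multiply_divide": "ratio",
    "true_zero": "ratio",
}


def supports(level, operation):
    """Return True if data at `level` supports `operation`."""
    return LEVELS.index(level) >= LEVELS.index(MINIMUM_LEVEL[operation])


print(supports("nominal", "counts"))            # True
print(supports("interval", "add_subtract"))     # True
print(supports("interval", "multiply_divide"))  # False
```

Because the four types are nested, a single "minimum level" per operation is enough: anything meaningful for ordinal data is also meaningful for interval and ratio data.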
A more thorough discussion of data types can be found at https://en.wikipedia.org/wiki/Level_of_measurement