In statistics, groups of individual data points may be classified as belonging to any of various statistical data types, e.g. categorical ("red", "blue", "green"), real number (1.68, -5, 1.7e+6), etc. The data type is a fundamental component of the semantic content of the variable, and controls which sorts of probability distributions can logically be used to describe the variable, the permissible operations on the variable, the type of regression analysis used to predict the variable, etc. The concept of data type is similar to the concept of level of measurement, but more specific: for example, count data require a different distribution (e.g. a Poisson distribution or binomial distribution) than non-negative real-valued data require, but both fall under the same level of measurement (a ratio scale).
Various attempts have been made to produce a taxonomy of levels of measurement. The psychophysicist Stanley Smith Stevens
defined nominal, ordinal, interval, and ratio scales. Nominal
measurements do not have meaningful rank order among values, and permit
any one-to-one transformation. Ordinal measurements have imprecise
differences between consecutive values, but have a meaningful order to
those values, and permit any order-preserving transformation. Interval
measurements have meaningful distances between measurements defined, but
the zero value is arbitrary (as is the case with longitude and temperature measurements in Celsius or Fahrenheit),
and permit any linear transformation. Ratio measurements have both a
meaningful zero value and meaningful distances between different
measurements, and permit any rescaling transformation.
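The link between each scale and its permissible transformations can be checked numerically. The following is a minimal sketch, not part of Stevens's own presentation: the sample values and the helper names `celsius_to_fahrenheit` and `metres_to_feet` are chosen here purely for illustration. It shows that ratios of values do not survive a change of units on an interval scale (although ratios of differences do), that ratios of values do survive rescaling on a ratio scale, and that an order-preserving transformation leaves an ordinal ranking unchanged.

```python
import math

def celsius_to_fahrenheit(c):
    """Linear (affine) change of units, permissible on an interval scale."""
    return c * 9 / 5 + 32

def metres_to_feet(m):
    """Pure rescaling, permissible on a ratio scale."""
    return m / 0.3048

# Interval scale: the zero point is arbitrary, so ratios of values are not
# preserved by a change of units, but ratios of differences are.
celsius = [10.0, 20.0, 40.0]
fahrenheit = [celsius_to_fahrenheit(t) for t in celsius]

value_ratio_c = celsius[2] / celsius[1]
value_ratio_f = fahrenheit[2] / fahrenheit[1]
diff_ratio_c = (celsius[2] - celsius[1]) / (celsius[1] - celsius[0])
diff_ratio_f = (fahrenheit[2] - fahrenheit[1]) / (fahrenheit[1] - fahrenheit[0])

# Ratio scale: zero is meaningful, so ratios of values survive rescaling.
metres = [2.0, 4.0]
feet = [metres_to_feet(x) for x in metres]
length_ratio_m = metres[1] / metres[0]
length_ratio_ft = feet[1] / feet[0]

# Ordinal scale: any order-preserving map (here, cubing) keeps the ranking.
scores = [3, 1, 2]
cubed = [s ** 3 for s in scores]
ranking = sorted(range(len(scores)), key=lambda i: scores[i])
ranking_cubed = sorted(range(len(cubed)), key=lambda i: cubed[i])

print("value ratios (C vs F):", value_ratio_c, value_ratio_f)
print("difference ratios (C vs F):", diff_ratio_c, diff_ratio_f)
print("length ratios (m vs ft):", length_ratio_m, length_ratio_ft)
print("rankings equal:", ranking == ranking_cubed)
```

In this sketch, "40 degrees is twice 20 degrees" holds only in Celsius, whereas "4 metres is twice 2 metres" holds in any unit of length.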
Because variables conforming only to nominal or ordinal measurements
cannot be reasonably measured numerically, they are sometimes grouped
together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative variables, which, being numerical, can be either discrete or continuous. Such distinctions can often be loosely correlated with data types in computer science: dichotomous categorical variables may be represented with the Boolean data type, polytomous categorical variables with arbitrarily assigned integers in the integral data type, and continuous variables with the real data type involving floating-point
computation. However, the mapping of computer science data types to
statistical data types depends on which categorization of the latter is
being used.
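As a rough illustration of this loose correspondence, the sketch below represents one variable of each kind in Python. The `Colour` enumeration, the variable names, and the sample values are hypothetical and introduced only for this example. It also shows why the integer codes of a polytomous variable are arbitrary: relabelling the categories changes the mean of the codes but not the modal category.

```python
from enum import Enum
from statistics import mean, mode

class Colour(Enum):
    """Hypothetical polytomous categorical variable with arbitrary codes."""
    RED = 0
    BLUE = 1
    GREEN = 2

is_smoker = True            # dichotomous categorical -> Boolean
favourite = Colour.BLUE     # polytomous categorical -> arbitrary integer code
visit_count = 3             # discrete quantitative (a count) -> integer
height_m = 1.68             # continuous quantitative -> floating point

# A small sample of the categorical variable, stored as integer codes.
sample = [Colour.BLUE, Colour.RED, Colour.BLUE, Colour.GREEN]
codes = [c.value for c in sample]

# An equally valid, equally arbitrary relabelling of the same categories.
relabel = {Colour.RED: 7, Colour.BLUE: 2, Colour.GREEN: 5}
inverse = {code: colour for colour, code in relabel.items()}
recoded = [relabel[c] for c in sample]

# The mean of the codes depends on the labelling, so it carries no meaning.
mean_codes = mean(codes)
mean_recoded = mean(recoded)

# The mode names the same category under either labelling.
modal_colour = Colour(mode(codes))
modal_colour_recoded = inverse[mode(recoded)]

print(mean_codes, mean_recoded)
print(modal_colour, modal_colour_recoded)
```

The integer type here serves two different statistical roles (a count and a category label), which is one reason the mapping between the two kinds of type is only approximate.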
Other categorizations have been proposed. For example, Mosteller and Tukey (1977) distinguished grades, ranks, counted fractions, counts, amounts, and balances. Nelder (1990) described continuous counts, continuous ratios, count ratios, and categorical modes of data. See also Chrisman (1998) and van den Berg (1991).
Whether it is appropriate to apply different kinds of statistical
methods to data obtained from different kinds of measurement procedures
is complicated by issues concerning the transformation of variables and
the precise interpretation of research questions.
"The relationship between the data and what they describe
merely reflects the fact that certain kinds of statistical statements
may have truth values which are not invariant under some
transformations. Whether or not a transformation is sensible to
contemplate depends on the question one is trying to answer" (Hand,
2004, p. 82).
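A small numerical case may make the point about invariance concrete. The sketch below uses made-up data (the two groups and the choice of a base-10 logarithm are assumptions for illustration, not taken from Hand) to show a statement whose truth value changes under an order-preserving transformation: "group A has the larger mean" is true on the raw scale and false on the log scale, while the corresponding statement about medians is unaffected.

```python
import math
from statistics import mean, median

# Made-up positive measurements for two groups.
group_a = [1, 1, 100]
group_b = [10, 10, 10]

# An order-preserving (monotone increasing) transformation of the data.
log_a = [math.log10(x) for x in group_a]
log_b = [math.log10(x) for x in group_b]

# Comparison of means: the direction flips after the transformation.
a_has_larger_mean_raw = mean(group_a) > mean(group_b)
a_has_larger_mean_log = mean(log_a) > mean(log_b)

# Comparison of medians: the direction is the same on both scales.
a_has_larger_median_raw = median(group_a) > median(group_b)
a_has_larger_median_log = median(log_a) > median(log_b)

print("A has larger mean (raw, log):", a_has_larger_mean_raw, a_has_larger_mean_log)
print("A has larger median (raw, log):", a_has_larger_median_raw, a_has_larger_median_log)
```

Whether the raw or the log scale is the sensible one to compare on depends, as the quotation says, on the question being asked.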