| |
The goal of the techniques described in this topic is to detect relationships or associations between specific values of categorical variables in large data sets.
A categorical variable is a generalization of the binary variable in that it can take on more than two states. For example, map color is a categorical variable that may have, say, five states: maroon, yellow, orange, pink, and blue.
In a data base, categorical variables are usually represented by their modalities, which are non numerical quantities. For example, the variable "Region" might have four modalities : "America", "Europe", "Asia" and "Africa" (top image).
Distance for Nominal/Categorical Variable In many cases, we cannot measure variable in quantitative way, but it is possible to measure in term of category. A nominal or categorical variable is used when number is only a symbol to represent something.
Because variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, sometimes they are grouped together as categorical variables, ...
See also: Data mining, Distribution, Regression, Variance, Classification
 
|