Home (Outlier)
Home  
 
 
Home » Artificial Intelligence » Outlier


 

Outlier

Artificial Intelligence Ordinal variableOverall goal

Local Outlier Factor (LOF) is an anomaly detection algorithm presented as "LOF: Identifying Density-based Local Outliers" by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander[1].

 


Outlier detection is important for effective modeling. Outliers should be excluded from such model fitting. If all the data here are included in a linear regression, then the fitted model will be poor virtually everywhere.

Outliers. Outliers are atypical (by definition), infrequent observations.

What is Outlier Analysis?
Very often, there exist data objects that do not comply with the general behavior or model of the data.

Outlier
An "outlier" (or "extreme point") is an observation that is very different from the "average" observation in your data set.
An outlier may have two different possible origins : ...

outlier
A data item whose value falls outside the bounds enclosing most of the other corresponding values in the sample. May indicate anomalous data. Should be examined carefully; may carry important information.
parallel processing ...

neural networks (ANN) modeling technique for data sets containing outlier
measurements using a study on mixture properties of directly compressed dosage
forms, Eur. J. pharm. Sci, 7, 1998, 17-28.
Božic.D.Z, Vrečer. F, Kozjek.

Robust Regression and Outlier Detection for non-linear Models using Genetic Algorithms, Chemometrics and Intelligent Laboratory Systems 28 (1995) 73-87
Lucasius C.B., Kateman G., Understanding and Using Genetic Algorithms, Part 1.

Sometimes a set of numbers (the data) might be contaminated by inaccurate outliers, i.e. values which are much too low or much too high. In this case one can use a truncated mean.

Robust regression A regression model which includes the possibility of measurement outliers. It also refers to parameter estimation that can handle outliers.

Thus, median is robust statistic of central tendency against outlier data. Though median is more robust than mean, people still like to use mean because it is easier to compute mean than median.

Missing values (blanks, spaces, nulls)
Outlier values
Collinearity assessment (related to correlations between predictor variables)
Frequencies of multiple codes in a given variable; ...

for homogeneous groups (data reduction), in finding "natural clusters" and describe their unknown properties ("natural" data types), in finding useful and suitable groupings ("useful" data classes) or in finding unusual data objects (outlier ...

See also: Distribution, Standard Deviation, Regression, Histogram, Data mining

Artificial Intelligence Ordinal variableOverall goal

 
 rssRSS