Glossary term
Evaluation and Benchmarks
Values distant from most other values. In machine learning, any of the following are outliers:
Input data whose values are more than roughly 3 standard deviations from the mean.
Weights with high absolute values.
Predicted values relatively far away from the actual values.
For example, suppose that widget-price is a feature of a certain model. Assume that the mean widget-price is 7 Euros with a standard deviation of 1 Euro. Examples containing a widget-price of 12 Euros or 2 Euros would therefore be considered outliers because each of those prices is five standard deviations from the mean.
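The widget-price arithmetic above can be checked with a short sketch. It assumes the mean and standard deviation are already known (7 and 1 Euro, as in the example) and uses the rough 3-standard-deviation threshold from the definition. The function names and the sample price list are hypothetical, chosen only for illustration.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / std_dev


def is_outlier(value, mean, std_dev, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(z_score(value, mean, std_dev)) > threshold


# Figures from the widget-price example: mean 7 Euros, standard deviation 1 Euro.
MEAN_PRICE = 7.0
STD_DEV_PRICE = 1.0

# Hypothetical sample of widget prices, in Euros.
prices = [6.5, 7.0, 12.0, 2.0, 8.2]

flags = [is_outlier(p, MEAN_PRICE, STD_DEV_PRICE) for p in prices]
for price, flag in zip(prices, flags):
    print(f"{price:5.1f} Euros  z={z_score(price, MEAN_PRICE, STD_DEV_PRICE):+.1f}  outlier={flag}")
```

Here 12 Euros and 2 Euros have z-scores of +5 and -5, so both exceed the threshold, while the other prices do not.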
Outliers are often caused by typos or other input mistakes. In other cases, outliers aren't mistakes; after all, values five standard deviations away from the mean are rare but hardly impossible.
Outliers often cause problems in model training. Clipping is one way of managing outliers.
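As a minimal sketch of clipping, the snippet below caps each value at a lower and upper bound. Placing the bounds 3 standard deviations from the mean is an assumption made for illustration; in practice the bounds are a modeling choice. The function name and the sample prices are hypothetical.

```python
def clip(value, lower, upper):
    """Return `value` limited to the closed interval [lower, upper]."""
    return max(lower, min(value, upper))


# Assumed bounds: 3 standard deviations either side of a mean of 7 Euros
# with a standard deviation of 1 Euro.
LOWER_BOUND = 7.0 - 3 * 1.0   # 4.0
UPPER_BOUND = 7.0 + 3 * 1.0   # 10.0

# Hypothetical sample of widget prices, in Euros.
prices = [6.5, 7.0, 12.0, 2.0, 8.2]

clipped = [clip(p, LOWER_BOUND, UPPER_BOUND) for p in prices]
print(clipped)
```

Values inside the bounds are left unchanged; only the 12 Euro and 2 Euro prices are pulled back to the bounds.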
See "Working with numerical data" in Machine Learning Crash Course for more information.
Created for this library
A pricing team caps extreme outliers in its training data so a few unusual transactions do not skew model coefficients.
A risk modeling team winsorizes outliers in tail-heavy features so the production scorecard stays stable across vintages.
A retail forecasting team investigates outlier weeks before retraining to confirm they reflect real events rather than data quality issues.
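The capping and winsorizing described in these examples can be sketched as percentile-based clipping. This is one simple variant, not any particular team's method: it uses a nearest-rank percentile, and the 10th and 90th percentile cut-offs, the function names, and the transaction amounts are all assumptions for illustration.

```python
import math


def percentile_nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is in (0, 100]."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def winsorize(values, lower_pct=5, upper_pct=95):
    """Replace values below/above the given percentiles with those percentiles."""
    ordered = sorted(values)
    low = percentile_nearest_rank(ordered, lower_pct)
    high = percentile_nearest_rank(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical transaction amounts with one extreme value.
amounts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

winsorized = winsorize(amounts, lower_pct=10, upper_pct=90)
print(winsorized)
```

Unlike dropping the extreme transaction, winsorizing keeps the row in the dataset and only limits how far its value can pull on the model.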
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License