Glossary term
Evaluation and Benchmarks
A technique for evaluating the importance of a feature or component by temporarily removing it from a model. You then retrain the model without that feature or component, and if the retrained model performs significantly worse, then the removed feature or component was likely important.
For example, suppose you train a classification model on 10 features and achieve 88% precision on the test set. To check the importance of the first feature, you can retrain the model using only the nine other features. If the retrained model performs significantly worse (for instance, 55% precision), then the removed feature was probably important. Conversely, if the retrained model performs equally well, then that feature was probably not that important.
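As a minimal sketch of this retrain-and-compare loop, the following Python uses only the standard library. Everything in it is an illustrative assumption rather than part of the definition: the synthetic three-feature dataset (feature 0 tracks the label, features 1 and 2 are noise), the nearest-centroid classifier standing in for "the model", and the hypothetical helper names `make_data`, `drop_feature`, `train_and_score`, and `feature_ablation`. It does not reproduce the 88% and 55% figures above.

```python
import random
from typing import Dict, List, Tuple

Row = List[float]


def make_data(n: int, rng: random.Random) -> Tuple[List[Row], List[int]]:
    """Synthetic data (assumption): feature 0 is informative, 1 and 2 are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([6.0 * label + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


def drop_feature(X: List[Row], index: int) -> List[Row]:
    """Return a copy of X with one feature column removed (the ablation)."""
    return [row[:index] + row[index + 1:] for row in X]


def fit_centroids(X: List[Row], y: List[int]) -> Dict[int, Row]:
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, target in zip(X, y) if target == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, Row], row: Row) -> int:
    """Predict the class whose centroid is closest in squared distance."""
    def sq_dist(center: Row) -> float:
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def precision(y_true: List[int], y_pred: List[int]) -> float:
    """Precision for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return tp / (tp + fp) if tp + fp else 0.0


def train_and_score(X_train, y_train, X_test, y_test) -> float:
    """Retrain from scratch on the given features and score on the test set."""
    centroids = fit_centroids(X_train, y_train)
    return precision(y_test, [predict(centroids, row) for row in X_test])


def feature_ablation(X_train, y_train, X_test, y_test):
    """Return baseline precision and precision with each feature removed in turn."""
    baseline = train_and_score(X_train, y_train, X_test, y_test)
    ablated = {}
    for j in range(len(X_train[0])):
        ablated[j] = train_and_score(
            drop_feature(X_train, j), y_train, drop_feature(X_test, j), y_test
        )
    return baseline, ablated


rng = random.Random(0)
X_train, y_train = make_data(400, rng)
X_test, y_test = make_data(200, rng)
baseline, ablated = feature_ablation(X_train, y_train, X_test, y_test)
print(f"baseline precision: {baseline:.2f}")
for j, score in ablated.items():
    print(f"without feature {j}: {score:.2f} (drop {baseline - score:+.2f})")
```

Under these assumptions, removing feature 0 should cost far more precision than removing either noise feature. Note that the model is refit for every variant; scoring an already trained model on zeroed-out inputs is a different (and cheaper) measurement.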
Ablation can also help determine the importance of:
Larger components, such as an entire subsystem of a larger ML system
Processes or techniques, such as a data preprocessing step
In both cases, you would observe how the system's performance changes (or doesn't change) after you've removed the component.
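The same idea applied to a process rather than a feature can be sketched as follows. This is a toy illustration under stated assumptions: the three preprocessing steps, the six hand-written examples, and the keyword rule `classify` are all hypothetical, and because the "model" here is a fixed rule there is nothing to retrain. In a learned system you would refit the model for each variant.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[str], str]


def lowercase(text: str) -> str:
    return text.lower()


def strip_punctuation(text: str) -> str:
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


def classify(text: str) -> int:
    """Toy stand-in for a model (assumption): positive if a token equals 'refund'."""
    return int("refund" in text.split())


def accuracy(steps: List[Step], examples: List[Tuple[str, int]]) -> float:
    """Run each example through the given steps, classify it, and score."""
    correct = 0
    for text, label in examples:
        for step in steps:
            text = step(text)
        correct += int(classify(text) == label)
    return correct / len(examples)


def ablate_steps(steps: Dict[str, Step], examples: List[Tuple[str, int]]) -> Dict[str, float]:
    """Score the full pipeline, then the pipeline with each step removed in turn."""
    results = {"baseline": accuracy(list(steps.values()), examples)}
    for name in steps:
        kept = [fn for other, fn in steps.items() if other != name]
        results["without " + name] = accuracy(kept, examples)
    return results


# Hypothetical labeled examples: 1 = refund request, 0 = anything else.
EXAMPLES = [
    ("I want a refund", 1),
    ("REFUND please", 1),
    ("Where is my refund?", 1),
    ("Great product!", 0),
    ("Refund, now.", 1),
    ("thanks for the help", 0),
]

PIPELINE = {
    "lowercase": lowercase,
    "strip_punctuation": strip_punctuation,
    "collapse_whitespace": collapse_whitespace,
}

results = ablate_steps(PIPELINE, EXAMPLES)
for variant, score in results.items():
    print(f"{variant}: {score:.2f}")
```

In this toy setup, removing `lowercase` or `strip_punctuation` lowers accuracy, while removing `collapse_whitespace` changes nothing because `classify` already splits on whitespace. That last case is the "doesn't change" outcome: the step was probably not contributing.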
Created for this library
A streaming service's ranking team runs ablation studies on its recommender to confirm that recently added watch-history features actually drive engagement before justifying the added serving cost.
A bank's credit risk group ablates each input data source in its default model to identify which third-party feeds are worth the licensing fees.
A robotics startup removes individual sensor inputs during testing to confirm the perception model still meets safety thresholds when a sensor degrades in the field.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License