Glossary term: undersampling
Training and Fine-Tuning
Removing examples from the majority class in a class-imbalanced dataset in order to create a more balanced training set.
For example, consider a dataset in which the ratio of the majority class to the minority class is 20:1. To overcome this class imbalance, you could create a training set consisting of all of the minority class examples but only a tenth of the majority class examples, which would create a training-set class ratio of 2:1. Thanks to undersampling, this more balanced training set might produce a better model. Alternatively, this more balanced training set might contain insufficient examples to train an effective model.
Contrast with oversampling.
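The 20:1 example above can be sketched in a few lines of Python. This is a minimal illustration of random undersampling using only the standard library, not a reference implementation: the function name `undersample`, its parameters (`majority_label`, `keep_fraction`, `seed`), and the toy dataset of 2,000 majority and 100 minority labels are all assumptions made for this sketch.

```python
import random


def undersample(examples, labels, majority_label, keep_fraction, seed=0):
    """Keep every non-majority example and a random fraction of majority ones.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = random.Random(seed)  # seeded so the subset is reproducible

    # Split indices into the majority class and everything else.
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    other_idx = [i for i, y in enumerate(labels) if y != majority_label]

    # Draw the majority subset without replacement.
    n_keep = round(len(majority_idx) * keep_fraction)
    kept_majority = rng.sample(majority_idx, n_keep)

    # Restore the original ordering of the surviving examples.
    chosen = sorted(other_idx + kept_majority)
    return [examples[i] for i in chosen], [labels[i] for i in chosen]


# Toy dataset with a 20:1 class ratio: label 0 is the majority class.
labels = [0] * 2000 + [1] * 100
examples = list(range(len(labels)))  # stand-in feature values

sub_x, sub_y = undersample(examples, labels, majority_label=0, keep_fraction=0.1)
print(sub_y.count(0), sub_y.count(1))  # 200 100, a 2:1 ratio
```

Note that the resulting training set holds 300 examples instead of 2,100, which is the trade-off described above: the classes are better balanced, but there may be too few examples left to train an effective model.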
Created for this library
A fraud team undersamples the dominant non-fraud class during training so that gradient updates are not dominated by non-fraud examples.
A retention team undersamples retained customers during training so that the gradient is less dominated by majority-class examples.
A medical screening team uses undersampling on healthy cases during training of a rare-disease detector.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License