Undersampling

Removing examples from the majority class in a class-imbalanced dataset in order to create a more balanced training set.

For example, consider a dataset in which the ratio of majority-class to minority-class examples is 20:1. To reduce this imbalance, you could build a training set containing all of the minority-class examples but only a tenth of the majority-class examples, which gives a training-set class ratio of 2:1. This more balanced training set might produce a better model. However, discarding 90% of the majority-class examples also shrinks the training set, which might leave too few examples to train an effective model.
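The arithmetic above can be sketched in Python as random undersampling: keep every minority-class example and a uniformly random tenth of the majority-class examples. This is a minimal sketch, assuming examples are stored as `(features, label)` pairs; the function name `undersample`, its parameters, and the toy dataset of 2,000 majority and 100 minority examples are hypothetical, chosen only to reproduce the 20:1 to 2:1 example.

```python
import random
from collections import Counter


def undersample(examples, majority_label, keep_fraction, seed=0):
    """Keep all non-majority examples plus a random `keep_fraction` of the majority.

    `examples` is a list of (features, label) pairs (an assumed format).
    Majority examples are drawn without replacement, so none is duplicated.
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    majority = [ex for ex in examples if ex[1] == majority_label]
    others = [ex for ex in examples if ex[1] != majority_label]
    n_keep = round(len(majority) * keep_fraction)
    kept = rng.sample(majority, n_keep)  # uniform sample, no replacement
    result = others + kept
    rng.shuffle(result)  # avoid class-ordered training data
    return result


# Hypothetical toy data matching the 20:1 example: 2000 label-0, 100 label-1.
data = [(i, 0) for i in range(2000)] + [(i, 1) for i in range(100)]
balanced = undersample(data, majority_label=0, keep_fraction=0.1)

counts = Counter(label for _, label in balanced)
print(counts[0], counts[1])   # 200 100
print(counts[0] / counts[1])  # 2.0
```

Dropping majority examples uniformly at random is the simplest choice; it keeps the majority class's internal distribution roughly intact, at the cost of ignoring which discarded examples might have been informative.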

Contrast with oversampling.

Real-world uses

Created for this library

1. A fraud team undersamples the dominant non-fraud class during training to balance the gradient signal.
2. A retention team undersamples retained customers so that the gradient is less dominated by majority-class examples.
3. A medical screening team undersamples healthy cases when training a rare-disease detector.
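Undersampling "during training," as in the uses above, can also be applied per epoch. The sketch below shows one such variant, assumed here rather than taken from the source: each epoch keeps every minority example and draws a fresh random majority subset at a chosen majority:minority `ratio` (2 for 2:1), so every epoch is balanced while different majority examples get used across epochs. With an unweighted per-example loss, each example contributes one loss term, so the minority class's share of the loss rises from 1/21 (about 4.8%) at 20:1 to 1/3 at 2:1. The names `epoch_batches`, `ratio`, and `batch_size`, and the toy fraud data, are hypothetical.

```python
import random


def epoch_batches(majority, minority, ratio, batch_size, seed):
    """Yield shuffled mini-batches for one epoch, undersampling the majority.

    Keeps every minority example and draws a fresh random subset of
    ratio * len(minority) majority examples (without replacement).
    `ratio` is the desired majority:minority ratio, e.g. 2 for 2:1.
    """
    rng = random.Random(seed)
    n_major = min(len(majority), int(ratio * len(minority)))
    epoch = minority + rng.sample(majority, n_major)
    rng.shuffle(epoch)
    for start in range(0, len(epoch), batch_size):
        yield epoch[start:start + batch_size]


# Hypothetical data: label 0 = non-fraud (majority), 1 = fraud (minority).
majority = [(f"legit-{i}", 0) for i in range(2000)]
minority = [(f"fraud-{i}", 1) for i in range(100)]

seen_majority = set()
for epoch_idx in range(3):
    batches = list(epoch_batches(majority, minority, ratio=2,
                                 batch_size=64, seed=epoch_idx))
    labels = [label for batch in batches for _, label in batch]
    # Minority share of per-example loss terms: 100 / 300 = 1/3 every epoch.
    print(epoch_idx, labels.count(1) / len(labels))
    seen_majority.update(x for batch in batches for x, label in batch if label == 0)

# Distinct majority examples used across all epochs (varies with the seeds).
print(len(seen_majority))
```

Resampling each epoch trades a fixed, smaller training set for one that eventually covers more of the majority class; whether that helps depends on the model and data.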