Glossary term
Glossary term
Training and Fine-Tuning
A dataset containing categorical labels in which the number of instances of each category is approximately equal. For example, consider a botanical dataset whose binary label can be either native plant or nonnative plant:
A dataset with 515 native plants and 485 nonnative plants is a class-balanced dataset.
A dataset with 875 native plants and 125 nonnative plants is a class-imbalanced dataset.
A formal dividing line between class-balanced datasets and class-imbalanced datasets doesn't exist. The distinction only becomes important when a model trained on a highly class-imbalanced dataset can't converge. See Datasets: imbalanced datasets in Machine Learning Crash Course for details.
Created for this library
A medical imaging startup builds a class-balanced dataset for its rare-disease classifier so the model sees enough positive examples during training.
A fraud team augments its dataset with synthetic positive cases to create a class-balanced training set for its rare-event model.
A retail recall team uses class-balanced sampling during training so the small number of recall events influences the model proportionally.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License