Glossary term
Glossary term
Evaluation and Benchmarks
A subset of the dataset reserved for testing a trained model.
Traditionally, you divide examples in the dataset into the following three distinct subsets:
a test set
Each example in a dataset should belong to only one of the preceding subsets. For instance, a single example shouldn't belong to both the training set and the test set.
The training set and validation set are both closely tied to training a model. Because the test set is only indirectly associated with training, test loss is a less biased, higher quality metric than training loss or validation loss.
See Datasets: Dividing the original dataset in Machine Learning Crash Course for more information.
Created for this library
An ML platform team requires a frozen test set per model release so model quality is comparable across versions.
A medical AI team curates a test set with edge cases that clinicians find most informative for safety review.
A research team holds out a clean test set per benchmark so model performance numbers stay comparable across runs.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License