Glossary term
Glossary term
Foundations
The subset of the dataset used to train a model.
Traditionally, examples in the dataset are divided into the following three distinct subsets:
a training set
a test set
Ideally, each example in the dataset should belong to only one of the preceding subsets. For example, a single example shouldn't belong to both the training set and the validation set.
See Datasets: Dividing the original dataset in Machine Learning Crash Course for more information.
Created for this library
A risk modeling team locks the training set at the start of each release so reviewers can audit exactly which records influenced the production model.
A retail forecasting team uses a rolling training set so each retrain includes the latest weeks of sales data.
A medical AI team versions its training set with content hashes so models can be retrained from the exact same input bytes if needed.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License