Glossary term
Training and Fine-Tuning
Data that captures the state of a model's parameters either during training or after training is completed. For example, during training, you can:
1. Stop training, perhaps intentionally or perhaps as the result of certain errors.
2. Capture the checkpoint.
3. Later, reload the checkpoint, possibly on different hardware.
4. Restart training.
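The stop, capture, reload, and restart cycle above can be sketched in plain Python. This is a minimal illustration rather than any framework's API: the function names, the toy one-parameter model, the JSON file layout, and the `stop_at` parameter (which simulates an interruption) are all assumptions made for the example. Real checkpoints usually also store optimizer state, which this sketch omits.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, params):
    """Write the training step and parameters to disk."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "params": params}, f)
    # Rename into place so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (step, params), or (0, None) if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0, None
    with open(path) as f:
        state = json.load(f)
    return state["step"], state["params"]


def train(path, total_steps, save_every, stop_at=None):
    """Toy loop: gradient descent on (w - 3)^2, resuming from `path` if present.

    `stop_at` is a hypothetical parameter that simulates training being
    interrupted at a given step.
    """
    step, params = load_checkpoint(path)
    w = params["w"] if params is not None else 0.0
    while step < total_steps:
        if stop_at is not None and step == stop_at:
            return w  # training stops before finishing
        grad = 2.0 * (w - 3.0)
        w -= 0.1 * grad
        step += 1
        if step % save_every == 0:
            save_checkpoint(path, step, {"w": w})
    return w


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "model.json")
        train(ckpt, total_steps=50, save_every=10, stop_at=25)  # interrupted run
        print("resuming from step", load_checkpoint(ckpt)[0])
        print("final w:", train(ckpt, total_steps=50, save_every=10))
```

Because the interrupted run stopped between saves, the resumed run repeats the steps taken after the last checkpoint. Saving more often shortens that repeated work at the cost of more writes.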
Examples (created for this library):
An LLM team saves a model checkpoint every 1,000 training steps so the run can resume from the latest checkpoint if a node fails.
A computer vision team keeps the top three checkpoints by validation loss so it can promote the best one to production after the full run completes.
A speech recognition vendor archives checkpoints across model versions so it can A/B test older candidates against new releases.
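The second example, keeping only the best few checkpoints by validation loss, can be sketched with a small bookkeeping class. The class name `TopKCheckpoints`, its methods, and the file paths are hypothetical and chosen only for illustration. The sketch tracks which checkpoints to keep and leaves the actual file deletion to the caller.

```python
import heapq


class TopKCheckpoints:
    """Track the k checkpoints with the lowest validation loss."""

    def __init__(self, k):
        self.k = k
        # Entries are (-val_loss, step, path), so the heap's smallest
        # element is the checkpoint with the highest (worst) loss.
        self._heap = []

    def offer(self, step, val_loss, path):
        """Register a checkpoint; return the path to delete, or None."""
        entry = (-val_loss, step, path)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return None
        # Push the new entry, then pop the worst one. If the new entry is
        # itself the worst, its own path is returned.
        evicted = heapq.heappushpop(self._heap, entry)
        return evicted[2]

    def best(self):
        """Return (step, val_loss, path) of the lowest-loss checkpoint kept."""
        neg_loss, step, path = max(self._heap)
        return step, -neg_loss, path


if __name__ == "__main__":
    tracker = TopKCheckpoints(k=3)
    history = [(1000, 0.9), (2000, 0.7), (3000, 0.8), (4000, 0.6), (5000, 0.95)]
    for step, loss in history:
        to_delete = tracker.offer(step, loss, f"ckpt-{step}.json")
        if to_delete is not None:
            print("delete", to_delete)
    print("promote", tracker.best())
```

Once the run finishes, `best()` names the checkpoint to promote. The archiving example would retain every version's checkpoint instead of evicting any.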
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License