Glossary term
Glossary term
Infrastructure and Serving
A tactic of training a model in a sequence of discrete stages. The goal can be either to speed up the training process, or to achieve better model quality.
An illustration of the progressive stacking approach is shown below:
Stage 1 contains 3 hidden layers, stage 2 contains 6 hidden layers, and stage 3 contains 12 hidden layers.
Stage 2 begins training with the weights learned in the 3 hidden layers of Stage 1. Stage 3 begins training with the weights learned in the 6 hidden layers of Stage 2.

See also pipelining.
Created for this library
A research team uses staged training that begins with self-supervised pretraining and continues with task-specific fine-tuning.
An LLM team uses staged training that combines pretraining, instruction tuning, and RLHF for alignment.
An ML platform team uses staged training where a teacher model is trained first and a smaller student is then distilled in a second stage.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License