Staged Training

A tactic of training a model in a sequence of discrete stages. The goal is either to speed up training or to achieve better model quality.

Progressive stacking is one example of the approach:

Stage 1 contains 3 hidden layers, stage 2 contains 6 hidden layers, and stage 3 contains 12 hidden layers.

Stage 2 begins training with the weights learned in the 3 hidden layers of Stage 1. Stage 3 begins training with the weights learned in the 6 hidden layers of Stage 2.
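The 3, 6, 12 schedule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the entry does not say how the 3 trained layers are expanded into 6, so the sketch assumes the trained stack is copied on top of itself, which is one common way to do it. The names `train_stage`, `stack`, and `progressive_stacking` are hypothetical, and `train_stage` is a stand-in that only nudges a toy weight so the hand-off between stages is visible.

```python
import copy


def train_stage(layers, steps):
    """Stand-in for real training (hypothetical): nudges each toy weight."""
    for _ in range(steps):
        for layer in layers:
            layer["w"] += 0.1
    return layers


def stack(layers):
    """Double the depth by copying the trained layers on top of themselves.

    Assumption: the source does not specify the expansion rule; duplication
    is used here purely for illustration.
    """
    return copy.deepcopy(layers) + copy.deepcopy(layers)


def progressive_stacking(initial_depth=3, num_stages=3, steps_per_stage=1):
    """Train a shallow model, then repeatedly deepen it and keep training."""
    # Stage 1 starts from freshly initialized layers.
    layers = [{"w": 0.0} for _ in range(initial_depth)]
    depths = []
    for stage in range(num_stages):
        if stage > 0:
            # Later stages start from the weights learned in the previous stage.
            layers = stack(layers)
        layers = train_stage(layers, steps_per_stage)
        depths.append(len(layers))
    return layers, depths


layers, depths = progressive_stacking()
print(depths)  # [3, 6, 12]
```

In a real setup each entry in `layers` would be a full hidden layer with its own parameters, and `train_stage` would run an optimizer over a dataset for that stage.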

Real-world uses

Created for this library

1. A research team uses staged training that begins with self-supervised pretraining and continues with task-specific fine-tuning.
2. An LLM team uses staged training that combines pretraining, instruction tuning, and RLHF for alignment.
3. An ML platform team uses staged training where a teacher model is trained first and a smaller student is then distilled in a second stage.
