Glossary term
Infrastructure and Serving
A form of model parallelism in which a model's processing is divided into consecutive stages and each stage is executed on a different device. While a stage is processing one batch, the preceding stage can work on the next batch.
See also staged training.
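The overlap described in the definition can be illustrated with a small schedule simulation. This is a minimal sketch under simplifying assumptions, not how any particular framework implements pipelining: it models forward passes only, assumes every stage takes one time step per micro-batch, and uses hypothetical helper names (`pipeline_schedule`, `idle_fraction`) chosen for this example.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return one list per time step; entry s is the micro-batch index
    that stage s processes at that step, or None if the stage is idle.

    Assumption: forward pass only, one time step per stage per micro-batch.
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        step = []
        for s in range(num_stages):
            # Micro-batch m reaches stage s at time step m + s.
            mb = t - s
            step.append(mb if 0 <= mb < num_microbatches else None)
        schedule.append(step)
    return schedule


def idle_fraction(schedule):
    """Fraction of (time step, stage) slots in which a stage has no work."""
    slots = [slot for step in schedule for slot in step]
    return slots.count(None) / len(slots)


if __name__ == "__main__":
    sched = pipeline_schedule(num_stages=3, num_microbatches=4)
    for t, step in enumerate(sched):
        print(t, step)
    print("idle fraction:", idle_fraction(sched))
```

In this toy model, once the pipeline fills, every stage works on a different micro-batch at the same time. Stages sit idle only while the pipeline fills and drains, so the idle fraction shrinks as the number of micro-batches grows relative to the number of stages.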
Created for this library
An ML platform team splits a model into consecutive stages on separate devices and uses pipelining during training so that different micro-batches are processed on different parts of the model at the same time.
A research lab uses pipelining when training very large language models so different layers process different micro-batches simultaneously.
An ML engineer applies the same overlapping idea to the input pipeline, loading and preprocessing the next batch on the host while the accelerators compute on the current one, so the accelerators stay busy.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License