Glossary term
Infrastructure and Serving
A way of scaling training or inference that replicates an entire model onto multiple devices and then passes a different subset of the input data to each device. Data parallelism can enable training and inference on very large batch sizes; however, it requires that the model be small enough to fit on each device.
Data parallelism typically speeds training and inference.
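The definition above can be sketched as a single-process simulation in plain Python. This is an illustration under stated assumptions, not a real multi-device setup: the "devices" are just list entries, the model is a hypothetical 1-D linear regressor y ≈ w*x + b, and the helper names (`grad_mse`, `shard`, `data_parallel_grad`) are made up for this example. It shows that when every replica computes the mean-squared-error gradient on an equal-sized shard and the local gradients are averaged, the result matches the full-batch gradient.

```python
# Hypothetical sketch: data parallelism simulated in one process.
# Each "device" is a list entry holding a full copy of the model weights.

def grad_mse(w, b, xs, ys):
    """Mean-squared-error gradient for the 1-D linear model y ~ w*x + b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def shard(batch, num_devices):
    """Split a batch into num_devices equal, contiguous shards."""
    if len(batch) % num_devices:
        raise ValueError("batch size must be divisible by num_devices")
    size = len(batch) // num_devices
    return [batch[i * size:(i + 1) * size] for i in range(num_devices)]

def data_parallel_grad(w, b, xs, ys, num_devices):
    # 1. Replicate: every device gets an identical copy of (w, b).
    replicas = [(w, b) for _ in range(num_devices)]
    # 2. Scatter: each device receives a different subset of the batch.
    x_shards, y_shards = shard(xs, num_devices), shard(ys, num_devices)
    # 3. Each device computes a gradient on its own shard only.
    local = [grad_mse(rw, rb, sx, sy)
             for (rw, rb), sx, sy in zip(replicas, x_shards, y_shards)]
    # 4. Average the local gradients (the role an all-reduce plays).
    #    With equal shard sizes this equals the full-batch mean gradient.
    dw = sum(g[0] for g in local) / num_devices
    db = sum(g[1] for g in local) / num_devices
    return dw, db
```

For example, with `xs = [float(i) for i in range(8)]` and `ys = [3 * x + 1 for x in xs]`, both `grad_mse(0.0, 0.0, xs, ys)` and `data_parallel_grad(0.0, 0.0, xs, ys, 4)` return `(-112.0, -23.0)`. If shards had unequal sizes, a plain average of per-shard means would no longer equal the full-batch mean, which is why the sketch insists on equal shards.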
See also model parallelism.
Created for this library
An ML platform team uses data parallelism across 64 GPUs to train its image model in hours instead of days.
A speech recognition team uses data parallelism with PyTorch Distributed to scale training of its acoustic model across multiple nodes.
A search team uses data parallelism with synchronous gradient averaging when training its ranker on hundreds of millions of click events.
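The synchronous gradient averaging in the last example can be sketched the same way. This is a toy, in-process illustration, not PyTorch Distributed's or any other framework's API: the model is a hypothetical one-weight regressor y ≈ w*x, and `all_reduce_mean` is a hypothetical stand-in for a collective all-reduce. Because every replica starts from the same weight and applies the same averaged gradient at every step, all replicas stay identical throughout training.

```python
# Hypothetical sketch of synchronous data-parallel SGD (not a real framework API).

def local_grad(w, xs, ys):
    """MSE gradient for the 1-D model y ~ w*x on one device's shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every device receives the mean of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def train_sync(xs, ys, num_devices, steps, lr):
    if len(xs) % num_devices:
        raise ValueError("batch size must be divisible by num_devices")
    size = len(xs) // num_devices
    shards = [(xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
              for i in range(num_devices)]
    weights = [0.0] * num_devices  # one replica per device, identically initialized
    for _ in range(steps):
        # Each device computes a gradient on its own shard.
        grads = [local_grad(w, sx, sy) for w, (sx, sy) in zip(weights, shards)]
        # In a real system this is a synchronization point: no device updates
        # until every device's gradient has been averaged.
        grads = all_reduce_mean(grads)
        # Identical weights plus identical averaged gradients keep replicas in sync.
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights
```

For instance, `train_sync([0.5 * i for i in range(1, 9)], [1.0 * i for i in range(1, 9)], num_devices=4, steps=50, lr=0.05)` returns four equal weights very close to 2.0. Averaging gradients (rather than, say, letting replicas update independently) is what makes the replicas behave like one model trained on the full batch.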
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License