Data Parallelism

A way of scaling training or inference that replicates an entire model onto multiple devices and then passes a different subset of the input data to each device. Data parallelism can enable training and inference on very large batch sizes; however, it requires that the model be small enough to fit in the memory of each device.

Data parallelism typically speeds training and inference.
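
As a rough illustration of the definition above, the sketch below simulates data-parallel inference in plain Python. The toy linear model and the names `predict`, `shard`, and `data_parallel_predict` are hypothetical and exist only for this example. The "devices" are simulated one after another in a single process, so the sketch shows how the work is divided, not the speedup.

```python
from copy import deepcopy


def predict(weights, batch):
    """Toy linear model: one dot product per input row."""
    return [sum(w * x for w, x in zip(weights, row)) for row in batch]


def shard(batch, num_devices):
    """Split a batch into num_devices contiguous, near-equal shards."""
    size, extra = divmod(len(batch), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        end = start + size + (1 if i < extra else 0)
        shards.append(batch[start:end])
        start = end
    return shards


def data_parallel_predict(weights, batch, num_devices):
    # Replicate the entire model: each simulated device gets its own copy.
    replicas = [deepcopy(weights) for _ in range(num_devices)]
    # Pass a different subset of the input data to each replica,
    # then gather the outputs back in the original order.
    outputs = []
    for replica, piece in zip(replicas, shard(batch, num_devices)):
        outputs.extend(predict(replica, piece))
    return outputs


weights = [0.5, -1.0]
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

single = predict(weights, batch)
parallel = data_parallel_predict(weights, batch, num_devices=2)
print(single == parallel)
```

Because every replica holds the same weights, splitting the batch does not change the predictions. In a real system each shard would run on its own device at the same time, which is where the speedup comes from.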

Real-world uses

Created for this library

1. An ML platform team uses data parallelism across 64 GPUs to train its image model in hours instead of days.
2. A speech recognition team uses data parallelism in PyTorch Distributed to scale training of its acoustic model across multiple nodes.
3. A search team uses data parallelism with synchronous gradient averaging when training its ranker on hundreds of millions of click events.
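
The synchronous gradient averaging mentioned in the last example can be sketched in plain Python as follows. This is a minimal single-process simulation under stated assumptions, not any team's actual setup: the one-parameter model, the data, and the names `shard_gradient` and `sync_step` are hypothetical, and the plain average stands in for the all-reduce step that a distributed framework would perform across devices.

```python
def shard_gradient(w, xs, ys):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def sync_step(w, shards, lr):
    # Each replica computes a gradient on its own shard, using the same weights.
    grads = [shard_gradient(w, xs, ys) for xs, ys in shards]
    # Synchronous averaging: wait for every replica, then average the gradients.
    avg = sum(grads) / len(grads)
    # Every replica applies the same update, so all copies stay identical.
    return w - lr * avg


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2 * x
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]  # two equal-sized shards

w = 0.0
for _ in range(50):
    w = sync_step(w, shards, lr=0.05)
print(w)
```

With equal-sized shards, the average of the per-shard gradients equals the gradient over the full batch, so the replicas follow the same trajectory that a single device would follow on the whole batch.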
