Glossary term
Infrastructure and Serving
A logical division of the training set or the model. Typically, some process creates shards by dividing the examples or parameters into (usually) equal-sized chunks. Each shard is then assigned to a different machine.
Sharding a model is called model parallelism; sharding data is called data parallelism.
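As a rough illustration of the "(usually) equal-sized chunks" in the definition, the sketch below splits a list into contiguous shards whose sizes differ by at most one. The function name `make_shards` and the toy data are hypothetical, chosen only for this example; real systems typically shard files or tensors rather than Python lists.

```python
def make_shards(items, num_shards):
    """Split items into num_shards contiguous chunks whose sizes differ by at most one."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # base is the minimum shard size; the first `extra` shards get one more item.
    base, extra = divmod(len(items), num_shards)
    shards = []
    start = 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards


# Hypothetical toy data: 10 examples divided among 3 machines.
examples = list(range(10))
shards = make_shards(examples, 3)
for machine_id, shard in enumerate(shards):
    print(f"machine {machine_id}: {shard}")
```

The same splitting logic applies whether the items are training examples (data parallelism) or slices of a parameter array (model parallelism); only what each machine does with its shard differs.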
Created for this library
An ML platform team shards its embedding table across multiple devices to fit a large vocabulary in distributed training.
A research engineer shards model parameters across a device mesh to scale training of a very large transformer.
An ML platform team shards its dataset across workers so distributed training reads each example exactly once per epoch.
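The dataset case above can be sketched as follows, assuming every worker knows the total example count, its own rank, and a shared seed. The helper `worker_indices` and its parameters are hypothetical and not taken from any particular framework. Each worker builds the same per-epoch permutation and keeps every `num_workers`-th index, so the shards are disjoint and together cover each example exactly once per epoch.

```python
import random


def worker_indices(num_examples, num_workers, worker_id, epoch, seed=0):
    """Return the example indices that one worker reads during one epoch."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    # All workers seed the shuffle identically, so they agree on the order
    # without communicating; the order changes from epoch to epoch.
    order = list(range(num_examples))
    random.Random(seed + epoch).shuffle(order)
    # Strided slice: worker k takes positions k, k + num_workers, ...
    return order[worker_id::num_workers]


# Hypothetical setup: 10 examples, 3 workers, first epoch.
per_worker = [worker_indices(10, 3, w, epoch=0) for w in range(3)]
for w, idx in enumerate(per_worker):
    print(f"worker {w} reads examples {idx}")
```

When the example count is not a multiple of the worker count, some workers receive one more example than others; frameworks often pad or drop the remainder to keep step counts aligned, which this sketch does not do.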
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License