Glossary term
Glossary term
Infrastructure and Serving
A copy (or part of) of a training set or model, typically stored on another machine. For example, a system could use the following strategy for implementing data parallelism:
Place replicas of an existing model on multiple machines.
Send different subsets of the training set to each replica.
Aggregate the parameter updates.
A replica can also refer to another copy of an inference server. Increasing the number of replicas increases the number of requests that the system can serve simultaneously but also increases serving costs.
Created for this library
An ML platform team uses multiple model replicas behind a load balancer to scale inference throughput for a high-traffic feature.
A research team trains with synchronous replicas across nodes so gradients are averaged before each update.
An ML platform team provisions extra replicas during peak hours so production inference latency stays within SLA.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License