Glossary term
Infrastructure and Serving
The process of inferring predictions on multiple unlabeled examples divided into smaller subsets ("batches").
Batch inference can take advantage of the parallelization features of accelerator chips. That is, multiple accelerators can simultaneously infer predictions on different batches of unlabeled examples, dramatically increasing the number of inferences per second.
See Production ML systems: Static versus dynamic inference in Machine Learning Crash Course for more information.
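As a minimal sketch of the idea above, the following Python splits a list of unlabeled examples into batches and runs the batches concurrently. Everything named here is an assumption made for illustration: `make_batches`, `batch_infer`, and `toy_model` are hypothetical helpers, `toy_model` stands in for a trained model, and the thread pool stands in for multiple accelerators.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def make_batches(examples: Sequence[float], batch_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices holding at most batch_size examples each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def toy_model(batch: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained model: predicts 2*x + 1 per example."""
    return [2.0 * x + 1.0 for x in batch]


def batch_infer(
    examples: Sequence[float],
    batch_size: int,
    predict: Callable[[Sequence[float]], List[float]] = toy_model,
    workers: int = 2,
) -> List[float]:
    """Run predict on each batch concurrently and return predictions in input order."""
    batches = list(make_batches(examples, batch_size))
    # Assumption: worker threads play the role of accelerators here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input batches.
        per_batch = list(pool.map(predict, batches))
    # Flatten the per-batch predictions back into one list.
    return [prediction for batch in per_batch for prediction in batch]


if __name__ == "__main__":
    print(batch_infer([0.0, 1.0, 2.0, 3.0, 4.0], batch_size=2))
    # prints [1.0, 3.0, 5.0, 7.0, 9.0]
```

In this sketch the last batch may be smaller than `batch_size`, and the order of predictions matches the order of the input examples regardless of which worker finishes first.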
Created for this library
A retail recommendation team runs batch inference nightly to precompute next-day product suggestions for every active user.
An insurance company runs batch inference once per quarter to recompute risk scores for its entire book of business.
A subscription business runs batch inference every Monday to refresh churn-risk scores feeding the retention call list.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License