Glossary term
Foundations
Generating predictions on demand. For example, suppose an app passes input to a model and issues a request for a prediction. A system using online inference responds to the request by running the model (and returning the prediction to the app).
Contrast with offline inference.
See Production ML systems: Static versus dynamic inference in Machine Learning Crash Course for more information.
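The request-and-response flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_model`, `online_predict`, and `build_offline_cache` are hypothetical names, and the "model" is a fixed weighted sum standing in for a trained one. The sketch contrasts running the model when a request arrives with looking up predictions that were computed ahead of time.

```python
from typing import Callable, Dict, List, Optional

Features = List[float]
Model = Callable[[Features], float]


def toy_model(features: Features) -> float:
    """Hypothetical stand-in for a trained model: a fixed weighted sum."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


def online_predict(model: Model, features: Features) -> float:
    """Online inference: run the model when the request arrives."""
    return model(features)


def build_offline_cache(model: Model, known_inputs: Dict[str, Features]) -> Dict[str, float]:
    """Offline inference: precompute predictions for inputs known in advance."""
    return {key: model(features) for key, features in known_inputs.items()}


def offline_lookup(cache: Dict[str, float], key: str) -> Optional[float]:
    """Serve a precomputed prediction, or None if the input was never seen."""
    return cache.get(key)


# The app sends input and gets a prediction computed on demand.
on_demand = online_predict(toy_model, [2.0, 4.0, 1.0])

# The offline path can only answer for inputs it precomputed.
cache = build_offline_cache(toy_model, {"user_a": [2.0, 4.0, 1.0]})
cached = offline_lookup(cache, "user_a")
missing = offline_lookup(cache, "user_b")
```

In this sketch the online path can score any input the app sends, at the cost of running the model on every request, while the offline path answers only from its precomputed table.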
Created for this library
A SaaS company runs online inference for its in-product assistant so every keystroke or click receives a fresh prediction.
A fraud team runs online inference at authorization time so transactions are scored within milliseconds.
A search team runs online inference per query so each result list reflects the latest user context and the most recent model version.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License