Glossary term
Infrastructure and Serving
The process of making a trained model available to provide predictions through online inference or offline inference.
Created for this library
An ML platform team owns the serving stack that exposes models behind stable APIs for product teams.
A retail recommender team optimizes the serving path for sub-100-millisecond latency on the homepage carousel.
An LLM platform team uses model cascading in its serving stack to keep cost and quality balanced per request.
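The model cascading mentioned in the last example can be sketched in a few lines. This is a minimal illustration, not a description of any particular serving stack: the `Tier` class, the `serve` function, the stand-in models, and the confidence thresholds are all hypothetical names and values chosen for the sketch. The idea is that each request goes to the cheapest model first and escalates only when that model reports low confidence.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model signature: takes a request, returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    model: Model
    min_confidence: float  # accept this tier's answer if confidence >= this value


def serve(request: str, tiers: List[Tier]) -> Tuple[str, str]:
    """Return (tier_name, answer) from the first tier that is confident enough.

    The last tier acts as the fallback and always answers.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for tier in tiers[:-1]:
        answer, confidence = tier.model(request)
        if confidence >= tier.min_confidence:
            return tier.name, answer
    last = tiers[-1]
    answer, _ = last.model(request)
    return last.name, answer


# Stand-in models (assumptions for illustration only).
def small_model(request: str) -> Tuple[str, float]:
    # Pretend the cheap model is confident only on short requests.
    confidence = 0.9 if len(request.split()) <= 5 else 0.3
    return f"small:{request}", confidence


def large_model(request: str) -> Tuple[str, float]:
    return f"large:{request}", 0.99


TIERS = [
    Tier("small", small_model, min_confidence=0.8),
    Tier("large", large_model, min_confidence=0.0),
]
```

Calling `serve("hello there", TIERS)` is answered by the small tier, while a longer request such as `serve("please summarize this long quarterly report for me", TIERS)` escalates to the large tier. In a real deployment the confidence signal and thresholds would have to be tuned against measured cost and quality.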
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License