Agentic AI Library

Curated Open Source Library

Glossary term

Infrastructure and Serving

Serving

The process of making a trained model available to provide predictions through online inference or offline inference.
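The two modes named in the definition can be contrasted with a minimal sketch. Everything here is an illustrative assumption rather than part of the source definition: `predict` is a hypothetical stand-in for a trained model, and the function names and the in-memory cache are invented for the example.

```python
from typing import Dict, List, Optional


def predict(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.5, -0.25, 1.0]  # assumed weights, for illustration only
    return sum(w * x for w, x in zip(weights, features))


def serve_online(features: List[float]) -> float:
    """Online inference: compute the prediction on demand, per request."""
    return predict(features)


def build_offline_cache(batch: Dict[str, List[float]]) -> Dict[str, float]:
    """Offline inference: score a whole batch ahead of time and store the results."""
    return {key: predict(features) for key, features in batch.items()}


def serve_offline(cache: Dict[str, float], key: str) -> Optional[float]:
    """Serve a precomputed prediction; returns None if the key was never scored."""
    return cache.get(key)


if __name__ == "__main__":
    print(serve_online([2.0, 4.0, 1.0]))
    cache = build_offline_cache({"user_1": [2.0, 4.0, 1.0]})
    print(serve_offline(cache, "user_1"))
```

In this sketch the online path pays the model cost on every request, while the offline path pays it once per batch and can only answer for inputs that were scored in advance.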

Real-world uses

Created for this library

1. An ML platform team owns the serving stack that exposes models behind stable APIs for product teams.
2. A retail recommender team optimizes the serving path for sub-100-millisecond latency on the homepage carousel.
3. An LLM platform team uses model cascading in its serving stack to keep cost and quality balanced per request.
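The model cascading mentioned in the third example can be sketched roughly as follows: try a cheaper model first and escalate only when its confidence is too low. The models, their confidence scores, and the 0.8 threshold are all hypothetical placeholders, not a description of any particular serving stack.

```python
from typing import Callable, List, Tuple

# A model takes a prompt and returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def small_model(prompt: str) -> Tuple[str, float]:
    """Hypothetical cheap model: assumed to be confident only on short prompts."""
    confidence = 0.9 if len(prompt) < 20 else 0.4
    return f"small:{prompt}", confidence


def large_model(prompt: str) -> Tuple[str, float]:
    """Hypothetical expensive model: assumed to be confident on everything."""
    return f"large:{prompt}", 0.95


def cascade(prompt: str, models: List[Model], threshold: float = 0.8) -> str:
    """Call models from cheapest to most expensive; stop at the first confident answer.

    If no model reaches the threshold, the last model's answer is returned.
    """
    answer = ""
    for model in models:
        answer, confidence = model(prompt)
        if confidence >= threshold:
            break
    return answer


if __name__ == "__main__":
    tiers = [small_model, large_model]
    print(cascade("hi", tiers))
    print(cascade("explain the serving path in detail", tiers))
```

Ordering the list from cheapest to most expensive is the design choice that lets easy requests avoid the larger model entirely.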

Related terms

Offline Inference, Online Inference

Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License

Agentic AI Library · Open Source · Last Reviewed 2026-06-07

All Rights Reserved @2026 Georgi Naydenov