Glossary term
Agentic Systems
Autorater evaluation: a hybrid mechanism for judging the quality of a generative AI model's output that combines human evaluation with automatic evaluation. An autorater is an ML model trained on data created by human evaluation. Ideally, an autorater learns to mimic a human evaluator.
Prebuilt autoraters are available, but the best autoraters are fine-tuned specifically to the task you are evaluating.
Note: A running autorater is a fully automated process; humans "only" provide data that helps train an autorater.
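To make the training relationship concrete, here is a minimal sketch of an autorater fitted on human ratings. It is an illustration only: the four rated responses are hypothetical toy data, the function names `train_autorater` and `autorater_score` are made up for this example, and a small naive Bayes text classifier stands in for the much larger fine-tuned models used in practice.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: (model response, human rating), 1 = acceptable, 0 = poor.
HUMAN_RATED = [
    ("Your refund was issued and should arrive in 3 days.", 1),
    ("Please restart the router, then check the status light.", 1),
    ("I don't know.", 0),
    ("Sorry, I cannot help with that.", 0),
]

def tokenize(text):
    # Lowercase the text and keep alphabetic runs (apostrophes included).
    return re.findall(r"[a-z']+", text.lower())

def train_autorater(rated_examples):
    """Fit a tiny naive Bayes classifier on human-rated responses."""
    word_counts = {0: Counter(), 1: Counter()}
    label_counts = Counter()
    for text, label in rated_examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}

def autorater_score(model, text):
    """Estimate the probability that a human would rate `text` acceptable."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    log_prob = {}
    for label in (0, 1):
        counts = model["word_counts"][label]
        n_tokens = sum(counts.values())
        lp = math.log(model["label_counts"][label] / total)
        for tok in tokenize(text):
            # Laplace smoothing keeps unseen words from zeroing the probability.
            lp += math.log((counts[tok] + 1) / (n_tokens + vocab_size))
        log_prob[label] = lp
    # Normalize the two log-probabilities into a probability for label 1.
    top = max(log_prob.values())
    e0 = math.exp(log_prob[0] - top)
    e1 = math.exp(log_prob[1] - top)
    return e1 / (e0 + e1)

if __name__ == "__main__":
    model = train_autorater(HUMAN_RATED)
    # Once trained, scoring runs with no human in the loop.
    print(autorater_score(model, "Your refund was issued"))
    print(autorater_score(model, "I don't know."))
```

Humans appear only in `HUMAN_RATED`; after `train_autorater` returns, every call to `autorater_score` is fully automated, which mirrors the note above.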
Created for this library
An LLM team uses autorater evaluation with a strong model as judge so it can score thousands of generated answers in hours instead of weeks.
A search ranking team runs autorater evaluation on top-10 query results to track quality drift between scheduled human rating cycles.
A customer-support chatbot team uses autorater evaluation to flag conversations where the bot's response would likely be rated poorly by a human reviewer.
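The flagging and drift-tracking uses in the examples above can be sketched as two small helpers. This assumes autorater scores have already been computed and lie in the range 0 to 1, with higher meaning better; the function names, the response ids, and the 0.5 threshold are all hypothetical choices for illustration.

```python
from statistics import mean

def flag_for_review(scored_responses, threshold=0.5):
    """Return ids of responses whose autorater score is below `threshold`.

    `scored_responses` is a list of (response_id, score) pairs.
    """
    return [rid for rid, score in scored_responses if score < threshold]

def quality_drift(baseline_scores, current_scores):
    """Change in mean autorater score between two batches.

    A negative value suggests quality dropped since the baseline batch.
    """
    return mean(current_scores) - mean(baseline_scores)

if __name__ == "__main__":
    # Hypothetical scores for two conversations.
    print(flag_for_review([("conv-1", 0.9), ("conv-2", 0.2)]))
    print(quality_drift([0.8, 0.6], [0.5, 0.5]))
```

In practice a team would likely calibrate the threshold against a held-out set of human ratings rather than fixing it at 0.5.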
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License