Autorater Evaluation

A hybrid mechanism for judging the quality of a generative AI model's output that combines human evaluation with automatic evaluation. An autorater is an ML model trained on data created by human evaluation. Ideally, an autorater learns to mimic a human evaluator.
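The split between human-created training data and automated scoring can be sketched in a few lines. This is a toy illustration only: the word-averaging "model" is a stand-in for a real ML model, and the names `train_autorater`, `autorate`, and the sample ratings are hypothetical, invented for this example.

```python
from collections import defaultdict

def train_autorater(human_rated):
    """Fit a toy autorater from (response_text, human_score) pairs.

    Assumption: a real autorater would be a trained ML model; here each
    word simply remembers the average human score of responses containing it.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in human_rated:
        for word in set(text.lower().split()):
            totals[word] += score
            counts[word] += 1
    word_scores = {w: totals[w] / counts[w] for w in totals}
    global_mean = sum(score for _, score in human_rated) / len(human_rated)
    return word_scores, global_mean

def autorate(model, text):
    """Score a new response with no human involved."""
    word_scores, global_mean = model
    known = [word_scores[w] for w in set(text.lower().split()) if w in word_scores]
    # Fall back to the average human score when no word was seen in training.
    return sum(known) / len(known) if known else global_mean

# Hypothetical human ratings on a 1-5 scale (made up for illustration).
human_rated = [
    ("the answer is correct and complete", 5),
    ("correct but terse", 4),
    ("wrong and confusing", 1),
    ("confusing and incomplete", 2),
]
model = train_autorater(human_rated)
good = autorate(model, "correct and complete")
bad = autorate(model, "wrong confusing")
```

Humans appear only in `human_rated`; once `model` exists, `autorate` runs unattended on any number of new responses.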

Prebuilt autoraters are available, but the best autoraters are fine-tuned specifically to the task you are evaluating.
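One way to check whether a fine-tuned autorater is actually better for your task is to compare each candidate's scores against held-out human ratings. The sketch below assumes integer ratings on a shared scale and uses exact-match agreement; the function name `agreement_rate` and all scores are hypothetical, made up for illustration rather than measured results.

```python
def agreement_rate(autorater_scores, human_scores):
    """Fraction of examples where the autorater gives the same score as the human."""
    if len(autorater_scores) != len(human_scores):
        raise ValueError("score lists must be the same length")
    if not human_scores:
        raise ValueError("need at least one rated example")
    matches = sum(a == h for a, h in zip(autorater_scores, human_scores))
    return matches / len(human_scores)

# Hypothetical held-out ratings for five responses (1-5 scale).
human = [5, 4, 1, 2, 3]
prebuilt = [5, 3, 2, 2, 4]    # scores from an imagined prebuilt autorater
fine_tuned = [5, 4, 1, 2, 4]  # scores from an imagined task-specific autorater

prebuilt_agreement = agreement_rate(prebuilt, human)
fine_tuned_agreement = agreement_rate(fine_tuned, human)
```

Exact-match agreement is the simplest choice; a chance-corrected statistic or a correlation measure may be more appropriate when ratings are skewed or the scale is fine-grained.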

Note: A running autorater is a fully automated process; humans "only" provide data that helps train an autorater.

Real-world uses

Created for this library

1. An LLM team uses autorater evaluation with a strong model as judge so it can score thousands of generated answers in hours instead of weeks.
2. A search ranking team runs autorater evaluation on top-10 query results to track quality drift between scheduled human rating cycles.
3. A customer-support chatbot team uses autorater evaluation to flag conversations where the bot's response would likely be rated poorly by a human reviewer.
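The flagging use case above might look like the following minimal sketch. It assumes the autorater has already produced a predicted human rating for each conversation; the `RatedConversation` type, the 1-5 scale, and the 2.5 threshold are all hypothetical choices for this example.

```python
from dataclasses import dataclass

@dataclass
class RatedConversation:
    conversation_id: str
    predicted_score: float  # autorater's estimate of a human rating (1-5 assumed)

def flag_for_review(conversations, threshold=2.5):
    """Return IDs of conversations the autorater expects a human to rate poorly."""
    return [c.conversation_id for c in conversations if c.predicted_score < threshold]

# Hypothetical autorater output for three conversations.
batch = [
    RatedConversation("c1", 4.6),
    RatedConversation("c2", 1.8),
    RatedConversation("c3", 3.1),
]
flagged = flag_for_review(batch)
```

In practice the threshold would be tuned against human ratings so that reviewers see most of the genuinely poor conversations without being flooded by false alarms.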
