Human Evaluation

A process in which people judge the quality of an ML model's output; for example, having bilingual raters judge the quality of translations produced by an ML translation model. Human evaluation is particularly useful for tasks that have no single right answer.

Contrast with automatic evaluation and autorater evaluation.

Real-world uses

Created for this library

1. A search-quality team runs human evaluation each release on a curated set of queries to validate offline metrics before launching.
2. An LLM product team runs human evaluation by paid raters on a sample of generated answers before promoting any prompt change.
3. A translation vendor runs human evaluation by professional translators on a sample of pairs before standardizing on a new model.
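
In each of these uses, several raters score a sample of outputs and the scores are summarized before a launch decision. The sketch below shows one simple way that summary might be computed: it averages rater scores per item, then averages those item means per system. The function name `summarize_ratings`, the tuple layout, the 1-5 scale, and the sample data are all illustrative assumptions, not part of any particular team's process.

```python
from collections import defaultdict
from statistics import mean


def summarize_ratings(ratings):
    """Summarize human ratings per system.

    `ratings` is an iterable of (item_id, rater_id, system, score) tuples.
    This layout and the 1-5 score scale are assumptions for illustration.
    Returns {system: mean of the per-item mean scores}.
    """
    # Collect every rater's score for each (system, item) pair.
    per_item = defaultdict(list)
    for item_id, _rater_id, system, score in ratings:
        per_item[(system, item_id)].append(score)

    # Average across raters first, so each item counts equally
    # regardless of how many raters scored it.
    per_system = defaultdict(list)
    for (system, _item_id), scores in per_item.items():
        per_system[system].append(mean(scores))

    return {system: mean(item_means) for system, item_means in per_system.items()}


# Hypothetical sample: two raters score two queries for two systems.
ratings = [
    ("q1", "r1", "baseline", 3), ("q1", "r2", "baseline", 4),
    ("q2", "r1", "baseline", 2), ("q2", "r2", "baseline", 2),
    ("q1", "r1", "candidate", 4), ("q1", "r2", "candidate", 5),
    ("q2", "r1", "candidate", 3), ("q2", "r2", "candidate", 4),
]
summary = summarize_ratings(ratings)
print(summary)
```

Averaging per item before averaging per system is one design choice among several; a real study would typically also check rater agreement and whether the difference between systems is statistically meaningful before acting on it.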
