Side-by-Side Evaluation

Comparing the quality of two models by judging their responses to the same prompt. For example, suppose the following prompt is given to two different models:

Create an image of a cute dog juggling three balls.

In a side-by-side evaluation, a rater would pick which image was "better" (More accurate? More beautiful? Cuter?).
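Once raters have made their picks, the results are typically summarized as a preference count or win rate. The sketch below is one minimal way to do that, not a standard implementation. The function name `side_by_side_win_rate`, the labels "A", "B", and "tie", and the choice to exclude ties from the win rate are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical labels: "A" and "B" name the two models, "tie" means no preference.
VALID_LABELS = {"A", "B", "tie"}


def side_by_side_win_rate(judgments):
    """Tally rater picks from a side-by-side evaluation.

    judgments: iterable of "A", "B", or "tie", one entry per rated prompt.
    Returns the counts plus model A's win rate over the decided
    (non-tie) judgments, or None if every judgment was a tie.
    """
    counts = Counter(judgments)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    decided = counts["A"] + counts["B"]
    # Assumption: ties are left out of the denominator.
    win_rate_a = counts["A"] / decided if decided else None
    return {
        "A": counts["A"],
        "B": counts["B"],
        "tie": counts["tie"],
        "win_rate_a": win_rate_a,
    }


if __name__ == "__main__":
    # Five raters judged the two dog images; three preferred model A.
    print(side_by_side_win_rate(["A", "A", "B", "tie", "A"]))
```

Leaving ties out of the denominator is only one convention. Counting each tie as half a win for both models is another common option, and the two can produce different win rates when ties are frequent.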

Real-world uses

Created for this library

1. An LLM team runs side-by-side evaluation where raters compare two candidate models on the same prompts and pick the preferred response.
2. A search-quality team uses side-by-side evaluation on rated queries to compare two rankers in a controlled review.
3. A translation team uses side-by-side evaluation by professional translators to compare two model variants before production rollout.

Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License
