BLEU (Bilingual Evaluation Understudy)

A metric between 0.0 and 1.0 for evaluating machine translations, for example, from Spanish to Japanese. Scores closer to 1.0 indicate a closer match to a human reference translation.

To calculate a score, BLEU typically compares an ML model's translation (generated text) to a human expert's translation (reference text). The degree to which N-grams in the generated text and reference text match determines the BLEU score.
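As a rough illustration of how N-gram overlap turns into a score, the sketch below computes a simplified sentence-level BLEU against a single reference. It assumes whitespace tokenization, N-grams up to length 4 with equal weights, and no smoothing; the function names `ngrams` and `bleu` are made up for this example and do not come from any particular library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(generated, reference, max_n=4):
    """Simplified, unsmoothed, single-reference BLEU (illustrative only)."""
    gen, ref = generated.split(), reference.split()
    if not gen:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        gen_counts, ref_counts = ngrams(gen, n), ngrams(ref, n)
        # Clip each generated n-gram count by its count in the reference,
        # so repeating a matching word cannot inflate the precision.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in gen_counts.items())
        total = sum(gen_counts.values())
        if matched == 0 or total == 0:
            return 0.0  # no smoothing: one empty precision zeroes the score
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalize generated text shorter than the reference.
    if len(gen) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(gen))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


score = bleu("the cat sat on a mat", "the cat sat on the mat")
print(round(score, 3))
```

In practice, BLEU is usually reported at the corpus level with standardized tokenization, and established implementations such as sacreBLEU or NLTK's `sentence_bleu` are a safer choice than a hand-rolled version like this one.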

The original paper on this metric is "BLEU: a Method for Automatic Evaluation of Machine Translation" (Papineni et al., 2002).

Real-world uses

Created for this library

1. A translation vendor uses BLEU as a fast offline metric to compare candidate models before paying for human translator evaluation on key language pairs.
2. A localization team at a software company tracks BLEU score weekly on a fixed test set to detect translation quality drift after each model update.
3. A multilingual customer support team uses BLEU on machine-translated agent replies to monitor quality across markets between human review cycles.
