BLEURT (Bilingual Evaluation Understudy with Representations from Transformers)

A metric for evaluating machine translations from one language to another, particularly to and from English.

For translations to and from English, BLEURT aligns more closely with human ratings than BLEU does. Unlike BLEU, which counts overlapping word sequences, BLEURT emphasizes semantic (meaning) similarities and can accommodate paraphrasing.
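
To illustrate why a surface-overlap metric penalizes paraphrases, here is a minimal sketch of clipped n-gram precision, the core ingredient of BLEU. It is a simplification: it omits BLEU's brevity penalty and its averaging over n-gram orders. The function name `ngram_precision` and the example sentences are assumptions made for illustration, not part of any BLEU or BLEURT tooling.

```python
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int) -> float:
    """Clipped n-gram precision of a candidate against a single reference."""

    def ngrams(text: str) -> Counter:
        # Whitespace tokenization is a simplifying assumption.
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "the cat sat on the mat"
paraphrase = "a feline was resting on the rug"  # similar meaning, different words

unigram = ngram_precision(paraphrase, reference, 1)  # 2 of 7 unigrams match
bigram = ngram_precision(paraphrase, reference, 2)   # 1 of 6 bigrams match
```

The paraphrase keeps most of the meaning but shares few word sequences with the reference, so an overlap score stays low. A learned metric such as BLEURT is intended to be less sensitive to this kind of rewording.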

BLEURT relies on a pre-trained large language model (BERT, to be exact) that is then fine-tuned on human ratings of translation quality.

The original paper on this metric is BLEURT: Learning Robust Metrics for Text Generation.
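
As a rough sketch of how scoring might look in practice, the snippet below wraps the `BleurtScorer` class from the authors' open-source `bleurt` Python package. It assumes that package is installed and that a BLEURT checkpoint has been downloaded; the wrapper name `bleurt_scores` and the checkpoint path are placeholders, not part of the library.

```python
from typing import List


def bleurt_scores(references: List[str], candidates: List[str], checkpoint: str) -> List[float]:
    """Score each candidate against the reference at the same position.

    `checkpoint` is the path to a downloaded BLEURT checkpoint (placeholder here).
    """
    if len(references) != len(candidates):
        raise ValueError("references and candidates must have the same length")

    # Imported lazily so this module still loads when the package is absent.
    from bleurt import score  # assumes the google-research/bleurt package is installed

    scorer = score.BleurtScorer(checkpoint)
    return scorer.score(references=references, candidates=candidates)


# Hypothetical usage; requires the package and a real checkpoint on disk:
# scores = bleurt_scores(
#     references=["the cat sat on the mat"],
#     candidates=["a feline was resting on the rug"],
#     checkpoint="path/to/bleurt-checkpoint",
# )
```

The scorer returns one score per sentence pair. Scores are generally compared across systems or candidates rather than read as absolute quality values.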

Real-world uses

Created for this library

1. A translation vendor adopts BLEURT alongside BLEU because BLEURT better captures meaning preservation on free-form translations of marketing copy.
2. A multilingual search team uses BLEURT to score paraphrased query rewrites against the original intent across European languages.
3. A localization team reports BLEURT as a complementary metric to BLEU when comparing translation models for natural-sounding customer-facing strings.

Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License
