Glossary term
Evaluation and Benchmarks
A metric to evaluate machine translation models. Character N-gram F-score (ChrF) determines the degree to which character N-grams in the reference text overlap the character N-grams in an ML model's generated text.
Character N-gram F-score is similar to metrics in the ROUGE and BLEU families, except that:
Character N-gram F-score operates on character N-grams.
ROUGE and BLEU operate on word N-grams or tokens.
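The character-level overlap described above can be illustrated with a minimal sketch. It is not the reference implementation of the metric. The function names `char_ngrams` and `chrf_sketch` are hypothetical, and several choices are assumptions made for illustration: whitespace is stripped before counting, N-gram orders 1 through 6 are used, precision and recall are averaged over orders, and recall is weighted with beta = 2. Published implementations may differ in these details.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n, ignoring whitespace (an assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_sketch(reference: str, hypothesis: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score in the range [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        ref = char_ngrams(reference, n)
        hyp = char_ngrams(hypothesis, n)
        if not ref or not hyp:
            continue  # this order is longer than one of the strings
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((ref & hyp).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Finnish "talo" (house) vs. "talossa" (in the house): no whole-word match,
    # but the shared stem still yields character n-gram overlap.
    print(chrf_sketch("talossa", "talo"))
```

In this sketch an exact match scores 1.0 and strings with no characters in common score 0.0. The "talossa" and "talo" pair lands in between, whereas a whole-word comparison would count it as a complete mismatch.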
Created for this library
A translation vendor reports ChrF alongside BLEU for morphologically rich languages because character-level matching gives partial credit to words that differ from the reference only in inflection.
A localization team uses ChrF to compare translation models on agglutinative languages, where word-level metrics often miss partially correct word forms.
A multilingual search team uses ChrF as a secondary evaluation metric on Finnish and Turkish translations of its product descriptions.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License