Glossary term
Evaluation and Benchmarks
A dataset for evaluating an LLM's proficiency in summarizing text. XL-Sum provides entries in 44 languages. Each entry in the dataset contains:
An article, taken from the British Broadcasting Corporation (BBC).
A summary of the article, written by the article's author. Note that the summary can contain words or phrases not present in the article.
For details, see XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages.
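The entry structure described above can be illustrated with a minimal sketch. The class name `XLSumEntry`, its field names, and the helper `novel_summary_words` are hypothetical and chosen for illustration only; they are not the dataset's actual schema or loader. The example text is made up and not taken from XL-Sum. The helper shows what it means for a summary to contain words not present in the article.

```python
from dataclasses import dataclass


@dataclass
class XLSumEntry:
    """Hypothetical container for one entry: an article and its summary."""
    article: str
    summary: str


def novel_summary_words(entry: XLSumEntry) -> set[str]:
    """Return lowercase summary words that never appear in the article."""
    article_words = set(entry.article.lower().split())
    summary_words = set(entry.summary.lower().split())
    return summary_words - article_words


# Made-up example text, not taken from the dataset.
entry = XLSumEntry(
    article="the city council voted on tuesday to fund a new bridge",
    summary="officials approved funding for a new bridge",
)
print(sorted(novel_summary_words(entry)))
# prints ['approved', 'for', 'funding', 'officials']
```

Splitting on whitespace is a simplification that only suits languages written with spaces between words; a real multilingual analysis would likely need a language-aware tokenizer.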
Created for this library
An LLM evaluation team uses XL-Sum to measure cross-lingual summarization quality across many languages.
A research lab reports XL-Sum scores in its model card so downstream users can compare multilingual summarization.
A multilingual NLP team uses XL-Sum as one of several benchmarks for cross-lingual generation quality.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License