Glossary term
Evaluation and Benchmarks
A dataset for evaluating an LLM's ability to summarize a single document. Each entry in the dataset consists of:
A document authored by the British Broadcasting Corporation (BBC).
A one-sentence summary of that document.
For details, see "Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization."
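As a rough illustration of the entry structure described above, the following Python sketch models one entry as a document paired with a one-sentence summary, then scores a candidate summary against the reference. The names XsumEntry and rouge1_f1 are hypothetical and not part of any library, the example text is invented placeholder data, and the scoring function is a simplified ROUGE-1 F1 (clipped unigram overlap on lowercased, whitespace-split tokens, with no stemming), not the official ROUGE implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class XsumEntry:
    """Hypothetical container for one dataset entry (names are illustrative)."""
    document: str  # full article text
    summary: str   # one-sentence reference summary


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented placeholder data, not taken from the real dataset.
entry = XsumEntry(
    document="(placeholder article text)",
    summary="the council approved the new bridge",
)
score = rouge1_f1("the council approved a bridge", entry.summary)
```

A real evaluation would typically use an established ROUGE package and report several variants averaged over the whole test split; this sketch only shows the shape of the data and the idea behind the metric.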
Created for this library
An LLM evaluation team includes Extreme Summarization in its benchmark suite to measure abstractive summarization quality on news articles.
A research lab reports xsum ROUGE scores in model cards so downstream users can compare abstractive summarization quality.
A summarization product team uses xsum as one of several benchmarks when selecting a base model for its meeting-notes feature.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License