Glossary term
Evaluation and Benchmarks
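A member of the ROUGE family focused on the length of the longest common subsequence between the reference text and the generated text. The following formulas calculate recall and precision for ROUGE-L:
$$\text{ROUGE-L recall} = \frac{\text{length of the longest common subsequence}}{\text{number of words in the reference text}}$$
$$\text{ROUGE-L precision} = \frac{\text{length of the longest common subsequence}}{\text{number of words in the generated text}}$$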
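You can then use F1 to roll up ROUGE-L recall and ROUGE-L precision into a single metric:
$$\text{ROUGE-L } F_1 = \frac{2 \times \text{ROUGE-L recall} \times \text{ROUGE-L precision}}{\text{ROUGE-L recall} + \text{ROUGE-L precision}}$$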
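The formulas above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes lowercasing and plain whitespace tokenization (no stemming, no punctuation stripping), and the function names `lcs_length` and `rouge_l` are hypothetical. The longest common subsequence is found with the standard dynamic-programming table, kept to one row at a time.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of a seen so far versus b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the match on the diagonal
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from a or b
        prev = curr
    return prev[-1]


def rouge_l(reference, generated):
    """Return (recall, precision, f1) for ROUGE-L.

    Assumption: text is lowercased and split on whitespace; no stemming.
    """
    ref_tokens = reference.lower().split()
    gen_tokens = generated.lower().split()
    lcs = lcs_length(ref_tokens, gen_tokens)
    recall = lcs / len(ref_tokens) if ref_tokens else 0.0
    precision = lcs / len(gen_tokens) if gen_tokens else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0  # avoid dividing by zero in F1
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1


r, p, f = rouge_l("the cat sat on the mat", "the cat is on the mat today")
print(round(r, 3), round(p, 3), round(f, 3))  # prints 0.833 0.714 0.769
```

In this example the longest common subsequence is "the cat on the mat" (5 words), so recall is 5/6 against the 6-word reference and precision is 5/7 against the 7-word generated text.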
ROUGE-L ignores any newlines in the reference text and generated text, so the longest common subsequence could cross multiple sentences. When the reference text and generated text involve multiple sentences, a variation of ROUGE-L called ROUGE-Lsum is generally a better metric. ROUGE-Lsum determines the longest common subsequence for each sentence in a passage and then calculates the mean of those longest common subsequences.
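The per-sentence idea behind ROUGE-Lsum can be sketched as below. Several details here are assumptions, not part of the definition above: sentences are split with a naive punctuation regex, the i-th reference sentence is paired with the i-th generated sentence (an unmatched sentence scores 0), and "the mean" is taken over per-sentence ROUGE-L F1 scores. Other implementations, for example ones that compute a summary-level (union) longest common subsequence, may produce different numbers. The names `split_sentences`, `tokenize`, and `rouge_lsum_f1` are hypothetical.

```python
import re
from itertools import zip_longest


def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lowercase and keep only word characters, so "barking." matches "barking"
    return re.findall(r"\w+", sentence.lower())


def rouge_lsum_f1(reference, generated):
    """Mean of per-sentence ROUGE-L F1 scores, pairing sentences by position (assumption)."""
    pairs = zip_longest(split_sentences(reference), split_sentences(generated), fillvalue="")
    scores = []
    for ref, gen in pairs:
        ref_tokens, gen_tokens = tokenize(ref), tokenize(gen)
        lcs = lcs_length(ref_tokens, gen_tokens)
        if lcs == 0:
            scores.append(0.0)  # also covers a missing (empty) partner sentence
            continue
        recall = lcs / len(ref_tokens)
        precision = lcs / len(gen_tokens)
        scores.append(2 * recall * precision / (recall + precision))
    return sum(scores) / len(scores) if scores else 0.0


score = rouge_lsum_f1("The dog is barking. The cat is sleeping.",
                      "The dog barks. The cat is asleep.")
print(round(score, 3))  # prints 0.661
```

Here the first sentence pair shares "the dog" (F1 = 4/7) and the second shares "the cat is" (F1 = 3/4), so the mean is 37/56. Because each pair is scored separately, a subsequence can no longer run from one sentence into the next.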
Created for this library
A summarization team reports ROUGE-L alongside ROUGE-1 and ROUGE-2 to capture the longest common subsequence between generated and reference summaries.
A research team uses ROUGE-L on long-form generation to capture sequence-level similarity that unigram metrics miss.
A news platform reports ROUGE-L weekly to detect drift in its summarization model's structural fidelity.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License