ROUGE-N

A set of metrics within the ROUGE family that compares the N-grams of a particular size shared by the reference text and the generated text. For example:

ROUGE-1 measures the number of shared tokens (unigrams) in the reference text and the generated text.

ROUGE-2 measures the number of shared bigrams (2-grams) in the reference text and the generated text.

ROUGE-3 measures the number of shared trigrams (3-grams) in the reference text and the generated text.
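To make the counting concrete, here is a minimal Python sketch of how the shared N-grams might be counted. It assumes lowercase whitespace tokenization with no stemming, and it clips each N-gram's count to the smaller of its counts in the two texts, a common convention. The helper names `ngrams` and `shared_ngram_count` are hypothetical and not part of any ROUGE library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def shared_ngram_count(reference, generated, n):
    """Count the N-grams shared by two texts.

    Each N-gram counts at most as often as it appears in the text
    where it is rarer (clipped counts).
    """
    # Assumption: lowercase + whitespace tokenization, no stemming.
    ref = ngrams(reference.lower().split(), n)
    gen = ngrams(generated.lower().split(), n)
    # Counter '&' keeps each common key with the minimum of its two counts.
    return sum((ref & gen).values())


reference = "the cat sat on the mat"
generated = "the cat lay on the mat"
for n in (1, 2, 3):
    # ROUGE-1: 5 shared unigrams, ROUGE-2: 3 shared bigrams, ROUGE-3: 1 shared trigram
    print(f"ROUGE-{n} shared {n}-grams:", shared_ngram_count(reference, generated, n))
```

Here "the" appears twice in both texts, so it contributes 2 to the ROUGE-1 count, while the only shared trigram is "on the mat".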

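You can use the following formulas to calculate ROUGE-N recall and ROUGE-N precision for any member of the ROUGE-N family:

$$\text{ROUGE-N recall} = \frac{\text{number of N-grams shared by the reference and generated text}}{\text{number of N-grams in the reference text}}$$

$$\text{ROUGE-N precision} = \frac{\text{number of N-grams shared by the reference and generated text}}{\text{number of N-grams in the generated text}}$$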
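A minimal Python sketch of the recall and precision computation, under simplifying assumptions: lowercase whitespace tokenization, no stemming, and clipped N-gram counts. The function names `ngram_counts` and `rouge_n_recall_precision` are hypothetical; established ROUGE packages may tokenize and normalize text differently and so report somewhat different scores.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count the n-grams in a text.

    Assumption: lowercase + whitespace tokenization, no stemming.
    """
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall_precision(reference, generated, n):
    """Return (recall, precision) for ROUGE-N."""
    ref = ngram_counts(reference, n)
    gen = ngram_counts(generated, n)
    shared = sum((ref & gen).values())  # clipped overlap
    ref_total = sum(ref.values())       # N-grams in the reference text
    gen_total = sum(gen.values())       # N-grams in the generated text
    recall = shared / ref_total if ref_total else 0.0
    precision = shared / gen_total if gen_total else 0.0
    return recall, precision


# 4 shared bigrams; 5 bigrams in the reference, 6 in the generated text
recall, precision = rouge_n_recall_precision(
    "the cat sat on the mat", "the cat sat on the red mat", n=2
)
print(recall, precision)  # recall = 4/5, precision = 4/6
```

The extra word "red" in the generated text leaves recall fairly high but lowers precision, because it introduces bigrams that the reference does not contain.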

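You can then use F1 to roll up ROUGE-N recall and ROUGE-N precision into a single metric:

$$\text{ROUGE-N F1} = \frac{2 \times \text{ROUGE-N recall} \times \text{ROUGE-N precision}}{\text{ROUGE-N recall} + \text{ROUGE-N precision}}$$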
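F1 is the harmonic mean of recall and precision. A minimal sketch, using a hypothetical helper `rouge_n_f1` that returns 0.0 when both inputs are 0 to avoid dividing by zero:

```python
def rouge_n_f1(recall, precision):
    """Harmonic mean of ROUGE-N recall and precision (0.0 if both are 0)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Example: 4 shared bigrams, 5 bigrams in the reference text and
# 6 in the generated text, so recall = 4/5 and precision = 4/6.
print(rouge_n_f1(4 / 5, 4 / 6))  # about 0.727 (exactly 8/11)
```

Because the harmonic mean is pulled toward the smaller value, a summary cannot earn a high F1 by maximizing only recall (for example, by being very long) or only precision (by being very short).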


1. A summarization team reports ROUGE-1 and ROUGE-2 to compare model versions on unigram and bigram overlap with reference summaries.
2. A research team uses ROUGE-N as a baseline metric for summarization evaluation across model versions.
3. A news platform uses ROUGE-N as part of its weekly summarization quality monitoring.
