Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)

A dataset to evaluate an LLM's ability to perform commonsense reasoning. Each example in the dataset contains three components:

A paragraph or two from a news article.

A query in which one of the entities explicitly or implicitly identified in the passage is masked.

The answer: the name of the entity that belongs in the mask.
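The three components can be modeled as a small record type. This is a minimal sketch, not the dataset's official schema. The class name `RecordExample`, the helper `filled_query`, and the mask marker `@placeholder` are assumptions for illustration; confirm the exact mask string against the dataset files before relying on it. The passage, query, and answer below are invented toy text, not an example taken from ReCoRD.

```python
from dataclasses import dataclass

# Assumed mask marker for the hidden entity; verify against the actual data files.
PLACEHOLDER = "@placeholder"


@dataclass
class RecordExample:
    """Hypothetical container for one ReCoRD-style example."""
    passage: str  # a paragraph or two from a news article
    query: str    # a sentence in which one entity is replaced by PLACEHOLDER
    answer: str   # the entity that belongs in the mask

    def filled_query(self) -> str:
        """Return the query with the mask replaced by the answer entity."""
        return self.query.replace(PLACEHOLDER, self.answer)


# Invented toy example (not from ReCoRD).
example = RecordExample(
    passage=(
        "The Springfield city council voted on Tuesday to expand the public "
        "library. Mayor Jane Doe said the project would finish next year."
    ),
    query="@placeholder said the library expansion would be completed next year.",
    answer="Jane Doe",
)

print(example.filled_query())
```

A model under evaluation sees only the passage and the masked query; its job is to produce the entity that the answer field holds.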

See the ReCoRD dataset page for an extensive list of examples.

ReCoRD is a component of the SuperGLUE ensemble.


1. An LLM evaluation team includes ReCoRD in its standard reasoning benchmark suite to test commonsense reading comprehension.
2. A research lab reports ReCoRD scores in model cards so downstream users can compare commonsense reasoning across model versions.
3. A model release team uses ReCoRD as one of several reading-comprehension benchmarks gating production promotion.
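ReCoRD scores are commonly reported as exact match (EM) and token-level F1, following the SQuAD-style convention of normalizing answers and taking the best score over all gold answers for an example. The sketch below illustrates that idea under those assumptions; it is not the official evaluation script, and the function names, the toy predictions, and the `MIN_F1` gating threshold are all hypothetical.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions, gold_answers):
    """Average the best EM and F1 per example, reported as percentages."""
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(token_f1(pred, g) for g in golds)
    n = len(predictions)
    return {"exact_match": 100 * em_total / n, "f1": 100 * f1_total / n}


# Toy predictions and gold answers (invented, not ReCoRD data).
preds = ["Jane Doe", "the library"]
golds = [["Jane Doe"], ["public library"]]
scores = score(preds, golds)

MIN_F1 = 80.0  # hypothetical promotion threshold for a release gate
passes_gate = scores["f1"] >= MIN_F1
print(scores, passes_gate)
```

Here the second prediction earns partial F1 credit (one of two gold tokens) but no exact-match credit, which is why teams often track both numbers rather than one.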
