Glossary term
Foundations
A dataset for evaluating an LLM's ability to answer multiple-choice questions. Each example in the dataset contains:
A context paragraph
A question about that paragraph
Multiple answers to the question. Each answer is labeled True or False. Multiple answers may be True.
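The three parts listed above map naturally onto a small record type. The following is only an illustrative sketch: the class and field names (Answer, MultiRCExample, is_true) are hypothetical and are not taken from the official MultiRC release or from any particular library.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """One candidate answer with its True/False label (names are hypothetical)."""
    text: str
    is_true: bool


@dataclass
class MultiRCExample:
    """One dataset example: a context paragraph, a question, and labeled answers."""
    context: str
    question: str
    answers: list[Answer] = field(default_factory=list)

    def true_answers(self) -> list[str]:
        # More than one answer may be labeled True, so return all of them.
        return [a.text for a in self.answers if a.is_true]


# Placeholder content, used only to show the shape of an example.
example = MultiRCExample(
    context="A short paragraph that the question refers to.",
    question="A question about that paragraph?",
    answers=[
        Answer("First candidate answer.", True),
        Answer("Second candidate answer.", False),
        Answer("Third candidate answer.", True),
    ],
)

print(example.true_answers())
```

Keeping the label on each answer, rather than storing a single correct index, reflects the fact that several answers to the same question can be True.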
For example:
Context paragraph:
Susan wanted to have a birthday party. She called all of her friends. She has five friends. Her mom said that Susan can invite them all to the party. Her first friend couldn't go to the party because she was sick. Her second friend was going out of town. Her third friend was not so sure if her parents would let her. The fourth friend said maybe. The fifth friend could go to the party for sure. Susan was a little sad. On the day of the party, all five friends showed up. Each friend had a present for Susan. Susan was happy and sent each friend a thank you card the next week.
Question: Did Susan's sick friend recover?
Multiple answers:
Yes, she recovered. (True)
No. (False)
Yes. (True)
No, she didn't recover. (False)
Yes, she was at Susan's party. (True)
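Because each answer carries its own True or False label, a model's output for one question can be scored answer by answer. The sketch below assumes one plausible scoring scheme: F1 over the individual answers, plus an exact-match check that passes only when every answer to the question is labeled correctly. The gold labels are the five labels from the example above; the predictions and the function names are hypothetical, invented purely for illustration.

```python
# Gold labels for the five answers in the example above, in order.
gold = [True, False, True, False, True]

# Hypothetical model predictions: the model wrongly marks the fourth answer True.
predicted = [True, False, True, True, True]


def answer_f1(gold: list[bool], predicted: list[bool]) -> float:
    """F1 over individual answers, treating True as the positive class."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def exact_match(gold: list[bool], predicted: list[bool]) -> bool:
    """True only if every answer for the question is labeled correctly."""
    return gold == predicted


f1 = answer_f1(gold, predicted)
em = exact_match(gold, predicted)
print(f"answer-level F1: {f1:.3f}, exact match: {em}")
```

With one wrong label out of five, the answer-level F1 stays fairly high while the exact-match check fails, which shows why the two views can tell different stories about the same prediction.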
MultiRC is one of the tasks in the SuperGLUE benchmark suite.
For details, see "Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences."
Created for this library
An LLM evaluation team uses MultiRC to test reading comprehension that requires reasoning across several sentences.
A research lab reports MultiRC results in model cards so downstream users can compare multi-sentence reasoning across model versions.
A model release team includes MultiRC in its standard benchmark suite as a check for long-form reading comprehension.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License