Glossary term
Evaluation and Benchmarks
A format (or dataset conforming to that format) for evaluating an LLM's ability to determine the noun phrase that a pronoun refers to.
Each entry in a Winograd Schema Challenge dataset consists of:
A short passage, which contains a target pronoun
A target pronoun
Candidate noun phrases, followed by the correct answer (a Boolean). If the target pronoun refers to this candidate, the answer is True. If the target pronoun does not refer to this candidate, the answer is False.
For example:
Passage: Mark told Pete many lies about himself, which Pete included in his book. He should have been more truthful.
Target pronoun: He
Candidate noun phrases:
Mark: True, because the target pronoun refers to Mark
Pete: False, because the target pronoun doesn't refer to Pete
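The entry above can be sketched as a small data structure, with one record per candidate noun phrase. This is a minimal illustration, not the official dataset schema: the names WscExample, accuracy, and always_true are hypothetical, and the always_true baseline stands in for a real model call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WscExample:
    """One (passage, pronoun, candidate) record; field names are assumptions."""
    passage: str
    target_pronoun: str
    candidate: str
    label: bool  # True if the target pronoun refers to the candidate


PASSAGE = (
    "Mark told Pete many lies about himself, which Pete included in his book. "
    "He should have been more truthful."
)

# The example entry from the text, expanded to one record per candidate.
EXAMPLES = [
    WscExample(PASSAGE, "He", "Mark", True),
    WscExample(PASSAGE, "He", "Pete", False),
]


def accuracy(examples, predict):
    """Return the fraction of examples where predict(example) equals the label."""
    if not examples:
        raise ValueError("no examples to score")
    correct = sum(predict(ex) == ex.label for ex in examples)
    return correct / len(examples)


def always_true(example):
    # Hypothetical baseline: claims every candidate is the referent.
    return True


print(accuracy(EXAMPLES, always_true))  # 0.5
```

Because each passage contributes both True and False records, a predictor that always answers True gets only the Mark record right here, which is why scores are usually read against a constant-answer baseline.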
The Winograd Schema Challenge is a component of the SuperGLUE ensemble.
Created for this library
An LLM evaluation team uses the Winograd Schema Challenge to test commonsense pronoun resolution across model versions.
A research lab reports Winograd Schema Challenge scores in its model card so downstream users can compare commonsense reasoning.
A model release team uses the Winograd Schema Challenge as one of several reasoning benchmarks gating production promotion.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License