Winograd Schema Challenge (WSC)

A format (or dataset conforming to that format) for evaluating an LLM's ability to determine the noun phrase that a pronoun refers to.

Each entry in a Winograd Schema Challenge consists of:

A short passage, which contains a target pronoun

A target pronoun

Candidate noun phrases, each followed by the correct answer (a Boolean). If the target pronoun refers to the candidate, the answer is True; otherwise, the answer is False.

For example:

Passage: Mark told Pete many lies about himself, which Pete included in his book. He should have been more truthful.

Target pronoun: He

Candidate noun phrases:

Mark: True, because the target pronoun refers to Mark

Pete: False, because the target pronoun doesn't refer to Pete
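To make the entry structure concrete, here is a minimal Python sketch that encodes the example above as a record. The class name `WinogradEntry`, its field names, and the `correct_referents` helper are hypothetical and chosen only for illustration. They are not part of any official dataset schema.

```python
from dataclasses import dataclass


@dataclass
class WinogradEntry:
    """Hypothetical record for one entry: a passage, its target pronoun,
    and candidate noun phrases, each labeled True or False."""
    passage: str
    pronoun: str
    candidates: dict[str, bool]

    def correct_referents(self) -> list[str]:
        # Candidates labeled True, i.e. the noun phrase(s) the pronoun refers to.
        return [name for name, label in self.candidates.items() if label]


entry = WinogradEntry(
    passage=(
        "Mark told Pete many lies about himself, which Pete included in his book. "
        "He should have been more truthful."
    ),
    pronoun="He",
    candidates={"Mark": True, "Pete": False},
)

print(entry.correct_referents())  # ['Mark']
```

Storing the candidates as a mapping from noun phrase to Boolean mirrors the "candidate, then correct answer" layout of each entry directly.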

The Winograd Schema Challenge is a component of the SuperGLUE ensemble.

Real-world uses

Created for this library

1. An LLM evaluation team uses the Winograd Schema Challenge to test commonsense pronoun resolution across model versions.
2. A research lab reports Winograd Schema Challenge scores in its model card so downstream users can compare commonsense reasoning.
3. A model release team uses the Winograd Schema Challenge as one of several reasoning benchmarks gating production promotion.
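Comparing model versions usually comes down to a single score per model. The following is a minimal sketch, assuming each model picks one candidate per entry and that accuracy is the fraction of entries where the picked candidate is labeled True. The `Resolver` interface and the two "model versions" (`first_mentioned`, `last_mentioned`) are hypothetical stand-ins built on naive position heuristics, not real models or any official WSC scorer.

```python
from typing import Callable

# A passage, its target pronoun, and candidate noun phrases labeled True/False.
Entry = tuple[str, str, dict[str, bool]]

# Hypothetical model interface: given a passage, a pronoun, and the candidate
# noun phrases, return the candidate the model thinks the pronoun refers to.
Resolver = Callable[[str, str, list[str]], str]


def accuracy(model: Resolver, entries: list[Entry]) -> float:
    """Fraction of entries where the model's chosen candidate is labeled True."""
    if not entries:
        return 0.0
    correct = 0
    for passage, pronoun, candidates in entries:
        choice = model(passage, pronoun, list(candidates))
        # A choice outside the candidate list counts as wrong.
        correct += candidates.get(choice, False)
    return correct / len(entries)


# Two naive stand-in "model versions", for illustration only.
def first_mentioned(passage: str, pronoun: str, candidates: list[str]) -> str:
    # Picks the candidate whose first occurrence comes earliest in the passage.
    return min(candidates, key=passage.find)


def last_mentioned(passage: str, pronoun: str, candidates: list[str]) -> str:
    # Picks the candidate whose first occurrence comes latest in the passage.
    return max(candidates, key=passage.find)


entries: list[Entry] = [
    (
        "Mark told Pete many lies about himself, which Pete included in his book. "
        "He should have been more truthful.",
        "He",
        {"Mark": True, "Pete": False},
    ),
]

for name, model in [("v1", first_mentioned), ("v2", last_mentioned)]:
    print(name, accuracy(model, entries))
# v1 1.0
# v2 0.0
```

With only one entry the scores are all-or-nothing; a real comparison would run each model version over the full set of entries and compare the resulting accuracies.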

Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License