Glossary term
Evaluation and Benchmarks
WSC: abbreviation for Winograd Schema Challenge.
Created for this library
An LLM evaluation team uses WSC to test commonsense pronoun resolution across model versions.
A research lab reports WSC scores in its model card so downstream users can compare commonsense reasoning across versions.
A model release team uses WSC as one of several reasoning benchmarks that gate promotion to production.
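The use cases above all come down to scoring a model on pronoun-resolution items and comparing the result across versions. A minimal sketch of that scoring step follows, assuming a two-candidate item format. The names `WscItem`, `wsc_accuracy`, and `first_candidate_baseline` are hypothetical and made up for illustration; they are not part of any official WSC tooling, and the baseline stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class WscItem:
    """One Winograd-style item (hypothetical format for this sketch)."""
    sentence: str
    pronoun: str
    candidates: Tuple[str, str]  # the two possible referents
    answer: str                  # the correct referent


def wsc_accuracy(items: Sequence[WscItem],
                 resolve: Callable[[WscItem], str]) -> float:
    """Return the fraction of items whose pronoun is resolved correctly."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(1 for item in items if resolve(item) == item.answer)
    return correct / len(items)


# A classic schema pair: changing one word flips the correct referent.
ITEMS = [
    WscItem("The trophy doesn't fit in the suitcase because it is too big.",
            "it", ("the trophy", "the suitcase"), "the trophy"),
    WscItem("The trophy doesn't fit in the suitcase because it is too small.",
            "it", ("the trophy", "the suitcase"), "the suitcase"),
]


def first_candidate_baseline(item: WscItem) -> str:
    """Stand-in for a model: always pick the first candidate."""
    return item.candidates[0]


score = wsc_accuracy(ITEMS, first_candidate_baseline)
print(f"WSC accuracy: {score:.2f}")
```

Because each schema pair flips the answer with a one-word change, a position-based baseline like this one lands at chance on the pair. To compare model versions, swap in one resolver per version and score each on the same item set.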
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License