Glossary term
Evaluation and Benchmarks
A dataset for evaluating how well an LLM uses context to understand words that have multiple meanings. Each entry in the dataset contains:
Two sentences, each containing the target word
The target word
The correct answer (a Boolean), where:
True means the target word has the same meaning in the two sentences
False means the target word has a different meaning in the two sentences
For example:
Two sentences:
There's a lot of trash on the bed of the river.
I keep a glass of water next to my bed when I sleep.
The target word: bed
Correct answer: False, because the target word has a different meaning in the two sentences: the bottom of a river in the first, and a piece of furniture in the second.
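The entry layout and Boolean answer described above can be sketched in Python. This is a minimal illustration, not the dataset's official schema or scoring script: the class name `WicEntry`, its field names, and the `accuracy` helper are all hypothetical, and the "model" is a trivial stand-in that always answers True.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class WicEntry:
    """One dataset entry (hypothetical field names, chosen for illustration)."""
    sentence1: str
    sentence2: str
    target_word: str
    same_meaning: bool  # the correct answer: True = same sense, False = different


def accuracy(entries: Iterable[WicEntry],
             predict: Callable[[WicEntry], bool]) -> float:
    """Fraction of entries where the prediction matches the correct answer."""
    entries = list(entries)
    if not entries:
        raise ValueError("need at least one entry to compute accuracy")
    correct = sum(predict(entry) == entry.same_meaning for entry in entries)
    return correct / len(entries)


# The "bed" example from this glossary entry.
bed_example = WicEntry(
    sentence1="There's a lot of trash on the bed of the river.",
    sentence2="I keep a glass of water next to my bed when I sleep.",
    target_word="bed",
    same_meaning=False,
)


def always_same(entry: WicEntry) -> bool:
    """Stand-in for a model call: naively claims the sense never changes."""
    return True


if __name__ == "__main__":
    print(accuracy([bed_example], always_same))
```

In practice, `predict` would wrap a call to the model under evaluation. Because each answer is a single Boolean, a plain accuracy over all entries is a natural way to summarize the result.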
For details, see WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations.
Words in Context is a component of the SuperGLUE ensemble.
Created for this library
An LLM evaluation team uses Words in Context to test word-sense disambiguation across model versions.
A research lab reports Words in Context scores in its model card so downstream users can compare lexical reasoning across versions.
A model release team includes Words in Context in its standard benchmark suite to detect regressions in word-sense disambiguation.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License