Glossary term
Evaluation and Benchmarks
A dataset for evaluating an LLM's proficiency in determining whether the author of a passage believes a target clause within that passage. Each entry in the dataset contains:
A passage
A target clause within that passage
A Boolean value indicating whether the passage's author believes the target clause
For example:
Passage: What fun to hear Artemis laugh. She's such a serious child. I didn't know she had a sense of humor.
Target clause: she had a sense of humor
Boolean: True, which means the author believes the target clause
CommitmentBank is one of the tasks in the SuperGLUE benchmark suite.
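The entry structure described above can be sketched in Python as follows. This is a minimal illustration, not the dataset's official schema or loader: the names `CommitmentEntry`, `author_believes`, `accuracy`, and `always_true` are hypothetical, and the `predict` callable stands in for whatever model is being evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CommitmentEntry:
    """One dataset entry (hypothetical field names)."""
    passage: str
    target_clause: str
    author_believes: bool  # True if the passage's author believes the target clause


def accuracy(
    entries: Iterable[CommitmentEntry],
    predict: Callable[[str, str], bool],
) -> float:
    """Fraction of entries where predict(passage, target_clause) matches the label."""
    items = list(entries)
    if not items:
        raise ValueError("need at least one entry to compute accuracy")
    correct = sum(
        predict(e.passage, e.target_clause) == e.author_believes for e in items
    )
    return correct / len(items)


# The example entry from the definition above.
example = CommitmentEntry(
    passage=(
        "What fun to hear Artemis laugh. She's such a serious child. "
        "I didn't know she had a sense of humor."
    ),
    target_clause="she had a sense of humor",
    author_believes=True,
)


def always_true(passage: str, target_clause: str) -> bool:
    """Trivial baseline (assumption for illustration): always predict belief."""
    return True


print(accuracy([example], always_true))
```

A real evaluation would replace `always_true` with a function that prompts the model with the passage and target clause and parses its answer into a Boolean.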
Created for this library
An LLM evaluation team includes CommitmentBank in its standard benchmark suite to test how well models judge whether an author is committed to the truth of an embedded clause.
A research lab reports CommitmentBank scores in its model card so downstream users can compare entailment-style reasoning across model versions.
A model release team uses CommitmentBank as one of several language-understanding benchmarks to gate model promotions.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License