Boolean Questions (BoolQ)

A dataset for evaluating an LLM's proficiency in answering yes-or-no questions. Each of the challenges in the dataset has three components:

A query

A passage implying the answer to the query.

The correct answer, which is either yes or no.

For example:

Query: Are there any nuclear power plants in Michigan?

Passage: ...three nuclear power plants supply Michigan with about 30% of its electricity.

Correct answer: Yes
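Each BoolQ challenge has the same three fields, so a small record type is a convenient way to hold one in an evaluation script. The sketch below encodes the Michigan example that way. The class name `BoolQExample`, its field names, and the `answer_label` helper are hypothetical and chosen for illustration; the released dataset files may use a different schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoolQExample:
    """One BoolQ challenge: a query, a supporting passage, and a yes/no label.

    Hypothetical schema for illustration; not the dataset's official field names.
    """
    query: str
    passage: str
    answer: bool  # True means "yes", False means "no"


michigan = BoolQExample(
    query="Are there any nuclear power plants in Michigan?",
    passage="...three nuclear power plants supply Michigan with about 30% of its electricity.",
    answer=True,
)


def answer_label(example: BoolQExample) -> str:
    """Render the boolean label as the yes/no answer text."""
    return "yes" if example.answer else "no"


print(answer_label(michigan))  # → yes
```

Storing the label as a `bool` rather than the strings "yes" and "no" keeps later comparisons against model predictions unambiguous.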

Researchers gathered the questions from anonymized, aggregated Google Search queries and then used Wikipedia pages to ground the information.

For more information, see BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions.

BoolQ is a component of the SuperGLUE ensemble.

Created for this library

1. A research team uses the BoolQ benchmark to evaluate yes-or-no question answering as part of a model release readiness check.
2. An LLM evaluation team includes BoolQ scores in its model card to give downstream developers a quick view of reading comprehension on yes-or-no questions.
3. A vendor benchmarks its open-weights LLM on BoolQ in its model release notes so enterprise buyers can compare checkpoints on yes-or-no reading comprehension.
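Each of these use cases reports a BoolQ score. BoolQ is typically scored by accuracy, meaning the fraction of questions answered correctly. The sketch below computes accuracy from free-text model replies. It assumes that the model is prompted to begin its reply with "yes" or "no", and that any reply not starting that way counts as wrong. The parsing rule and the helper names `parse_yes_no` and `boolq_accuracy` are illustrative assumptions, not part of the benchmark's official tooling.

```python
from typing import Optional, Sequence


def parse_yes_no(text: str) -> Optional[bool]:
    """Map a free-text reply to True (yes), False (no), or None (unparseable).

    Assumed rule: only the first word counts, ignoring case and trailing punctuation.
    """
    words = text.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!?:;\"'")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


def boolq_accuracy(replies: Sequence[str], labels: Sequence[bool]) -> float:
    """Fraction of examples whose parsed reply matches the gold label.

    Unparseable replies (None) never equal a bool label, so they count as wrong.
    """
    if len(replies) != len(labels):
        raise ValueError("replies and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(parse_yes_no(r) == y for r, y in zip(replies, labels))
    return correct / len(labels)


print(round(boolq_accuracy(["Yes.", "no", "Maybe"], [True, False, False]), 3))  # → 0.667
```

Counting unparseable replies as wrong, instead of dropping them, keeps the denominator equal to the full question set. Under that rule, two models scored on the same questions remain directly comparable.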