Recognizing Textual Entailment (RTE)

A dataset for evaluating an LLM's ability to determine whether a hypothesis can be entailed (logically drawn) from a text passage. Each example in an RTE evaluation consists of three parts:

A passage, typically from news or Wikipedia articles

A hypothesis

The correct answer, which is either:

True, meaning the hypothesis can be entailed from the passage

False, meaning the hypothesis can't be entailed from the passage
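In code, an RTE example can be held as a small record with one field for each of the three parts listed above. This is a minimal sketch. The class and field names (`RTEExample`, `passage`, `hypothesis`, `entailed`) are hypothetical and do not come from any dataset's official schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RTEExample:
    """One RTE item (hypothetical schema): passage, hypothesis, and gold answer."""
    passage: str     # source text, e.g. from a news or Wikipedia article
    hypothesis: str  # statement to check against the passage
    entailed: bool   # True: hypothesis can be entailed; False: it can't


example = RTEExample(
    passage="The Euro is the currency of the European Union.",
    hypothesis="France uses the Euro as currency.",
    entailed=True,
)
print(example.entailed)  # True
```

A frozen dataclass keeps gold labels from being changed by accident during an evaluation run.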

For example:

Passage: The Euro is the currency of the European Union.

Hypothesis: France uses the Euro as currency.

Entailment: True, because France is part of the European Union.
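To pose an item like this to an LLM, the passage and hypothesis are placed in a prompt, and the model's reply is then mapped back to True or False. The sketch below assumes a hypothetical prompt template and a simple rule that reads only the first word of the reply. An actual evaluation harness may use a different template and parsing rule. The names `build_prompt` and `parse_answer` are illustrative.

```python
from typing import Optional


def build_prompt(passage: str, hypothesis: str) -> str:
    """Format one RTE item as a True/False question (hypothetical template)."""
    return (
        f"Passage: {passage}\n"
        f"Hypothesis: {hypothesis}\n"
        "Can the hypothesis be entailed from the passage? Answer True or False."
    )


def parse_answer(reply: str) -> Optional[bool]:
    """Map a reply to a label by its first word; None if it is neither True nor False."""
    words = reply.strip().split()
    if not words:
        return None
    first = words[0].strip(".,:;!").lower()
    if first == "true":
        return True
    if first == "false":
        return False
    return None


print(build_prompt(
    "The Euro is the currency of the European Union.",
    "France uses the Euro as currency.",
))
print(parse_answer("True, because France is part of the European Union."))  # True
```

Returning None for an unparsable reply, instead of guessing, lets the scorer count such replies as wrong explicitly.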

RTE is a component of the SuperGLUE ensemble.
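The SuperGLUE copy of RTE stores each item as a JSON record. There, the passage is called the premise, and the answer is a string label rather than True/False. The sketch below assumes one record per line (JSON Lines), the field names `premise`, `hypothesis`, and `label`, and the label strings `entailment` / `not_entailment`. Check these assumptions against the copy of the data you actually use. The function name `parse_rte_record` is hypothetical.

```python
import json
from typing import Optional, Tuple

# Assumed SuperGLUE RTE label strings, mapped to this glossary's True/False answers.
LABEL_TO_BOOL = {"entailment": True, "not_entailment": False}


def parse_rte_record(line: str) -> Tuple[str, str, Optional[bool]]:
    """Parse one JSON line into (passage, hypothesis, entailed).

    Assumes fields "premise" and "hypothesis" plus an optional "label";
    a missing label (e.g. an unlabeled test item) becomes None.
    """
    record = json.loads(line)
    label = record.get("label")
    # An unexpected label string raises KeyError instead of being silently dropped.
    entailed = None if label is None else LABEL_TO_BOOL[label]
    return record["premise"], record["hypothesis"], entailed


line = json.dumps({
    "premise": "The Euro is the currency of the European Union.",
    "hypothesis": "France uses the Euro as currency.",
    "label": "entailment",
    "idx": 0,
})
print(parse_rte_record(line))
```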

Real-world uses

Created for this library

1. An LLM evaluation team includes RTE in its benchmark suite to measure textual entailment ability across model versions.
2. A research lab reports RTE scores in model cards so downstream users can compare reasoning ability across model versions.
3. A model release team uses RTE as one of several reasoning benchmarks to gate production promotion.
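Each of these uses starts the same way: score a model's True/False predictions against the gold labels, then compare the result across versions or against a minimum. RTE is scored by accuracy. In the minimal sketch below, `passes_gate`, the 0.75 minimum, and the toy labels are hypothetical illustrations, not values from any real release process.

```python
from typing import Dict, List, Optional


def accuracy(predictions: List[Optional[bool]], gold: List[bool]) -> float:
    """Fraction of items whose prediction equals the gold label.

    An unparsable prediction (None) never equals a bool, so it counts as wrong.
    """
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def passes_gate(scores: Dict[str, float], minimums: Dict[str, float]) -> bool:
    """Hypothetical promotion rule: every benchmark must meet its minimum score.

    A benchmark missing from `scores` counts as 0.0 and therefore fails.
    """
    return all(scores.get(name, 0.0) >= floor for name, floor in minimums.items())


# Toy labels for illustration only; not real RTE results.
gold = [True, False, True, True]
predictions_by_version = {
    "v1": [True, True, True, None],
    "v2": [True, False, True, True],
}
for version, preds in predictions_by_version.items():
    rte = accuracy(preds, gold)
    print(version, rte, passes_gate({"rte": rte}, {"rte": 0.75}))
# v1 0.5 False
# v2 1.0 True
```

In practice, the minimums dictionary would list every benchmark that gates promotion, with RTE as just one entry.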

Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License
