Glossary term
Evaluation and Benchmarks
Abbreviation for Recognizing Textual Entailment.
S
Created for this library
An LLM evaluation team uses RTE in its benchmark suite to measure textual entailment ability across model versions.
A research lab reports RTE scores in model cards so downstream users can compare textual entailment performance across versions.
A model release team uses RTE as one of several reasoning benchmarks that gate promotion to production.
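The usage examples above can be sketched in code. The following is a minimal illustration, not a reference implementation: it assumes the common two-label framing in which each premise/hypothesis pair is labeled either entailment or not entailment, and it scores a model by plain accuracy. The label strings, the function names `rte_accuracy` and `passes_gate`, and the `max_drop` tolerance are all hypothetical and chosen only for illustration.

```python
from typing import Sequence

# Hypothetical label names; real RTE datasets may encode labels differently.
ENTAILMENT = "entailment"
NOT_ENTAILMENT = "not_entailment"


def rte_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the fraction of premise/hypothesis pairs labeled correctly."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def passes_gate(candidate: float, baseline: float, max_drop: float = 0.01) -> bool:
    """Illustrative promotion rule (an assumption, not a standard):
    the candidate may score at most max_drop below the baseline."""
    return candidate >= baseline - max_drop


if __name__ == "__main__":
    gold = [ENTAILMENT, NOT_ENTAILMENT, ENTAILMENT, NOT_ENTAILMENT]
    preds = [ENTAILMENT, NOT_ENTAILMENT, NOT_ENTAILMENT, NOT_ENTAILMENT]
    score = rte_accuracy(preds, gold)
    print(f"RTE accuracy: {score:.2f}")
    print("Promote:", passes_gate(score, baseline=0.80))
```

Comparing each candidate against the previous version's score, rather than against a fixed number, is one possible design choice; a team could equally gate on an absolute threshold.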
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License