Glossary term
Evaluation and Benchmarks
TyDi QA is a large dataset for evaluating an LLM's proficiency in answering questions. It contains question-and-answer pairs in many languages.
For details, see TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages.
Created for this library
An LLM evaluation team uses TyDi QA to measure question answering across typologically diverse languages.
A multilingual NLP team reports TyDi QA scores in its model card to communicate cross-language performance.
A research lab uses TyDi QA in its multilingual benchmark suite to ensure language coverage in evaluation.
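The use cases above all come down to reporting a score per language rather than a single pooled number. A minimal sketch of that aggregation follows. The record fields (`id`, `language`, `answers`), the function names, and the toy data are hypothetical and are not taken from TyDi QA; exact match is used only as a simple stand-in, since the benchmark defines its own tasks and metrics.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def per_language_exact_match(examples, predictions):
    """Return {language: fraction of predictions that exactly match a gold answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        lang = ex["language"]
        total[lang] += 1
        # A missing prediction is treated as an empty (wrong) answer.
        pred = normalize(predictions.get(ex["id"], ""))
        if any(pred == normalize(gold) for gold in ex["answers"]):
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Hypothetical toy data, invented for illustration only (not from TyDi QA).
examples = [
    {"id": "q1", "language": "english", "answers": ["Paris"]},
    {"id": "q2", "language": "english", "answers": ["1969"]},
    {"id": "q3", "language": "finnish", "answers": ["Helsinki"]},
]
predictions = {"q1": "paris", "q2": "1970", "q3": "Helsinki"}

scores = per_language_exact_match(examples, predictions)

# Macro average: every language counts equally, regardless of how many
# questions it has, so low-resource languages are not drowned out.
macro_average = sum(scores.values()) / len(scores)

print(scores)
print(macro_average)
```

Averaging per language before averaging overall is one possible design choice: it keeps a language with few questions from being hidden by a language with many.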
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License