Glossary term
Evaluation and Benchmarks
Datasets for evaluating an LLM's ability to answer trivia questions. Each dataset contains question-answer pairs authored by trivia enthusiasts. The datasets are grounded in different sources, including:
Web search (TriviaQA)
Wikipedia (TriviaQA_wiki)
For more information, see TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension.
Created for this library
An LLM evaluation team uses Trivia Question Answering to test factual recall across model versions.
A research lab reports Trivia Question Answering scores in model cards so downstream users can compare factual recall.
A model release team uses Trivia Question Answering as one of several knowledge benchmarks gating production promotion.
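The use cases above all come down to scoring a model's answers against accepted answers for each question. A minimal sketch of one common approach, normalized exact match, is shown here. The function names (normalize, exact_match, accuracy) and the two sample questions are hypothetical and chosen for illustration; they are not part of any official TriviaQA tooling, and a real evaluation harness may normalize or score differently.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, accepted_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized accepted answer."""
    pred = normalize(prediction)
    return any(pred == normalize(answer) for answer in accepted_answers)


def accuracy(predictions: list[str], references: list[list[str]]) -> float:
    """Fraction of predictions that exactly match an accepted answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return hits / len(predictions)


# Hypothetical data: each question has a list of accepted answer aliases.
references = [
    ["Mars"],                                  # Which planet is called the Red Planet?
    ["William Shakespeare", "Shakespeare"],    # Who wrote Hamlet?
]
predictions = ["mars.", "Christopher Marlowe"]  # made-up model outputs

print(accuracy(predictions, references))  # 0.5
```

Running the same function over the outputs of two model versions gives comparable scores, provided both are evaluated on the same questions with the same normalization.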
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License