Glossary term
Evaluation and Benchmarks
A measurement of how often human raters agree when doing a task. If raters disagree, the task instructions may need to be improved. Also sometimes called inter-annotator agreement or inter-rater reliability. See also Cohen's kappa, which is one of the most popular inter-rater agreement measurements.
See Categorical data: Common issues in Machine Learning Crash Course for more information.
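As an illustration of the definition above, the following minimal sketch computes raw agreement (the share of items on which two raters chose the same label) and Cohen's kappa, which corrects that share for the agreement expected by chance. The function names and sample labels are hypothetical and chosen only for this example. It assumes exactly two raters who labeled the same items in the same order.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two raters assigned the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_a)
    p_o = percent_agreement(labels_a, labels_b)  # observed agreement
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labels independently according to their own label frequencies.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical relevance judgments from two raters on ten items.
rater_a = ["rel", "rel", "not", "rel", "not", "not", "rel", "not", "rel", "rel"]
rater_b = ["rel", "not", "not", "rel", "not", "rel", "rel", "not", "rel", "rel"]

print(percent_agreement(rater_a, rater_b))        # 0.8
print(f"{cohens_kappa(rater_a, rater_b):.3f}")    # 0.583
```

Here the raters agree on 8 of 10 items, but because each uses "rel" 60% of the time, about 52% agreement would be expected by chance alone, so kappa is noticeably lower than the raw agreement.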
Created for this library
A search-quality team monitors inter-rater agreement on relevance judgments so label noise stays under control.
A medical labeling team monitors inter-rater agreement among radiologists to confirm annotation guidelines are interpreted consistently.
A research lab uses inter-rater agreement as a quality gate before accepting a new annotation batch into training data.
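A quality gate like the one in the last example could be sketched as follows. This is only one possible design, not a standard procedure: it averages raw pairwise agreement across all raters and compares it with a threshold. The function names, the sample batch, and the 0.8 threshold are assumptions made for illustration. Raw agreement does not correct for chance, so a real gate might prefer a chance-corrected measure such as Cohen's kappa.

```python
from itertools import combinations


def mean_pairwise_agreement(labels_by_rater):
    """Average raw agreement over every pair of raters.

    labels_by_rater maps a rater name to that rater's labels; all raters
    are assumed to have labeled the same items in the same order.
    """
    raters = list(labels_by_rater)
    if len(raters) < 2:
        raise ValueError("need at least two raters")
    lengths = {len(labels) for labels in labels_by_rater.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("all raters must label the same non-empty item set")

    scores = []
    for first, second in combinations(raters, 2):
        a, b = labels_by_rater[first], labels_by_rater[second]
        scores.append(sum(x == y for x, y in zip(a, b)) / len(a))
    return sum(scores) / len(scores)


def accept_batch(labels_by_rater, threshold=0.8):
    """Return True if the batch meets the (hypothetical) agreement threshold."""
    return mean_pairwise_agreement(labels_by_rater) >= threshold


# Hypothetical batch: three raters, four items.
batch = {
    "rater_1": ["a", "b", "a", "a"],
    "rater_2": ["a", "b", "b", "a"],
    "rater_3": ["a", "b", "a", "a"],
}
print(accept_batch(batch))                 # True
print(accept_batch(batch, threshold=0.9))  # False
```

A batch that fails the gate would typically prompt a review of the disputed items or of the annotation guidelines rather than being added to the training data as is.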
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License