Glossary term
Evaluation and Benchmarks
Confirmation that a system meets the needs of its intended users and stakeholders in its operating environment. It addresses the question of whether the right system was built. Validation is broader than functional testing and should consider user fit, environment, fairness, performance under realistic conditions, and unintended impacts.
The initial evaluation of a model's quality. Validation checks the quality of a model's predictions against the validation set.
Because the validation set differs from the training set, validation helps guard against overfitting.
You might think of evaluating the model against the validation set as the first round of testing and evaluating the model against the test set as the second round of testing.
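To make the two rounds concrete, here is a minimal sketch in plain Python. It assumes synthetic one-dimensional data and uses k-nearest-neighbour regression as a stand-in model; neither comes from the glossary, and the helper names are hypothetical. The neighbour count `k` is chosen using only the validation set, and the test set is used once, at the end.

```python
import random


def make_data(n, rng):
    # Hypothetical data: noisy samples of y = x^2 for x in [-1, 1]
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, x * x + rng.gauss(0.0, 0.1)) for x in xs]


def knn_predict(train, x, k):
    # Average the targets of the k training points closest to x
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    # Mean squared error of the k-NN model (fit on `train`) over `data`
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
data = make_data(300, rng)
train, val, test = data[:200], data[200:250], data[250:]

candidates = [1, 3, 5, 9, 15, 25]
# First round: compare candidate models on the validation set only
val_scores = {k: mse(train, val, k) for k in candidates}
best_k = min(val_scores, key=val_scores.get)

# k=1 reproduces every training label, so training error alone would favour it
train_error_k1 = mse(train, train, 1)

# Second round: evaluate the chosen model once on the untouched test set
test_error = mse(train, test, best_k)

print(f"train MSE (k=1): {train_error_k1:.4f}")
print(f"chosen k={best_k}: validation MSE={val_scores[best_k]:.4f}, test MSE={test_error:.4f}")
```

Training error is zero for `k=1` because each training point is its own nearest neighbour, which is exactly the overfitting that a separate validation set is meant to expose. Keeping the test set out of the selection loop is what lets the second round give an estimate that was not tuned against.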
Under SR 11-7, the US supervisory guidance on model risk management, independent model validation by a team separate from model development is a key control that federally regulated banks apply before a model is put into use.
The FDA's Software as a Medical Device (SaMD) validation guidance, including its clinical validation requirements, applies to AI-based medical devices.
ISO/IEC 25040, the standard for software product quality evaluation, provides validation concepts that are applied to AI in standards such as ISO/IEC 42001.
Created for this library
An ML platform team requires a validation step on each release so business reviewers see honest model performance.
A risk modeling team's validation framework includes both technical and business metrics so reviewers see both perspectives.
A medical AI team's validation framework includes edge cases that clinicians find most informative for safety review.
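The three examples share a pattern: a validation step that must pass before release, a report that puts technical and business metrics side by side, and separate checks on edge cases. A minimal Python sketch of such a gate for a binary classifier follows. The `Example` record, the error costs, the thresholds, and slice tags such as `"rare_presentation"` are all hypothetical illustrations, not taken from any particular team's framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Example:
    label: int  # ground truth: 1 = positive, 0 = negative
    prediction: int  # model output after thresholding
    tags: frozenset = field(default_factory=frozenset)  # hypothetical slice tags


# Hypothetical business costs per error; real values would come from domain owners
COST_FALSE_POSITIVE = 1.0
COST_FALSE_NEGATIVE = 5.0


def accuracy(examples):
    return sum(e.label == e.prediction for e in examples) / len(examples)


def recall(examples):
    positives = [e for e in examples if e.label == 1]
    if not positives:
        return None  # undefined when there are no positive cases
    return sum(e.prediction == 1 for e in positives) / len(positives)


def cost_per_case(examples):
    # Business view: average cost of the errors the model makes
    fp = sum(e.prediction == 1 and e.label == 0 for e in examples)
    fn = sum(e.prediction == 0 and e.label == 1 for e in examples)
    return (fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE) / len(examples)


def validation_report(examples, edge_case_tags, min_accuracy=0.75, min_slice_accuracy=0.8):
    report = {
        "accuracy": accuracy(examples),  # technical metric
        "recall": recall(examples),  # technical metric
        "cost_per_case": cost_per_case(examples),  # business metric
        "slices": {},
        "failures": [],
    }
    if report["accuracy"] < min_accuracy:
        report["failures"].append("overall accuracy below threshold")
    # Edge cases are checked on their own so a good average cannot hide them
    for tag in edge_case_tags:
        subset = [e for e in examples if tag in e.tags]
        if not subset:
            report["failures"].append(f"no validation examples for {tag!r}")
            continue
        slice_acc = accuracy(subset)
        report["slices"][tag] = {"n": len(subset), "accuracy": slice_acc}
        if slice_acc < min_slice_accuracy:
            report["failures"].append(f"{tag!r} accuracy below threshold")
    report["passed"] = not report["failures"]  # release proceeds only if nothing failed
    return report


examples = [
    Example(1, 1), Example(0, 0), Example(1, 1), Example(0, 0), Example(0, 0),
    Example(0, 1),  # false positive
    Example(1, 1, frozenset({"low_quality_image"})),
    Example(1, 0, frozenset({"rare_presentation"})),  # false negative
    Example(0, 0, frozenset({"rare_presentation"})),
]
report = validation_report(examples, ["rare_presentation", "low_quality_image", "pediatric"])
print(report["passed"], report["failures"])
```

In this toy run the overall accuracy (7 of 9) clears its threshold, yet the gate still fails: the `"rare_presentation"` slice is only half correct, and the `"pediatric"` tag has no validation examples at all. Treating an uncovered edge case as a failure is a deliberately conservative design choice for safety review; a team could instead report it as a warning.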
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License