Glossary term
Safety and Alignment
A predefined capability, risk, or assurance threshold that must be satisfied before an AI system can be deployed, expanded, or given broader access. Thresholds make risk appetite operational and should be measurable enough to guide decisions: for example, a minimum evaluation score, a cap on unresolved critical findings, demonstrated incident readiness, or sufficient human oversight capacity.
OpenAI's Preparedness Framework requires safeguards that sufficiently minimize risk before a model that reaches the High capability threshold can be deployed; at the Critical threshold, safeguards are also required during development.
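Anthropic's Responsible Scaling Policy (RSP) commits to applying ASL-3 safeguards once a model's evaluations cross the corresponding capability threshold, and to pausing deployment or further scaling if those safeguards are not yet in place.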
Under SR 11-7, the US supervisory guidance on model risk management, banks set deployment thresholds that require satisfactory validation findings and senior approval before a model is used in production.
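
As a rough illustration of what "measurable enough to guide decisions" can mean, the sketch below encodes the four kinds of criteria named in the definition as a simple pre-deployment gate. Every name and number in it (DeploymentThreshold, ReleaseEvidence, the 0.90 score floor, the two-reviewer minimum) is a hypothetical assumption chosen for the example, not a value taken from any of the frameworks cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentThreshold:
    """Hypothetical policy values; real ones come from an organisation's risk appetite."""
    min_eval_score: float = 0.90
    max_unresolved_critical_findings: int = 0
    require_incident_runbook: bool = True
    min_reviewers_on_call: int = 2


@dataclass(frozen=True)
class ReleaseEvidence:
    """Hypothetical evidence gathered for one release candidate."""
    eval_score: float
    unresolved_critical_findings: int
    incident_runbook_approved: bool
    reviewers_on_call: int


def failed_checks(evidence: ReleaseEvidence, threshold: DeploymentThreshold) -> list[str]:
    """Return the names of the criteria the release candidate does not meet."""
    failures = []
    if evidence.eval_score < threshold.min_eval_score:
        failures.append("eval_score")
    if evidence.unresolved_critical_findings > threshold.max_unresolved_critical_findings:
        failures.append("unresolved_critical_findings")
    if threshold.require_incident_runbook and not evidence.incident_runbook_approved:
        failures.append("incident_readiness")
    if evidence.reviewers_on_call < threshold.min_reviewers_on_call:
        failures.append("human_oversight_capacity")
    return failures


def may_deploy(evidence: ReleaseEvidence, threshold: DeploymentThreshold) -> bool:
    """The gate passes only when every criterion is met."""
    return not failed_checks(evidence, threshold)


if __name__ == "__main__":
    candidate = ReleaseEvidence(
        eval_score=0.85,
        unresolved_critical_findings=1,
        incident_runbook_approved=True,
        reviewers_on_call=3,
    )
    print(may_deploy(candidate, DeploymentThreshold()))
    print(failed_checks(candidate, DeploymentThreshold()))
```

Returning the list of failed criteria, rather than only a yes/no answer, keeps the decision auditable: a reviewer can see which part of the threshold blocked the release and what evidence would be needed to clear it.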