Glossary term
Glossary term
Safety and Alignment
A governance framework used by advanced AI developers to identify severe-risk categories, define thresholds, evaluate models, assign mitigations, and restrict deployment when risk exceeds acceptable levels. Even for organizations that do not train frontier models, the framework concept is useful for setting severe-risk thresholds and escalation paths.
OpenAI's Preparedness Framework (December 2023, updated 2024) defines four risk categories and Critical capability thresholds triggering deployment restrictions.
Anthropic's Responsible Scaling Policy (September 2023, updated multiple times) is the equivalent framework with ASL levels.
Google DeepMind's Frontier Safety Framework (May 2024) defines Critical Capability Levels and corresponding safety and security mitigations.