Glossary term
Safety and Alignment
A formal assessment of whether an AI model or system meets defined safety expectations before it is released or its deployment is expanded. Safety evaluations may include red teaming, misuse testing, robustness testing, policy compliance checks, and autonomy checks. They should cover both provider-level testing of the model itself and application-level testing of the system built on it.
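As a rough illustration, the "meets defined safety expectations before release" idea can be sketched as a simple release gate. Everything here is an assumption made for the example: the `CategoryResult` type, the `release_gate` function, the category names, and the pass-rate thresholds are hypothetical and do not reflect any particular provider's or institute's process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryResult:
    """One evaluation category's outcome (hypothetical structure)."""
    category: str     # e.g. "red_teaming", "misuse", "robustness"
    level: str        # "provider" (model-level) or "application"
    pass_rate: float  # fraction of test cases handled safely, in [0, 1]
    threshold: float  # illustrative minimum acceptable pass rate


def release_gate(results):
    """Return (ok, failures).

    ok is True only if every category meets its threshold and both
    provider-level and application-level results are present.
    """
    # Categories whose pass rate falls below the stated expectation.
    failures = [r.category for r in results if r.pass_rate < r.threshold]
    # Flag any testing level that has no results at all.
    levels = {r.level for r in results}
    missing = sorted({"provider", "application"} - levels)
    failures += [f"missing:{lvl}" for lvl in missing]
    return (not failures, failures)


results = [
    CategoryResult("red_teaming", "provider", 0.98, 0.95),
    CategoryResult("misuse", "application", 0.90, 0.95),
]
ok, failures = release_gate(results)
print(ok, failures)
```

In this sketch the gate blocks release because the application-level misuse category falls below its assumed threshold. Real evaluations are rarely reducible to a single pass rate per category, so the numeric thresholds should be read as placeholders for whatever expectations an organization actually defines.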
The UK AI Safety Institute and US AI Safety Institute have pre-release evaluation arrangements with OpenAI, Anthropic, and Google DeepMind.
OpenAI's o1 System Card and Anthropic's Claude 3.5 Sonnet evaluations documented safety testing across categories including jailbreak resistance and dangerous content refusal.
Apollo Research, METR, and Redwood Research publish independent safety evaluations of frontier models.