Glossary term
Glossary term
Safety and Alignment
Rules for deciding whether, when, and how to release a model or capability. It should cover staged rollout, access tiers, safety thresholds, monitoring, user restrictions, incident response, and rollback. Release policy should define who can approve broader access, what evidence is required, and what conditions trigger pause, rollback, or phased deployment.
OpenAI's staged release of GPT-2 in 2019 was an early industry example of phased model release tied to safety considerations.
Anthropic's Responsible Scaling Policy explicitly ties deployment decisions to ASL evaluations and required safeguards.
Meta's Llama license includes acceptable use policy components functioning as deployment conditions for open weight releases.