Dangerous Capability

A model capability that could enable severe harm if misused or poorly controlled, such as advanced cyber offense, biological design assistance, scalable deception, or autonomous execution of harmful tasks. Dangerous-capability findings should drive access restrictions, additional safeguards, leadership review, and sometimes non-deployment.

Examples

1. Anthropic's Responsible Scaling Policy defines AI Safety Level (ASL) thresholds for dangerous capabilities, including biological weapons uplift and autonomous replication.
2. OpenAI's Preparedness Framework tracks four risk categories: CBRN, Cybersecurity, Persuasion, and Model Autonomy. Each is graded on risk levels from Low to Critical, which act as dangerous-capability bands.
3. In December 2024, Apollo Research published evaluations of in-context scheming behaviors in frontier models, documenting nascent dangerous capabilities.
