Glossary term
Glossary term
Safety and Alignment
Risk that an AI system can pursue goals, use tools, adapt plans, or take actions with insufficient human control. Autonomy risk increases with agents, long-horizon tasks, memory, and external tool access. Autonomy risk should be assessed by task duration, tool permissions, reversibility of actions, supervision quality, and ability to recover from errors.
OpenAI's Preparedness Framework defines Model Autonomy as a risk category with explicit Critical capability thresholds.
Apollo Research's December 2024 paper on in-context scheming documented frontier model behaviors relevant to autonomy risk.
METR specializes in evaluating autonomous task completion capabilities of frontier models, with results published for major releases.