Glossary term
Glossary term
Agentic Systems
Model capability to produce long internal reasoning traces before generating a response, visible to the user.
Anthropic released Claude 3.7 Sonnet with extended thinking in February 2025 - the model generates up to 64,000 thinking tokens before responding, achieving 80.0% on AIME 2025 and 96.2% on MATH.
OpenAI o3 uses extended thinking with adaptive compute budgets - the model allocates more thinking tokens to harder problems, achieving 87.5% on ARC-AGI and 25.2% on FrontierMath.
Google Gemini 2.0 Flash Thinking shows reasoning traces to users, enabling debugging of the model's reasoning process - used by educational platforms to teach students problem-solving methodology.