Reasoning Model

A large language model (LLM) that is specifically trained or prompted to produce an explicit step-by-step reasoning trace before generating its final answer.
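To make the "trace, then answer" structure concrete, here is a minimal sketch of separating the two parts of a raw model output. It assumes the model wraps its trace in `<think>...</think>` tags, a convention used by some open-weight reasoning models; other models use different markers or hide the trace entirely. The names `ParsedResponse` and `split_reasoning` are hypothetical and exist only for this illustration.

```python
import re
from typing import NamedTuple


class ParsedResponse(NamedTuple):
    reasoning: str  # the step-by-step trace (empty if none was found)
    answer: str     # the final answer that follows the trace


# Assumed delimiter convention; adjust for the model you actually use.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> ParsedResponse:
    """Separate the reasoning trace from the final answer."""
    match = THINK_BLOCK.search(raw)
    if match is None:
        # No trace found: treat the whole output as the answer.
        return ParsedResponse(reasoning="", answer=raw.strip())
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return ParsedResponse(reasoning, answer)


raw_output = (
    "<think>17 * 6 = 102. 102 + 8 = 110.</think>\n"
    "The result is 110."
)
parsed = split_reasoning(raw_output)
print(parsed.reasoning)
print(parsed.answer)
```

Keeping the trace separate from the answer lets an application log or display the reasoning without mixing it into the user-facing result.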

1.
OpenAI o1 (previewed September 2024) is trained with reinforcement learning to produce long internal reasoning chains before answering. OpenAI reported 83% on AIME 2024 competition math problems using consensus over 64 samples (74% with a single sample), versus roughly 12 to 13% for GPT-4o without reasoning.
2.
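DeepSeek-R1 builds on DeepSeek-R1-Zero, which was trained with pure RL on reasoning tasks, without supervised fine-tuning, so that reasoning chains emerge during training. R1 itself adds a small supervised cold-start stage before RL. DeepSeek reported performance comparable to o1 on the MATH-500 and AIME 2024 benchmarks, with openly released weights and a reported training cost roughly 10x lower.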
3.
Google Gemini 2.0 Flash Thinking applies test-time compute scaling and exposes its reasoning traces to the user. It is reportedly used in Google Search AI Overviews for complex multi-step queries that require research and synthesis.
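One simple form of the test-time compute scaling mentioned above is to sample several answers to the same question and keep the most common one. The sketch below illustrates that idea only; it is not a description of how any of the listed models works internally. `consensus_answer` and `make_toy_sampler` are hypothetical names, and the toy sampler replays canned answers in place of a real model call.

```python
from collections import Counter
from typing import Callable, Sequence


def consensus_answer(sample: Callable[[str], str], question: str, k: int) -> str:
    """Sample k final answers and return the most frequent one."""
    answers = [sample(question) for _ in range(k)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_toy_sampler(canned: Sequence[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call: replays canned answers in order."""
    remaining = iter(canned)

    def sample(question: str) -> str:
        return next(remaining)

    return sample


question = "What is 17 * 6 + 8?"
canned_answers = ["112", "110", "110", "108", "110"]

# With one sample we are stuck with whatever came out first.
single = consensus_answer(make_toy_sampler(canned_answers), question, k=1)
# With five samples the majority answer wins.
voted = consensus_answer(make_toy_sampler(canned_answers), question, k=5)
print(single, voted)
```

Spending more samples per question trades extra inference cost for a better chance that occasional wrong answers are outvoted.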
