Glossary term
Agentic Systems
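Using a verifier model to check each intermediate reasoning step rather than only the final answer, so that an error is caught at the step where it occurs instead of being missed when a flawed solution happens to reach the right result.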
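Math-Shepherd (Wang et al., 2023; not an OpenAI system) trains a step-level verifier on automatically constructed process labels and uses it to score each step of chain-of-thought math solutions - the authors report Mistral-7B improving on GSM8K from 77.9% to 84.1% with step-by-step PPO, and to 89.1% when the verifier is also used to rerank sampled solutions.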
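Google DeepMind's AlphaCode 2 is a useful contrast, because its execution-based check works at the outcome level rather than the step level - each sampled program is run against the example tests in the problem statement, failing candidates are filtered out, and the survivors are clustered and ranked by a scoring model. The execution result acts as a pass/fail filter at inference time, not as an RLHF reward.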
Microsoft's WizardMath fine-tunes Llama 2 with Reinforcement Learning from Evol-Instruct Feedback (RLEIF) - Evol-Instruct generates progressively varied math instructions, and a process-supervised reward model scores each intermediate step of a solution so that incorrect steps lower the reward during PPO training.
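The difference between step-level and outcome-level checking can be sketched in a few lines of Python. This is a toy illustration only, not the method of any system named above: the rule-based `verify_step` stands in for a learned verifier, and the names `verify_step`, `first_bad_step`, and `outcome_ok`, as well as the `a op b = c` step format, are assumptions made for the example.

```python
import re
from typing import List, Optional

# Hypothetical step format for this sketch: "<int> <op> <int> = <int>".
STEP_PATTERN = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")


def verify_step(step: str) -> bool:
    """Toy stand-in for a learned verifier: recompute one arithmetic step."""
    match = STEP_PATTERN.match(step)
    if match is None:
        return False  # steps that cannot be parsed are treated as incorrect
    a, op, b, claimed = (int(match.group(1)), match.group(2),
                         int(match.group(3)), int(match.group(4)))
    actual = {"+": a + b, "-": a - b, "*": a * b}[op]
    return actual == claimed


def first_bad_step(steps: List[str]) -> Optional[int]:
    """Step-level check: index of the first step that fails, or None."""
    for index, step in enumerate(steps):
        if not verify_step(step):
            return index
    return None


def outcome_ok(steps: List[str], expected: int) -> bool:
    """Outcome-level check: compare only the final claimed result."""
    match = STEP_PATTERN.match(steps[-1])
    return match is not None and int(match.group(4)) == expected


# Task: compute 3 * 4 + 5 (expected answer 17).
correct = ["3 * 4 = 12", "12 + 5 = 17"]
flawed = ["3 * 4 = 13", "13 + 4 = 17"]  # wrong first step, right final answer

print(outcome_ok(flawed, 17))   # the outcome-only check accepts the solution
print(first_bad_step(flawed))   # the step-level check flags step 0
print(first_bad_step(correct))  # no failing step
```

The flawed solution passes the outcome check because its final number happens to be right, while the step-level check points at the step where the reasoning went wrong. A trained verifier would typically return a score per step rather than a boolean, and those scores can then be used to rerank candidate solutions or as a training reward.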