Test-Time Compute

Scaling inference compute by generating and verifying multiple solution attempts or reasoning chains at test time.
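The definition above can be illustrated with two of the simplest test-time strategies: best-of-N sampling against a verifier, and majority voting when no verifier is available. This is a minimal sketch under stated assumptions, not the method of any system named in this entry. The names `best_of_n`, `majority_vote`, `make_toy_generator`, and `toy_verifier` are hypothetical, and the "model" is a random stub standing in for a real sampler.

```python
import random
from collections import Counter
from typing import Callable, Optional


def best_of_n(generate: Callable[[], str],
              verify: Callable[[str], bool],
              n: int) -> Optional[str]:
    """Sample up to n candidates and return the first one the verifier accepts."""
    for _ in range(n):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # the whole budget was spent without a verified answer


def majority_vote(generate: Callable[[], str], n: int) -> str:
    """Sample n answers and return the most frequent one (no verifier needed)."""
    answers = [generate() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def make_toy_generator(seed: int, p_correct: float = 0.3) -> Callable[[], str]:
    """Hypothetical stand-in for a model answering 'what is 17 * 24?'."""
    rng = random.Random(seed)

    def generate() -> str:
        if rng.random() < p_correct:
            return "408"
        return str(rng.randint(300, 500))  # usually a wrong guess

    return generate


def toy_verifier(answer: str) -> bool:
    """Hypothetical exact checker; real verifiers are usually imperfect."""
    return int(answer) == 17 * 24


if __name__ == "__main__":
    # A larger n means more test-time compute and a better chance of success.
    for n in (1, 4, 16):
        print(n, best_of_n(make_toy_generator(seed=0), toy_verifier, n))
```

Best-of-N needs a way to check answers, which is why it pairs naturally with formal or programmatic verifiers. Majority voting trades that requirement for the assumption that correct answers recur more often than any single wrong one.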

1. OpenAI o1 is trained with reinforcement learning to produce a long chain of thought before it answers, and OpenAI reports that its accuracy on competition maths rises as it is given more test-time compute. OpenAI has not published the inference procedure, so describing it as MCTS-like search is speculation.
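2. Google DeepMind AlphaProof and AlphaGeometry 2 solved four of the six IMO 2024 problems, a silver-medal-level score, by searching over large numbers of candidate proof steps at test time. AlphaProof works in the Lean formal proof language, so a proof is accepted only once it has been formally verified.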
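3. Snell et al. (2024), 'Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters', shows that allocating test-time compute adaptively per prompt can outperform a much larger model at matched compute on easy and intermediate problems, while the hardest problems still benefit more from additional pretraining. Reasoning models such as o1/o3 rely on the same trade-off.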
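A back-of-the-envelope formula shows why extra samples help most where the base model already has a non-trivial success rate. It is an idealisation and is not taken from any of the works above: it assumes each attempt is correct independently with probability p and that the verifier is perfect.

```latex
% Probability that at least one of N independent attempts is correct,
% assuming per-attempt success probability p and a perfect verifier.
P(\text{solved with } N \text{ attempts}) = 1 - (1 - p)^{N}

% Worked examples with N = 16:
%   p = 0.1   :  1 - 0.9^{16}   \approx 0.81
%   p = 0.001 :  1 - 0.999^{16} \approx 0.016
```

Under these assumptions, sixteen samples turn a 10% per-attempt success rate into roughly 81%, but barely move a 0.1% success rate, which is consistent with test-time compute paying off less on the hardest problems.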
