Top-P Sampling

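Inference-time decoding strategy, also called nucleus sampling, that samples the next token from the smallest set of highest-probability tokens whose cumulative probability mass reaches a threshold p. Probabilities inside that set are renormalized before sampling; all other tokens are excluded.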
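
The definition can be sketched in a few lines of plain Python. This is a minimal illustration, not any provider's implementation: the function names `top_p_filter` and `sample_top_p` and the toy distribution are made up for the example, and the cutoff uses "reaches or exceeds p" (real implementations differ on how they treat the boundary).

```python
import random

def top_p_filter(probs, p):
    """Return the renormalized nucleus: the smallest set of most likely
    tokens whose cumulative probability reaches p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # boundary handling is an assumption of this sketch
            break
    # Renormalize so the kept probabilities sum to 1.
    return {token: prob / cumulative for token, prob in nucleus}

def sample_top_p(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Toy next-token distribution (made-up values for illustration).
toy_probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
nucleus = top_p_filter(toy_probs, 0.9)  # keeps "the", "a", "an"; drops "zebra"
```

With p=0.9, the first three tokens already cover 0.95 of the mass, so "zebra" can never be sampled; with p=1 the filter keeps every token and sampling is unchanged.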

1. Anthropic uses top-p=0.999 by default in Claude's API. The setting preserves almost the full vocabulary distribution while preventing extremely rare tokens from being sampled and derailing otherwise coherent generation.
2. OpenAI recommends leaving top-p=1 when adjusting temperature (change one of the two settings, not both). Developers building creative writing tools, where diversity of output is preferred over determinism, follow this approach.
3. A code-generation application uses top-p=0.95 alongside temperature=0.2. The combination reduces the probability that the model samples an unlikely but syntactically plausible code token that would introduce a subtle bug.
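
To see why a low temperature and a top-p cutoff reinforce each other, as in the code-generation example, the sketch below counts how many tokens survive a top-p=0.95 cutoff at two temperatures. It assumes temperature is applied to the logits before the top-p cutoff, which is a common ordering but not guaranteed for every API; the logits and the helper names `softmax_with_temperature` and `nucleus_size` are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, p):
    """Count the tokens in the smallest set whose cumulative mass reaches p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

# Made-up logits for six candidate tokens.
toy_logits = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5]

size_warm = nucleus_size(softmax_with_temperature(toy_logits, 1.0), 0.95)
size_cold = nucleus_size(softmax_with_temperature(toy_logits, 0.2), 0.95)
print(size_warm, size_cold)  # 5 2
```

On these toy logits, five of the six tokens stay eligible at temperature=1.0 but only two at temperature=0.2: the low temperature concentrates mass on the top candidates, and the top-p cutoff then removes the remaining tail entirely.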
