Math-pass@k

A metric that measures how accurately an LLM solves math problems within k attempts. For example, math-pass@2 measures an LLM's ability to solve math problems within two attempts. An accuracy of 0.85 on math-pass@2 indicates that the LLM solved 85% of the math problems within two attempts.
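The definition can be sketched in a few lines of Python. This is a minimal illustration, not the glossary's own method. It assumes each problem's attempts have already been graded as correct or incorrect, and it uses a hypothetical input format in which `attempts[i][j]` is `True` when attempt `j` on problem `i` was correct. A problem counts as solved if any of its first k attempts is correct. The function name `math_pass_at_k` is also hypothetical.

```python
from typing import Sequence


def math_pass_at_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of problems solved within the first k attempts.

    attempts[i][j] is True if attempt j on problem i was graded correct
    (hypothetical input format: grading is assumed to happen elsewhere).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        raise ValueError("need at least one problem")
    # A problem is solved if any of its first k attempts is correct.
    solved = sum(1 for per_problem in attempts if any(per_problem[:k]))
    return solved / len(attempts)


# 20 problems: 10 solved on the first try, 7 on the second, 3 not at all.
attempts = [[True, False]] * 10 + [[False, True]] * 7 + [[False, False]] * 3
print(math_pass_at_k(attempts, 2))  # prints 0.85
print(math_pass_at_k(attempts, 1))  # prints 0.5
```

With two attempts, 17 of the 20 problems are solved, which reproduces the 0.85 math-pass@2 score from the example above. Restricting to one attempt drops the score to 0.5.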

math-pass@k is identical to the pass@k metric. The name math-pass@k is used specifically when the evaluation consists of math problems.
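This glossary entry does not say how pass@k scores are computed in practice. One widely used option, introduced with the HumanEval code benchmark, draws n ≥ k samples per problem, counts the c correct ones, and estimates the chance that a random subset of k samples contains at least one correct answer as 1 − C(n−c, k) / C(n, k). The sketch below assumes that estimator applies to math problems in the same way. The function names are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k_estimate(n: int, c: int, k: int) -> float:
    """Estimated pass@k for one problem with n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no
    # correct answer. comb() returns 0 when n - c < k, giving an estimate of 1.0.
    p_all_wrong = comb(n - c, k) / comb(n, k)
    return 1.0 - p_all_wrong


def mean_pass_at_k(counts: Sequence[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs."""
    if not counts:
        raise ValueError("need at least one problem")
    return sum(pass_at_k_estimate(n, c, k) for n, c in counts) / len(counts)


# Two problems: 1 of 4 samples correct, and 2 of 3 samples correct.
print(mean_pass_at_k([(4, 1), (3, 2)], k=2))  # prints 0.75
```

Sampling more than k answers per problem and averaging over all k-subsets gives a lower-variance estimate than grading exactly k attempts once, at the cost of extra generations.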

Real-world uses

Created for this library

1. An LLM evaluation team uses math-pass@k in its model release reviews to track how often the model produces a correct math solution among k samples.
2. A research lab reports math-pass@k scores in its preprint to compare the reasoning ability of fine-tuned versions of its model.
3. A model release team includes math-pass@k as one of several reasoning benchmarks that gate promotion to production.

Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License
