Glossary term
Evaluation and Benchmarks
MBPP: Abbreviation for Mostly Basic Python Problems.
Example source: Created for this library
An LLM evaluation team uses MBPP in its standard benchmark suite to measure basic Python programming ability per model release.
A research lab reports MBPP scores in its model card so downstream users can compare basic coding ability across model versions.
A model release team gates promotions on MBPP scores to avoid regressing on simple coding tasks important to enterprise users.
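The promotion gate in the last example could be sketched roughly as follows. This is a minimal illustration, not a description of any real team's pipeline: the `MbppResult` record, the `pass_rate` and `gate_promotion` helpers, and the one-point `max_regression` tolerance are all hypothetical names and values chosen for the sketch, and it assumes each problem has already been scored as passed or failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MbppResult:
    """Outcome of one benchmark problem for one model (hypothetical record)."""
    task_id: int
    passed: bool


def pass_rate(results: list[MbppResult]) -> float:
    """Fraction of problems whose generated solution passed its tests."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.passed for r in results) / len(results)


def gate_promotion(
    baseline: list[MbppResult],
    candidate: list[MbppResult],
    max_regression: float = 0.01,  # assumed tolerance, not a standard value
) -> bool:
    """Allow promotion only if the candidate does not fall more than
    max_regression below the baseline pass rate."""
    return pass_rate(candidate) >= pass_rate(baseline) - max_regression


if __name__ == "__main__":
    # Toy data: the baseline passes 16 of 20 problems, the candidate 15 of 20.
    baseline = [MbppResult(i, i < 16) for i in range(20)]
    candidate = [MbppResult(i, i < 15) for i in range(20)]
    print("promote:", gate_promotion(baseline, candidate))
```

A fixed tolerance keeps the gate simple; a team with a small problem set might instead compare per-problem outcomes, since a one-point threshold can be sensitive to noise.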
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License