Calibration

Alignment between a model's expressed confidence and its actual accuracy; a well-calibrated model's 70% confidence predictions are correct 70% of the time.

Examples

1.
OpenAI measures calibration of GPT-4 on MMLU by comparing stated probabilities to empirical accuracy - GPT-4 shows strong calibration (ECE <3%) while smaller models show significant overconfidence.
2.
Meta's LIMA paper shows that alignment-tuned models are less well-calibrated than base models - RLHF training improves helpfulness but increases overconfidence, motivating calibration-aware fine-tuning.
3.
Medical AI deployments require calibration certificates - a radiology AI must demonstrate that its 90%-confidence predictions are correct 88-92% of the time before receiving FDA clearance for clinical use.

Related terms

Back to glossary

Examples

Related terms

Loading…

Examples

Related terms