Label Leakage

A model design flaw in which a feature is a proxy for the label. For example, consider a binary classification model that predicts whether a prospective customer will purchase a particular product. Suppose that one of the model's features is a Boolean named SpokeToCustomerAgent, and that a customer agent is assigned only after the prospective customer has actually purchased the product. During training, the model will quickly learn the association between SpokeToCustomerAgent and the label. At prediction time, however, no agent has been assigned yet, so the feature carries no signal and the accuracy seen in training does not carry over.
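
The effect can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data is synthetic, the `visits` feature and the helper names are hypothetical, and a hand-written rule stands in for whatever a trained model would learn from the leaky feature.

```python
import random


def make_customers(n, seed=0):
    """Build synthetic prospects. All values are made up for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        visits = rng.randint(0, 10)
        # Purchase becomes more likely with more visits, but stays noisy.
        purchased = rng.random() < 0.1 + 0.07 * visits
        rows.append({"visits": visits, "purchased": purchased})
    return rows


def add_leaky_feature(rows, outcome_known):
    """Fill in SpokeToCustomerAgent, which is only True after a purchase.

    In historical (training) data the outcome is known, so the feature
    mirrors the label. At serving time no agent has been assigned yet,
    so the feature is always False.
    """
    for row in rows:
        row["SpokeToCustomerAgent"] = row["purchased"] if outcome_known else False
    return rows


def leaky_model(row):
    """Stand-in for a trained model that leans entirely on the leaky feature."""
    return row["SpokeToCustomerAgent"]


def accuracy(rows, predict):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(r) == r["purchased"] for r in rows) / len(rows)


train = add_leaky_feature(make_customers(1000, seed=0), outcome_known=True)
serve = add_leaky_feature(make_customers(1000, seed=1), outcome_known=False)

train_acc = accuracy(train, leaky_model)  # 1.0 by construction: feature equals label
serve_acc = accuracy(serve, leaky_model)  # only as good as always predicting "no purchase"

print(f"training accuracy: {train_acc:.2f}")
print(f"serving accuracy:  {serve_acc:.2f}")
```

In this sketch the training accuracy is perfect only because the feature is a copy of the label. At serving time the same rule predicts "no purchase" for everyone, so its accuracy falls to the share of prospects who do not buy.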

See Monitoring pipelines in Machine Learning Crash Course for more information.

Real-world uses

Created for this library

1. A risk modeling team flags label leakage when reviewing a feature that uses information available only after the outcome is known.
2. A retail demand team audits its features for label leakage after every change to the ETL pipeline so production scoring matches training assumptions.
3. A medical AI team's review checklist includes label leakage as a top risk because clinical data often correlates with outcomes by design.
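
A feature audit like the ones described above could start with a simple screen. The following is a rough sketch of one possible heuristic, not an established method: it flags any feature that, on its own, predicts the label almost perfectly. The function names, the 0.98 threshold, and the sample rows are all hypothetical.

```python
from collections import Counter, defaultdict


def single_feature_accuracy(rows, feature, label):
    """Accuracy of predicting the majority label for each value of one feature."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[label]] += 1
    # For each feature value, the majority label is the best constant guess.
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(rows)


def flag_suspect_features(rows, features, label, threshold=0.98):
    """Return features that alone predict the label at or above the threshold."""
    return [
        f for f in features
        if single_feature_accuracy(rows, f, label) >= threshold
    ]


# Hand-made rows for illustration only.
rows = [
    {"visits": 1, "SpokeToCustomerAgent": False, "purchased": False},
    {"visits": 1, "SpokeToCustomerAgent": True, "purchased": True},
    {"visits": 3, "SpokeToCustomerAgent": False, "purchased": False},
    {"visits": 3, "SpokeToCustomerAgent": True, "purchased": True},
    {"visits": 5, "SpokeToCustomerAgent": True, "purchased": True},
    {"visits": 5, "SpokeToCustomerAgent": False, "purchased": False},
]

suspects = flag_suspect_features(
    rows, ["visits", "SpokeToCustomerAgent"], label="purchased"
)
print(suspects)
```

A flagged feature is a prompt for human review rather than proof of leakage. A screen like this will also flag high-cardinality columns such as IDs, and it cannot replace checking when each feature's value actually becomes available relative to the outcome.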
