Glossary term
Governance and Compliance
Processing data before it's used to train a model. Preprocessing could be as simple as removing words from an English text corpus that don't occur in the English dictionary, or could be as complex as re-expressing data points in a way that eliminates as many attributes that are correlated with sensitive attributes as possible. Preprocessing can help satisfy fairness constraints.
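The two ends of that range can be illustrated with a minimal sketch. Everything below is hypothetical and written for illustration. `filter_to_dictionary` is the simple case: it drops tokens that are not in a supplied word list. `drop_correlated_features` is a deliberately naive stand-in for the complex case: it removes numeric columns whose Pearson correlation with a sensitive attribute reaches an assumed threshold of 0.5. Real fairness-oriented preprocessing is usually more involved, because proxies can leak through combinations of features that are each only weakly correlated.

```python
import math


def filter_to_dictionary(tokens, dictionary):
    """Keep only tokens found in the dictionary (case-insensitive match)."""
    vocab = {word.lower() for word in dictionary}
    return [t for t in tokens if t.lower() in vocab]


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no linear correlation
    return cov / (sx * sy)


def drop_correlated_features(columns, sensitive, threshold=0.5):
    """Keep columns whose |correlation| with the sensitive attribute is below threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return {
        name: values
        for name, values in columns.items()
        if abs(pearson(values, sensitive)) < threshold
    }


# Simple case: remove tokens that are not dictionary words.
print(filter_to_dictionary(["The", "cat", "qzxv", "sat"], {"the", "cat", "sat"}))

# Complex case: "zip_prefix" tracks the sensitive attribute exactly and is dropped.
columns = {"zip_prefix": [1, 1, 2, 2], "hours_worked": [40, 35, 38, 41]}
print(list(drop_correlated_features(columns, sensitive=[0, 0, 1, 1])))
```

In this toy data, `zip_prefix` has a correlation of 1.0 with the sensitive attribute and is removed, while `hours_worked` (about 0.44) is kept.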
Examples (created for this library)
An ML platform team standardizes preprocessing so training and serving use the same transformations.
A retail data team versions preprocessing alongside the model so retraining stays reproducible across vintages.
An NLP team builds a preprocessing pipeline that handles unicode normalization, tokenization, and truncation in a single, testable component.
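The three examples above share one idea: preprocessing should be a single, versioned component that both the training and serving code call. Below is a minimal sketch of that idea for text, assuming a hypothetical `TextPreprocessor` class with an illustrative version tag (`"v1"`) and an illustrative truncation limit (`max_tokens=8`). It uses the standard library's `unicodedata.normalize` with NFKC normalization and a simple regex tokenizer. A production NLP team would more likely use a subword tokenizer, so treat the tokenizer here as a placeholder.

```python
import re
import unicodedata
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TextPreprocessor:
    """Hypothetical preprocessing component shared by training and serving."""

    version: str = "v1"   # illustrative tag, stored alongside the model
    max_tokens: int = 8   # illustrative truncation limit
    lowercase: bool = True

    def normalize(self, text: str) -> str:
        # NFKC folds compatibility characters such as full-width letters and ligatures.
        text = unicodedata.normalize("NFKC", text)
        return text.lower() if self.lowercase else text

    def tokenize(self, text: str) -> list[str]:
        # Placeholder tokenizer: runs of word characters, or single punctuation marks.
        return re.findall(r"\w+|[^\w\s]", text)

    def truncate(self, tokens: list[str]) -> list[str]:
        return tokens[: self.max_tokens]

    def __call__(self, text: str) -> list[str]:
        return self.truncate(self.tokenize(self.normalize(text)))

    def config(self) -> dict:
        # Settings to save with the model artifact so retraining is reproducible.
        return asdict(self)


pre = TextPreprocessor()
print(pre("Ｈｅｌｌｏ, ﬁne World!"))  # full-width letters and the "ﬁ" ligature are folded
print(pre.config())
```

Because the class is frozen and its settings are exported through `config()`, the same configuration can be recorded with each model version and reloaded at serving time, which is one way to keep training and serving transformations identical.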
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License