Glossary term
Glossary term
Governance and Compliance
The documented path of data from source through transformation, training, tuning, retrieval, deployment, and output. Lineage supports explainability, quality control, privacy, and auditability. Lineage should answer audit questions such as which source influenced a model, which version was deployed, what data was removed, and which outputs may be affected.
OpenLineage, hosted by the LF AI and Data Foundation, is an open standard for data lineage adopted by tools like Apache Airflow and Marquez.
Collibra, Alation, and Atlan are commercial data catalogs widely used by enterprises to track lineage across AI pipelines.
The MLflow Tracking Server and Weights and Biases artifact tracking support lineage from datasets through experiments to deployed models.