Glossary term
Governance and Compliance
Evidence about where data came from, how it was collected, what rights or restrictions apply, and whether it is appropriate for AI use. Provenance is essential for copyright, consent, licensing, privacy, contractual restrictions, and trust in externally sourced or synthetic datasets.
The Data Provenance Initiative, launched by MIT researchers in 2023, audited over 1,800 popular AI training datasets for licensing and provenance.
The Coalition for Content Provenance and Authenticity (C2PA), whose founding members include Adobe, Microsoft, and the BBC, defines provenance metadata standards for digital content.
Article 53 of the EU AI Act requires providers of general-purpose AI (GPAI) models to publish a sufficiently detailed summary of the content used for training, which raises expectations for provenance disclosure.
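The kinds of evidence named in the definition (origin, collection method, rights and restrictions, suitability for AI use) could be captured in a simple record. The following is a minimal sketch only: `ProvenanceRecord`, its field names, and `missing_evidence` are hypothetical and do not come from C2PA, the Data Provenance Initiative, or the EU AI Act.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical record of the provenance evidence kept for one dataset."""
    dataset_name: str
    source: Optional[str] = None             # where the data came from
    collection_method: Optional[str] = None  # how it was collected
    license: Optional[str] = None            # rights that apply
    restrictions: List[str] = field(default_factory=list)  # e.g. contractual limits
    ai_use_permitted: Optional[bool] = None  # None means "not yet assessed"


def missing_evidence(record: ProvenanceRecord) -> List[str]:
    """Return the names of evidence fields that are still unknown."""
    required = ["source", "collection_method", "license", "ai_use_permitted"]
    return [name for name in required if getattr(record, name) is None]


# Illustrative values only; the dataset and its details are made up.
record = ProvenanceRecord(
    dataset_name="example-corpus",
    source="vendor delivery",
    license="CC-BY-4.0",
)
print(missing_evidence(record))  # ['collection_method', 'ai_use_permitted']
```

A check like this only flags gaps in the record; deciding whether the recorded rights actually permit AI use remains a legal and governance judgment.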