Glossary term
Model Collapse
Safety and Alignment
Degradation phenomenon in which models trained on AI-generated data progressively lose the diversity and accuracy of the original human data distribution.
Shumailov et al. (2024), in the Nature paper 'AI models collapse when trained on recursively generated data', demonstrated that iteratively fine-tuning models on AI-generated data causes progressive quality degradation; the result is used to motivate data provenance tracking in training pipelines.
Wikimedia researchers found signs of model collapse in language models trained on post-2022 web data: the increasing prevalence of AI-generated text in Common Crawl causes recursive quality degradation.
Stability AI's data curation team applies watermark detection and content-classification filters to remove AI-generated images from Stable Diffusion training data, with the aim of preventing model collapse in subsequent model versions.
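
The recursive degradation described in this entry can be illustrated with a toy simulation. The sketch below is not the experiment from any of the works cited above; it assumes a deliberately simple setting in which each "model" is a one-dimensional Gaussian fitted to a small sample drawn from the previous generation's fit. The function name `simulate_collapse` and its parameters are hypothetical, chosen only for this illustration.

```python
import random
import statistics


def simulate_collapse(generations=500, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation at each generation, with the
    starting distribution N(0, 1) at index 0.
    """
    rng = random.Random(seed)
    # Generation 0: a stand-in for the original human data distribution.
    mu, sigma = 0.0, 1.0
    sigmas = [sigma]
    for _ in range(generations):
        # "Train" only on data generated by the previous model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)
        sigmas.append(sigma)
    return sigmas


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 100, 250, 500):
        print(f"generation {gen:3d}: fitted std = {history[gen]:.3e}")
```

Because each generation sees only a finite sample from its predecessor, estimation error compounds and the fitted standard deviation tends to drift toward zero, which mirrors the loss of diversity in the definition. Mixing a fixed share of samples from the original distribution into every generation is one way to experiment with how retained human data slows this drift.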