Training Data Poisoning

Manipulation of training or fine-tuning data to influence model behavior, degrade performance, create backdoors, or introduce harmful outputs. Controls include source validation, dataset change control, anomaly detection, separation of duties, provenance tracking, and review of contributed or scraped data.
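Source validation, provenance tracking, and dataset change control all rely on one check: the bytes you train on should be the bytes someone reviewed. The sketch below is a minimal version of that check. It assumes a hypothetical manifest that pairs each item's URL with a SHA-256 digest recorded at curation time. At training time, any item that is missing or whose content no longer matches its digest (for example, because the hosting domain changed hands) is rejected instead of being ingested silently. `ManifestEntry`, `build_manifest`, and `verify_download` are illustrative names, not any real dataset's tooling.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """One dataset item as recorded at curation time (hypothetical schema)."""
    url: str
    sha256: str  # hex digest of the exact bytes the curator reviewed


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(items: dict[str, bytes]) -> list[ManifestEntry]:
    """Record a digest for every item at the moment it was reviewed."""
    return [ManifestEntry(url, sha256_hex(body)) for url, body in items.items()]


def verify_download(
    manifest: list[ManifestEntry], fetched: dict[str, bytes]
) -> tuple[list[str], list[str]]:
    """Split freshly fetched items into accepted and rejected URL lists.

    An item is rejected if it could not be fetched or if its bytes no longer
    hash to the digest recorded at curation time.
    """
    accepted: list[str] = []
    rejected: list[str] = []
    for entry in manifest:
        body = fetched.get(entry.url)
        if body is not None and sha256_hex(body) == entry.sha256:
            accepted.append(entry.url)
        else:
            rejected.append(entry.url)
    return accepted, rejected


if __name__ == "__main__":
    # Content as the curator saw it.
    curated = {
        "https://example.com/a.jpg": b"cat photo",
        "https://example.org/b.jpg": b"dog photo",
    }
    manifest = build_manifest(curated)

    # Later download: the second URL now serves different bytes.
    later = {
        "https://example.com/a.jpg": b"cat photo",
        "https://example.org/b.jpg": b"attacker payload",
    }
    ok, bad = verify_download(manifest, later)
    print(ok, bad)  # ['https://example.com/a.jpg'] ['https://example.org/b.jpg']
```

This check only catches content that changes after review. It does nothing against poison that was already present when the digest was recorded, which is why the definition pairs it with anomaly detection and human review. It can also reject items whose content changed for legitimate reasons, such as a re-encoded image.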

Examples

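1. Carlini et al. (IEEE S&P 2024) demonstrated that adversaries can poison web-scale datasets like LAION and Common Crawl at low cost. For example, buying expired domains that LAION-400M still linked to would have let an attacker control 0.01% of the dataset for about $60.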
2. Within 24 hours of launch, Twitter users manipulated Microsoft's Tay chatbot (2016) into producing racist outputs. This was an early demonstration of poisoning a system that learns online from user input.
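3. The University of Chicago's Glaze and Nightshade tools let artists alter their images before posting them. Glaze masks an artist's style against mimicry, and Nightshade poisons the training data of image models that scrape the images without permission.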

Related terms
