Glossary term
Safety and Alignment
Attack on AI systems by injecting malicious examples into training data to cause targeted misclassification or backdoor behaviour.
Carlini et al. (2021) demonstrated data poisoning of ImageNet: injecting 100 images into the training set caused a specific target image to be consistently misclassified by the trained model.
Schuster et al. demonstrated supply-chain data poisoning against neural code-completion models of the kind that power tools such as GitHub Copilot: injecting vulnerable code patterns into public repositories used as training data can cause the model to suggest insecure code in targeted contexts.
The NIST AI RMF includes data poisoning in its threat taxonomy. Financial institutions conduct training-data provenance audits to verify that third-party training datasets have not been adversarially modified.
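One simple building block of a provenance audit can be sketched as a checksum comparison against a trusted manifest. This is a minimal illustration only, not a description of any institution's actual process: the function names `sha256_of` and `audit_dataset` and the manifest layout (relative path mapped to SHA-256 digest) are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_dataset(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `root` with a trusted manifest.

    `manifest` is a hypothetical mapping of relative POSIX path -> SHA-256
    digest, assumed to have been recorded when the dataset was vetted.
    """
    # Hash every file currently present in the dataset directory.
    actual = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {
        # Present in both, but the content no longer matches the manifest.
        "modified": sorted(
            k for k in manifest if k in actual and actual[k] != manifest[k]
        ),
        # Listed in the manifest but absent from disk.
        "missing": sorted(k for k in manifest if k not in actual),
        # On disk but never vetted: possible injected examples.
        "unexpected": sorted(k for k in actual if k not in manifest),
    }
```

A check like this only detects changes made after the manifest was recorded. It cannot reveal examples that were already poisoned when the dataset was first collected, so it complements rather than replaces content-level review.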