Glossary term
Evaluation and Benchmarks
Artificially generated data used for testing or training, in place of or alongside real-world data.
Gretel AI generates synthetic customer-service datasets for training intent-classification models. The synthetic data preserves the statistical properties of the real data while removing PII, which supports GDPR-compliant model training.
Microsoft augmented Phi-2's training set with 'textbook-quality' synthetic text, aiming to improve reasoning without requiring additional human-labelled data.
A healthcare AI company uses Synthea, an open-source synthetic patient generator, to produce patient records for stress-testing a clinical-documentation agent, including rare-disease edge cases that are too scarce in real data.
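The core idea in the examples above, keeping the statistical shape of real data while dropping anything personally identifying, can be illustrated with a toy sketch. This is not the Gretel or Synthea API; the records, templates, and function names (`REAL_RECORDS`, `TEMPLATES`, `fit_intent_distribution`, `generate_synthetic`) are hypothetical and exist only for illustration. The sketch assumes the only statistic worth preserving is the intent-label frequency; production tools model far richer structure.

```python
import random
from collections import Counter

# Hypothetical "real" records. The customer names stand in for PII
# that must never appear in the synthetic output.
REAL_RECORDS = [
    {"customer": "Alice Smith", "intent": "refund", "text": "I want my money back"},
    {"customer": "Bob Jones", "intent": "refund", "text": "Please refund my order"},
    {"customer": "Carla Diaz", "intent": "refund", "text": "Refund this purchase"},
    {"customer": "Dev Patel", "intent": "shipping", "text": "Where is my parcel?"},
    {"customer": "Erin Wong", "intent": "shipping", "text": "My delivery is late"},
    {"customer": "Femi Okoro", "intent": "cancel", "text": "Cancel my subscription"},
]

# Hypothetical message templates per intent (an assumption of this sketch).
TEMPLATES = {
    "refund": ["I would like a refund for order {order_id}",
               "Please refund order {order_id}"],
    "shipping": ["Where is order {order_id}?",
                 "Order {order_id} has not arrived yet"],
    "cancel": ["Please cancel order {order_id}",
               "I need to cancel order {order_id}"],
}


def fit_intent_distribution(records):
    """Estimate how often each intent label occurs in the real data."""
    counts = Counter(r["intent"] for r in records)
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()}


def generate_synthetic(records, n, seed=0):
    """Sample n synthetic records that follow the real intent frequencies.

    No field is copied from the real records: customer identifiers are
    placeholders and message text is built from templates.
    """
    rng = random.Random(seed)
    dist = fit_intent_distribution(records)
    intents = list(dist)
    weights = [dist[i] for i in intents]
    synthetic = []
    for i in range(n):
        intent = rng.choices(intents, weights=weights, k=1)[0]
        template = rng.choice(TEMPLATES[intent])
        synthetic.append({
            "customer": f"customer_{i:04d}",  # placeholder, not a real name
            "intent": intent,
            "text": template.format(order_id=rng.randint(1000, 9999)),
        })
    return synthetic


if __name__ == "__main__":
    for row in generate_synthetic(REAL_RECORDS, 5, seed=42):
        print(row)
```

Because only aggregate label frequencies are read from the real records, no individual's name or message can leak into the output. The trade-off is that any pattern not explicitly modelled, such as wording style or correlations between fields, is lost.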