Glossary term
Foundations
AI-generated data used to augment or replace real training data for model training, fine-tuning, or evaluation.
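"Augmenting" real training data usually means mixing some synthetic examples into a real dataset at a chosen ratio. The sketch below assumes in-memory lists and a hypothetical helper `mix_real_and_synthetic`. It is not any particular vendor's or paper's recipe.

```python
import random

def mix_real_and_synthetic(real, synthetic, synthetic_fraction, seed=0):
    """Hypothetical helper: return a shuffled training set in which about
    `synthetic_fraction` of the examples are synthetic (0.0 = real only)."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    # Solve n_synth / (len(real) + n_synth) = synthetic_fraction for n_synth.
    wanted = round(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    rng = random.Random(seed)
    # Draw without replacement, capped by how many synthetic examples exist.
    chosen = rng.sample(synthetic, min(wanted, len(synthetic)))
    combined = list(real) + chosen
    rng.shuffle(combined)
    return combined

# Toy data: each example is tagged with its source for easy inspection.
real = [("real", i) for i in range(300)]
synthetic = [("synthetic", i) for i in range(1000)]
train = mix_real_and_synthetic(real, synthetic, synthetic_fraction=0.25)
```

With 300 real examples and a 25% target, the helper adds 100 synthetic ones. Replacing real data outright corresponds to training on the synthetic set alone, which this helper deliberately does not cover.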
Microsoft's Phi-1 through Phi-3 models are trained largely on "textbook quality" data: heavily filtered web text plus LLM-generated synthetic textbooks and exercises (Phi-1's synthetic textbooks were generated with GPT-3.5) - Phi-3-mini, with only 3.8B parameters, reports benchmark results comparable to GPT-3.5.
Scale AI's Supervised Fine-Tuning Data product supplies synthetic instruction-following examples for enterprise fine-tuning - for instance, a healthcare company might order 50,000 synthetic clinical Q&A pairs to fine-tune a medical LLM.
Gretel AI generates synthetic tabular data that preserves the statistical distributions of sensitive financial data - fintech startups, for example, can use it to train fraud-detection models when PCI-DSS restrictions prevent sharing real transaction data.