Glossary term
Training and Fine-Tuning
Supervised fine-tuning (SFT): further training of a pre-trained model on labelled instruction-response pairs to teach it to follow instructions.
OpenAI used SFT on roughly 13,000 curated instruction-response pairs as the first stage of turning GPT-3 into InstructGPT, followed by reinforcement learning from human feedback (RLHF) - demonstrating that a small, high-quality SFT dataset can substantially improve instruction following.
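Llama 3.1 Instruct is created by post-training the base model, with SFT followed by preference optimization, on instruction data covering coding, reasoning, safety, and multilingual tasks (Meta's model card cites public instruction datasets plus over 25M synthetically generated examples) - the instruct model is what developers typically deploy.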
Medical SFT: a hospital system applies SFT to Llama 3 8B using 50,000 physician-written clinical documentation examples, aiming for a model that generates discharge summaries matching attending-physician quality.
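A minimal sketch of the training objective behind the definition above, in plain Python. It assumes the common setup in which the instruction and response are concatenated into one token sequence and the next-token cross-entropy loss is computed only on response tokens. The helper names (`build_example`, `sft_loss`), the toy token IDs, and the 4-token vocabulary are hypothetical. The `-100` mask value follows the ignore-index convention used by PyTorch and Hugging Face, but any sentinel would work here.

```python
import math

IGNORE_INDEX = -100  # label value meaning "do not score this position"


def build_example(prompt_ids, response_ids):
    """Concatenate prompt and response; only response tokens keep labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def sft_loss(logits, labels):
    """Mean next-token cross-entropy over response positions only.

    logits[t] holds the model's scores for the token at position t + 1,
    so it is compared against labels[t + 1].
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        target = labels[t + 1]
        if target == IGNORE_INDEX:
            continue  # prompt token: contributes nothing to the loss
        row = logits[t]
        peak = max(row)
        # Numerically stable log-sum-exp over the vocabulary.
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0


# Toy usage: 2 prompt tokens, 2 response tokens, vocabulary of 4.
input_ids, labels = build_example([0, 1], [2, 3])
uniform_logits = [[0.0] * 4 for _ in input_ids]  # stand-in for model output
loss = sft_loss(uniform_logits, labels)
```

With uniform logits, the loss equals the log of the vocabulary size, which is the value an untrained model would start near. In a real SFT run the logits come from the pre-trained model, and gradient descent on this loss is what shifts it toward producing the labelled responses.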