Glossary term
Agentic Systems
The process of compressing one model (known as the teacher) into a smaller model (known as the student) that emulates the teacher's predictions as faithfully as possible. Distillation is useful because the student has two key benefits over the teacher:
Faster inference time
Reduced memory and energy usage
However, the student's predictions are typically not as good as the teacher's predictions.
Distillation trains the student model to minimize a loss function based on the difference between the predictions of the student and teacher models.
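To make the loss concrete, here is a minimal sketch of one possible distillation loss. The definition above does not name a specific loss function, so the choices here are assumptions: the sketch uses the KL divergence between temperature-softened teacher and student distributions, which is one common option. The function names `softmax` and `distillation_loss` and the `temperature` parameter are illustrative, not taken from any particular library.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Assumption: KL divergence is only one possible measure of the
    difference between teacher and student predictions.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(p_teacher, p_student)
        if p > 0
    )


teacher = [1.0, 2.0, 3.0]
close_student = [1.0, 2.0, 2.5]  # roughly agrees with the teacher
far_student = [3.0, 2.0, 1.0]    # ranks the classes in reverse

# The student that tracks the teacher more closely gets the lower loss.
print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

In a real training loop this quantity would be computed per example and minimized with respect to the student's parameters, often combined with an ordinary loss on ground-truth labels; that combination is also a design choice rather than part of the definition.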
Compare and contrast distillation with the following terms:
See LLMs: Fine-tuning, distillation, and prompt engineering in Machine Learning Crash Course for more information.
Created for this library
An LLM team distills a large teacher model into a smaller student model so the production system can hit a strict latency budget.
A vision team distills a heavy detector into a smaller student model for on-device deployment without losing too much accuracy.
An NLP team distills a teacher LLM into a 7-billion-parameter student model that is cheap enough to serve at the company's request volume.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License