Distillation

The process of compressing a larger model (known as the teacher) into a smaller model (known as the student) that emulates the teacher's predictions as faithfully as possible. Distillation is useful because the smaller model has two key benefits over the teacher:

Faster inference time

Reduced memory and energy usage

However, the student's predictions are typically not as good as the teacher's predictions.

Distillation trains the student model to minimize a loss function based on the difference between the student's predictions and the teacher's predictions.
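As a rough illustration, here is a minimal sketch of one common way to define such a loss: the temperature-scaled "soft target" formulation, which measures the KL divergence between the teacher's and student's softened output distributions and optionally blends in ordinary cross-entropy against the true label. This glossary entry does not prescribe a specific loss, so treat this as an assumption. The helper names (`softmax`, `kl_divergence`, `distillation_loss`) and the `temperature` and `alpha` parameters are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, hard_label=None,
                      temperature=2.0, alpha=0.5):
    """Hypothetical per-example distillation loss (a sketch, not a reference implementation).

    Soft term: KL(teacher || student) on temperature-softened outputs,
    scaled by temperature**2 so its magnitude stays comparable across temperatures.
    Hard term (optional): cross-entropy of the student against the true label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    if hard_label is None:
        return soft
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1 - alpha) * hard

# Usage: a student whose outputs resemble the teacher's incurs a smaller loss.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(close_student, teacher) < distillation_loss(far_student, teacher))  # True
```

Training would then adjust the student's parameters (for example, by gradient descent) to drive this loss down across the training set; the teacher's parameters stay fixed throughout.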

Compare and contrast distillation with the following terms:

fine-tuning

prompt-based learning

See LLMs: Fine-tuning, distillation, and prompt engineering in Machine Learning Crash Course for more information.

Real-world uses

Created for this library

1. An LLM team distills a large teacher model into a smaller student model so the production system can hit a strict latency budget.
2. A vision team distills a heavy detector into a smaller student model for on-device deployment without losing too much accuracy.
3. An NLP team distills a teacher LLM into a 7-billion-parameter student model that is cheap enough to serve at the company's request volume.
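To put the memory benefit in rough numbers, a model's weight memory is approximately its parameter count multiplied by the bytes stored per parameter. The sketch below applies that arithmetic to a 7-billion-parameter student like the one in the third example. The helper `weight_memory_gb` and the dtype table are hypothetical; the estimate counts weights only and ignores activations, caches, and runtime overhead, so actual serving memory is higher.

```python
# Bytes needed to store one parameter in a few common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype="bfloat16"):
    """Hypothetical helper: memory for model weights only, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"7B params in {dtype}: {weight_memory_gb(7e9, dtype):.0f} GB")
# 7B params in float32: 28 GB
# 7B params in bfloat16: 14 GB
# 7B params in int8: 7 GB
```

Because weight memory scales linearly with parameter count, a student with a fraction of the teacher's parameters needs roughly that same fraction of the teacher's weight memory at the same precision.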
