Glossary term
Pre-training
Training and Fine-Tuning
Initial large-scale training of a model on broad data (typically next-token prediction or masked-language modelling) before any task-specific fine-tuning.
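A minimal sketch of the next-token prediction objective named above, assuming a toy four-token vocabulary and a placeholder `uniform_model` standing in for a real network. All function names here are illustrative and not taken from any library.

```python
import math

VOCAB_SIZE = 4  # hypothetical toy vocabulary size


def next_token_pairs(token_ids):
    """Split one sequence into (context, target) training pairs."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def next_token_loss(model, token_ids):
    """Average next-token cross-entropy over one sequence."""
    pairs = next_token_pairs(token_ids)
    return sum(cross_entropy(model(ctx), tgt) for ctx, tgt in pairs) / len(pairs)


def uniform_model(context):
    """Placeholder model (assumption): equal logits for every token."""
    return [0.0] * VOCAB_SIZE


loss = next_token_loss(uniform_model, [0, 2, 1, 3])
```

With equal logits the loss equals log(4), about 1.386. Pre-training lowers this average by adjusting a real model's parameters over very large amounts of text.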
The initial training of a model on a large dataset. Some pre-trained models are clumsy giants and must typically be refined through additional training. For example, ML experts might pre-train a large language model on a vast text dataset, such as all the English pages in Wikipedia. Following pre-training, the resulting model might be further refined through techniques such as fine-tuning.
GPT-4's pre-training, reportedly on about 13 trillion tokens from web crawls, books, and code, is estimated to have cost on the order of $100M in compute. It established the broad world knowledge that downstream fine-tuning and prompting build on.
Meta pre-trained Llama 3.1 405B on 15 trillion tokens of curated multilingual text and code. The pre-training run used 16,384 H100 GPUs for 77 days and produced the publicly released open-weight model.
Med-PaLM 2 (Google) is pre-trained on a broad corpus and then specialised through fine-tuning on medical text. Pre-training provides general language understanding, while the subsequent steps add clinical knowledge.
Created for this library
An LLM team pre-trains a foundation model on a broad web corpus before fine-tuning it for specific use cases.
A medical NLP team pre-trains a transformer on a clinical notes corpus before fine-tuning task-specific heads for billing codes.
A research lab pre-trains a base model on a curated multilingual corpus before downstream alignment training.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License