Glossary term: optimizer
Training and Fine-Tuning
A specific implementation of the gradient descent algorithm. Popular optimizers include:
AdaGrad, which stands for ADAptive GRADient descent.
Adam, which stands for ADAptive Moment estimation and combines momentum with per-parameter adaptive learning rates.
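To make "a specific implementation of gradient descent" concrete, here is a minimal sketch of three update rules applied to the same toy problem. The loss f(w) = (w - 3)^2, the function names, the step counts, and the hyperparameter values are all illustrative assumptions, not taken from the glossary or from any particular library.

```python
import math

def grad(w):
    # Gradient of the assumed toy loss f(w) = (w - 3)^2, minimized at w = 3.
    return 2.0 * (w - 3.0)

def sgd(w, steps=500, lr=0.1):
    # Plain gradient descent: one fixed learning rate for every step.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adagrad(w, steps=500, lr=1.0, eps=1e-8):
    # AdaGrad: divide each step by the root of the accumulated squared gradients.
    g2_sum = 0.0
    for _ in range(steps):
        g = grad(w)
        g2_sum += g * g
        w -= lr * g / (math.sqrt(g2_sum) + eps)
    return w

def adam(w, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: running averages of the gradient (m) and squared gradient (v),
    # each corrected for its bias toward zero in early steps.
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, optimizer in [("sgd", sgd), ("adagrad", adagrad), ("adam", adam)]:
    print(name, optimizer(0.0))
```

All three start from w = 0 and should end close to 3. They differ only in how each step is scaled, which is the sense in which each optimizer is a different implementation of the same underlying idea.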
Created for this library
An ML team uses Adam as the default optimizer for production training pipelines because it tends to train reasonably well across a wide range of learning rates and other hyperparameter settings.
A research team experiments with Adafactor as the optimizer for large language model training to save optimizer-state memory.
An ML platform team standardizes optimizer choice per model family so engineers can focus on data and features.
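The optimizer-state memory saving mentioned in the Adafactor example can be estimated with a rough count. The sketch below assumes a single weight matrix, assumes Adam keeps two running averages per parameter, and assumes a factored second moment keeps one statistic per row and one per column. The matrix shape and helper names are hypothetical, and real implementations may keep additional state.

```python
def adam_state_floats(rows, cols):
    # Two running averages (first and second moment) for every matrix entry.
    return 2 * rows * cols

def factored_second_moment_floats(rows, cols):
    # One statistic per row plus one per column, instead of one per entry.
    return rows + cols

rows, cols = 4096, 4096  # hypothetical weight-matrix shape
adam_floats = adam_state_floats(rows, cols)
factored_floats = factored_second_moment_floats(rows, cols)

print("Adam state values:    ", adam_floats)
print("Factored state values:", factored_floats)
print("Ratio:                ", adam_floats / factored_floats)
```

The count covers only the optimizer state, not the weights or gradients themselves, so the saving in total training memory is smaller than this ratio suggests.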
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License