Glossary term
Foundations
A gradient descent algorithm in which the batch size is one. In other words, stochastic gradient descent (SGD) trains on a single example chosen uniformly at random from a training set.
See Linear regression: Hyperparameters in Machine Learning Crash Course for more information.
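The definition above can be illustrated with a minimal sketch, assuming a one-feature linear model and squared loss. The function name sgd_linear_regression, the toy data, and the hyperparameter values are hypothetical choices made for illustration; they are not part of the glossary definition.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.01, steps=5000, seed=0):
    """Fit y ~ w * x + b with a batch size of one (illustrative sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Batch size one: pick a single example uniformly at random.
        i = rng.randrange(len(xs))
        error = (w * xs[i] + b) - ys[i]
        # Gradient of the squared loss error**2 for this one example.
        w -= learning_rate * 2.0 * error * xs[i]
        b -= learning_rate * 2.0 * error
    return w, b


# Toy, noise-free data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(w, b)
```

Each update here uses the gradient from exactly one example, so individual steps are noisy estimates of the full-batch gradient. On this noise-free toy data the estimates should still settle close to w = 2 and b = 1.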
Created for this library
An ML team uses stochastic gradient descent with momentum as the default optimizer for production training pipelines.
A research team uses SGD with a cosine learning rate schedule as a baseline optimizer for ResNet-style training.
An ML platform team uses SGD variants tuned per model family so engineers can focus on data and features.
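The momentum and cosine learning rate schedule mentioned in these examples can be sketched as follows. This is a rough illustration under stated assumptions, not any team's actual pipeline: the helper names cosine_lr and momentum_step are hypothetical, the momentum update follows one common convention (velocity accumulates raw gradients), and the gradient comes from a toy function f(w) = (w - 3)^2 standing in for a single-example loss gradient.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate: base_lr at step 0, min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def momentum_step(param, velocity, grad, lr, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter."""
    # The velocity is a decaying sum of past gradients.
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
total_steps = 200
for step in range(total_steps):
    grad = 2.0 * (w - 3.0)
    lr = cosine_lr(step, total_steps, base_lr=0.05)
    w, v = momentum_step(w, v, grad, lr)
print(w)
```

In a real training loop the gradient would come from a randomly sampled example or mini-batch, and values such as base_lr and momentum would be tuned per model family.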
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License