Stochastic Gradient Descent (SGD)

A gradient descent algorithm in which the batch size is one. In other words, SGD trains on a single example chosen uniformly at random from a training set.
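
As a minimal sketch of the definition above, the following Python fits a one-variable linear model by updating on a single example drawn uniformly at random at each step. The function name `sgd_linear_regression`, the toy data, the learning rate, and the step count are illustrative assumptions, not part of this glossary entry.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.02, steps=5000, seed=0):
    """Fit y ~ w*x + b with batch-size-one SGD on squared error (hypothetical helper)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Batch size of one: pick a single training example uniformly at random.
        i = rng.randrange(len(xs))
        err = (w * xs[i] + b) - ys[i]
        # Gradient of 0.5 * err**2 with respect to w and b, from this one example.
        w -= lr * err * xs[i]
        b -= lr * err
    return w, b

# Toy, noise-free data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = sgd_linear_regression(xs, ys)
```

Because each update sees only one example, individual steps are noisy; on this noise-free toy data the estimates should still settle close to the slope and intercept used to generate it.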

See Linear regression: Hyperparameters in Machine Learning Crash Course for more information.

Real-world uses

Created for this library

1. An ML team uses stochastic gradient descent with momentum as the default optimizer for production training pipelines.
2. A research team uses SGD with a cosine learning rate schedule as a baseline optimizer for ResNet-style training.
3. An ML platform team uses SGD variants tuned per model family so engineers can focus on data and features.
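
The momentum and cosine-schedule variants mentioned in these uses could be sketched as follows. This is one common formulation, not necessarily what any particular team or framework uses; the names `cosine_lr` and `momentum_step`, the default momentum of 0.9, and the scalar-parameter setup are assumptions made for illustration.

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine decay from base_lr at step 0 down to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def momentum_step(param, grad, velocity, lr, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter.

    The velocity accumulates a decaying sum of past gradients,
    and the parameter moves against that accumulated direction.
    """
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Usage: two updates with a constant gradient of 2.0 and a fixed lr of 0.1.
p, v = momentum_step(1.0, 2.0, 0.0, lr=0.1)
p2, v2 = momentum_step(p, 2.0, v, lr=0.1)
```

In a training loop, the learning rate passed to `momentum_step` would typically be `cosine_lr(step, total_steps, base_lr)` evaluated at the current step.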
