Glossary term
Glossary term
Foundations
A floating-point number that tells the gradient descent algorithm how strongly to adjust weights and biases on each iteration. For example, a learning rate of 0.3 would adjust weights and biases three times more powerfully than a learning rate of 0.1.
Learning rate is a key hyperparameter. If you set the learning rate too low, training will take too long. If you set the learning rate too high, gradient descent often has trouble reaching convergence.
Click the icon for a more mathematical explanation.
See Linear regression: Hyperparameters in Machine Learning Crash Course for more information.
Created for this library
An ML team tunes learning rate first when bringing up training because it has the largest impact on convergence.
A research team uses a cyclical learning rate schedule during pretraining to balance fast convergence and final accuracy.
An ML platform team standardizes learning-rate schedules per model family so engineers can focus on data and features.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License