AdaGrad

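A gradient descent variant that divides each parameter's gradient by the square root of that parameter's accumulated squared gradients, effectively giving each parameter its own learning rate. Parameters that have received large or frequent gradients take smaller steps, while rarely updated parameters keep larger ones. For a full explanation, see "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" (Duchi, Hazan, and Singer, 2011).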
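The rescaling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: the function name `adagrad_step`, the defaults `lr=0.1` and `eps=1e-8`, and the toy quadratic objective are all assumptions chosen for the example.

```python
import numpy as np

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update (hypothetical helper for illustration).

    Returns the new parameters and the updated accumulator.
    """
    # Add each parameter's squared gradient to its running sum.
    accum = accum + grads ** 2
    # Divide each gradient by the root of its own accumulated history;
    # eps guards against division by zero.
    params = params - lr * grads / (np.sqrt(accum) + eps)
    return params, accum

# Toy objective (an assumption): f(w) = 0.5 * (100 * w0^2 + w1^2),
# whose gradient is [100 * w0, w1].
scale = np.array([100.0, 1.0])
w = np.array([1.0, 1.0])
accum = np.zeros_like(w)
for _ in range(500):
    w, accum = adagrad_step(w, scale * w, accum)
```

On the first step the two gradients differ by a factor of 100, yet both parameters move by roughly the same amount (about `lr`), because each gradient is divided by its own magnitude. That is the sense in which each parameter gets an independent learning rate.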

Real-world uses

Created for this library

1. A recommendation team trains a sparse matrix factorization model with AdaGrad so rarely seen item features still receive meaningful gradient updates.
2. A natural-language search team uses AdaGrad on a wide bag-of-words model to handle the long tail of vocabulary without manually tuning a learning rate per feature.
3. A click-through prediction team at an ad network selects AdaGrad to give per-parameter learning rates across millions of one-hot encoded features.
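All three scenarios involve sparse features, where some parameters see a gradient far less often than others. The toy sketch below, with made-up gradient values and an assumed base rate of `lr = 0.1`, shows how the accumulated squared gradients leave a rare feature with a larger effective learning rate than a frequent one.

```python
import numpy as np

lr, eps = 0.1, 1e-8
# Index 0: a feature active on every step; index 1: active once per 100 steps.
accum = np.zeros(2)

for step in range(1000):
    # Made-up gradients: 1.0 when the feature is active, 0.0 otherwise.
    rare_grad = 1.0 if step % 100 == 0 else 0.0
    grad = np.array([1.0, rare_grad])
    accum += grad ** 2

# Per-parameter step size that the next nonzero gradient would be scaled by.
effective_lr = lr / (np.sqrt(accum) + eps)
print(effective_lr)
```

After 1000 steps the frequent feature has accumulated 1000 and the rare feature 10, so the rare feature's effective learning rate is about ten times larger (the square root of the 100-fold difference).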
