Glossary term
Glossary term
Training and Fine-Tuning
A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate. For a full explanation, see Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.
Created for this library
A recommendation team trains a sparse matrix factorization model with AdaGrad so rarely seen item features still receive meaningful gradient updates.
A natural-language search team uses AdaGrad on a wide bag-of-words model to handle the long tail of vocabulary without manually tuning a learning rate per feature.
A click-through prediction team at an ad network selects AdaGrad to give per-parameter learning rates across millions of one-hot encoded features.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License