Vanishing Gradient Problem

The tendency for the gradients of early hidden layers of some deep neural networks to become surprisingly flat (low). Increasingly lower gradients result in increasingly smaller changes to the weights on nodes in a deep neural network, leading to little or no learning. Models suffering from the vanishing gradient problem become difficult or impossible to train. Long Short-Term Memory cells address this issue.

Compare to exploding gradient problem.

Real-world uses

Created for this library

1.
A research team uses residual connections in deep networks to mitigate the vanishing gradient problem.
2.
An NLP team uses LSTMs rather than vanilla RNNs to mitigate the vanishing gradient problem on long sequences.
3.
An ML platform team enables careful initialization and normalization in deep models to avoid the vanishing gradient problem in training.

Back to glossary

Vanishing Gradient Problem

Real-world uses

Related terms

Loading…

Vanishing Gradient Problem

Real-world uses

Related terms