Glossary term
Training and Fine-Tuning
A training-time optimization that computes probabilities (using, for example, softmax) for all of an example's positive labels, but for only a random sample of its negative labels. For instance, given an example labeled beagle and dog, candidate sampling computes the predicted probabilities and corresponding loss terms for:
beagle
dog
a random subset of the remaining negative classes (for example, cat, lollipop, fence).
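The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. The function name candidate_sampling_loss, the toy logits (including the extra class car), uniform sampling of negatives, and equal weighting of the two positive labels are all hypothetical choices made for the example.

```python
import math
import random


def candidate_sampling_loss(logits, positives, num_sampled, rng):
    """Cross-entropy over the positives plus a uniform sample of negatives.

    logits: dict mapping class name -> raw score (hypothetical model output).
    positives: list of the example's positive labels, e.g. ["beagle", "dog"].
    num_sampled: number of negative classes to draw (uniformly, without replacement).
    rng: a random.Random instance, so the sample is reproducible.
    """
    negatives = [c for c in logits if c not in positives]
    sampled = rng.sample(negatives, min(num_sampled, len(negatives)))
    candidates = positives + sampled
    # Softmax normalizer over the candidate set only (log-sum-exp for stability).
    m = max(logits[c] for c in candidates)
    log_z = m + math.log(sum(math.exp(logits[c] - m) for c in candidates))
    # Assumed multi-label target: equal weight on each positive label.
    loss = -sum(logits[c] - log_z for c in positives) / len(positives)
    return loss, candidates


# Hypothetical scores for one example labeled beagle and dog.
logits = {"beagle": 2.0, "dog": 1.5, "cat": 0.3,
          "lollipop": -1.0, "fence": -0.5, "car": 0.1}
loss, candidates = candidate_sampling_loss(logits, ["beagle", "dog"], 3, random.Random(0))
# Sampling more negatives than exist scores every class, i.e. the full softmax loss.
full_loss, _ = candidate_sampling_loss(logits, ["beagle", "dog"], 100, random.Random(0))
print(candidates, round(loss, 4), round(full_loss, 4))
```

Because the normalizer sums over fewer classes, the sampled loss is never larger than the full loss in this sketch, so it is a biased estimate. Library implementations such as TensorFlow's tf.nn.sampled_softmax_loss correct for the sampling distribution; this sketch omits that correction.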
The idea is that negative classes can learn from less frequent negative reinforcement as long as positive classes always receive proper positive reinforcement, and this is indeed observed empirically.
Candidate sampling is more computationally efficient than training algorithms that compute predictions for all negative classes, particularly when the number of negative classes is very large.
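To make the efficiency claim concrete, the cost of the output layer can be approximated by counting dot products. The helper logit_flops and the sizes below (1,000,000 classes, 1,000 sampled negatives, 128-dimensional embeddings) are hypothetical. The count assumes that each class score is a single dot product between a query embedding and a class embedding, and it ignores the rest of the network.

```python
def logit_flops(num_scored_classes: int, embedding_dim: int) -> int:
    """Approximate FLOPs to score classes with dot products.

    Assumes each class score is one d-dimensional dot product,
    costing about 2 * d floating-point operations (multiply + add).
    """
    return 2 * embedding_dim * num_scored_classes


# Hypothetical sizes chosen only for illustration.
num_classes = 1_000_000   # e.g. a large item catalog
num_positives = 1
num_sampled = 1_000
dim = 128

full = logit_flops(num_classes, dim)                      # every class scored
sampled = logit_flops(num_positives + num_sampled, dim)   # positives + sampled negatives
print(full, sampled, full / sampled)
```

With these illustrative numbers, the output layer scores 1,001 classes instead of 1,000,000, a reduction of roughly 999x in that layer's forward pass. Only the scored classes receive gradient updates to their embeddings, so the backward pass through that layer shrinks in the same proportion.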
Created for this library
A large-scale recommender team uses candidate sampling during training so the softmax over millions of items is approximated by a manageable subset.
A search team uses candidate sampling to train its retrieval model efficiently rather than computing softmax over the full document index.
An ad-tech team uses candidate sampling to make training tractable when the ad inventory has tens of millions of items.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License