Glossary term
Foundations
An n-gram that may omit (or "skip") words from the original context, meaning the n words might not have been adjacent in the original text. More precisely, a "k-skip-n-gram" is an n-gram for which up to k words may have been skipped.
For example, "the quick brown fox" has the following possible 2-grams:
"the quick"
"quick brown"
"brown fox"
A "1-skip-2-gram" is a pair of words with at most one word between them. Therefore, "the quick brown fox" has the following 1-skip-2-grams:
"the brown"
"quick fox"
In addition, all the 2-grams are also 1-skip-2-grams, since "at most one word skipped" includes skipping no words at all.
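The enumeration above can be reproduced with a short Python sketch. The function name `skip_grams` and its parameters are illustrative, not part of any library, and the sketch assumes that k counts the total number of words skipped across the whole n-gram (for 2-grams this is simply the number of words between the pair).

```python
from itertools import combinations


def skip_grams(tokens, n=2, k=1):
    """Return every k-skip-n-gram of `tokens` as a tuple of words.

    Assumption: k is the total number of skipped words allowed across
    the whole n-gram, so an n-gram may span at most n + k tokens.
    """
    results = []
    for start in range(len(tokens)):
        # Positions available after `start` while staying within n + k tokens.
        window_end = min(start + n + k, len(tokens))
        # Pick the remaining n - 1 word positions, preserving word order.
        for rest in combinations(range(start + 1, window_end), n - 1):
            results.append(tuple(tokens[i] for i in (start, *rest)))
    return results


tokens = "the quick brown fox".split()
bigrams = skip_grams(tokens, n=2, k=0)        # plain 2-grams
one_skip_bigrams = skip_grams(tokens, n=2, k=1)
```

With these inputs, `bigrams` holds the three adjacent pairs, and `one_skip_bigrams` holds those three plus ("the", "brown") and ("quick", "fox").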
Skip-grams are useful for understanding more of a word's surrounding context. In the example, "fox" was directly associated with "quick" in the set of 1-skip-2-grams, but not in the set of 2-grams.
Skip-grams help train word embedding models.
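As a rough illustration of how such pairs can feed an embedding model, the sketch below builds (center word, context word) pairs within a fixed window, in the spirit of the skip-gram training setup used by word2vec-style models. The function name `training_pairs` and the `window` parameter are hypothetical, and a real training pipeline would add steps not shown here, such as vocabulary lookup and negative sampling.

```python
def training_pairs(tokens, window=2):
    """Pair each word with every other word at most `window` positions away.

    Illustrative only: pairs are emitted in both directions, which is an
    assumption about the training setup, not part of the definition above.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs_w1 = training_pairs("the quick brown fox".split(), window=1)
pairs_w2 = training_pairs("the quick brown fox".split(), window=2)
```

With `window=1` only adjacent words are paired. With `window=2`, "fox" and "quick" are also paired, matching the 1-skip-2-gram example.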
Created for this library
An NLP team uses skip-gram word embeddings as input features for a downstream classifier on customer feedback.
A search-quality team uses skip-gram embeddings as one signal for query understanding alongside more recent encoder models.
A research team uses skip-gram embeddings as a baseline for word-level semantic similarity tasks.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License