Glossary term
Foundations
In unsupervised machine learning, a category of algorithms that perform a preliminary similarity analysis on examples. Sketching algorithms use a locality-sensitive hash function to identify points that are likely to be similar, and then group them into buckets.
Sketching decreases the computation required for similarity calculations on large datasets. Instead of calculating similarity for every single pair of examples in the dataset, we calculate similarity only for each pair of points within each bucket.
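As a rough illustration of the bucketing idea, the following Python sketch uses random-hyperplane hashing, one common locality-sensitive hash family for cosine similarity. The definition above does not prescribe a particular hash function, so this choice is an assumption. The function names (make_hyperplanes, lsh_key, bucketize, candidate_pairs), the number of hyperplanes, and the toy points are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals (assumed hash family: sign of dot product)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def bucketize(points, hyperplanes):
    """Group point indices by their hash key."""
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[lsh_key(point, hyperplanes)].append(idx)
    return buckets


def candidate_pairs(buckets):
    """Only pairs that share a bucket are kept for exact similarity checks."""
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


# Toy data (hypothetical): two points in one direction, two in the opposite one.
points = [
    [1.0, 0.9],
    [0.9, 1.0],
    [-1.0, -0.8],
    [-0.9, -1.1],
]
planes = make_hyperplanes(num_planes=4, dim=2)
buckets = bucketize(points, planes)
pairs = candidate_pairs(buckets)
```

With n examples, an exhaustive comparison needs n(n-1)/2 similarity calculations, while this approach computes similarity only for the pairs returned by candidate_pairs. Using more hyperplanes tends to produce smaller buckets and fewer comparisons, at the cost of a higher chance that two similar points land in different buckets.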
Created for this library
A search team uses sketching algorithms to estimate similarity between large document sets without storing them in full.
An ML platform team uses sketching to estimate cardinality of categorical features at scale across many tables.
A research team uses count-min sketching to estimate frequencies of items in a streaming setting before downstream modeling.
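The count-min example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the class name CountMinSketch, the default width and depth, and the use of SHA-256 with a row prefix as the per-row hash are all hypothetical choices.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters; each row uses a different hash of the item."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumed hashing scheme: prefix the item with the row number so that
        # each row behaves like an independent hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        # Increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available estimate.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Toy stream (hypothetical data).
stream = ["a", "b", "a", "c", "a", "b"]
sketch = CountMinSketch()
for item in stream:
    sketch.add(item)
```

Because counters are only ever incremented, estimate never returns less than the true frequency of an item; hash collisions can make it return more. A wider table reduces the chance of collisions, and the memory used stays fixed no matter how long the stream runs.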
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License