Sparse Representation

Storing only the position(s) of nonzero elements in a sparse feature.

For example, suppose a categorical feature named species identifies the 36 tree species in a particular forest. Further assume that each example identifies only a single species.

You could use a one-hot vector to represent the tree species in each example. A one-hot vector would contain a single 1 (to represent the particular tree species in that example) and 35 zeros (to represent the 35 tree species not in that example). So, the one-hot representation of maple might look something like the following:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Alternatively, a sparse representation would simply identify the position of the particular species. If maple is at position 24, then the sparse representation of maple would be:

24

Notice that the sparse representation (a single integer) is much more compact than the one-hot representation (36 values).
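A minimal Python sketch of the maple example, assuming a 36-entry species vocabulary whose positions are counted from 0. The helpers `to_one_hot` and `to_sparse` are hypothetical names chosen for illustration, not part of any library. The sparse form is returned as a list because, in general, a feature can have more than one nonzero position.

```python
NUM_SPECIES = 36   # size of the hypothetical species vocabulary
MAPLE_INDEX = 24   # position of "maple", as in the example


def to_one_hot(index: int, depth: int) -> list[int]:
    """Dense one-hot vector: a 1 at `index`, 0 everywhere else."""
    if not 0 <= index < depth:
        raise ValueError(f"index {index} out of range for depth {depth}")
    vector = [0] * depth
    vector[index] = 1
    return vector


def to_sparse(one_hot: list[int]) -> list[int]:
    """Sparse representation: only the positions of the nonzero elements."""
    return [i for i, value in enumerate(one_hot) if value != 0]


maple_one_hot = to_one_hot(MAPLE_INDEX, NUM_SPECIES)
maple_sparse = to_sparse(maple_one_hot)

print(len(maple_one_hot))  # 36 stored values
print(maple_sparse)        # [24]
```

The one-hot vector always stores one value per species, while the sparse list grows only with the number of nonzero positions.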

Note: You shouldn't pass a sparse representation as a direct feature input to a model. Instead, you should convert the sparse representation into a one-hot representation before training on it.
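The conversion the note describes can be sketched in Python as follows, assuming each example stores a single species position (as in the maple example). `sparse_to_one_hot_batch` is a hypothetical helper, not a library function; it expands a batch of positions into one dense one-hot row per example, which is the form a model would actually train on.

```python
NUM_SPECIES = 36  # hypothetical vocabulary size, as in the maple example


def sparse_to_one_hot_batch(batch: list[int], depth: int) -> list[list[int]]:
    """Expand one sparse position per example into a one-hot row per example.

    The returned rows are the dense inputs a model would be trained on.
    """
    rows = []
    for index in batch:
        if not 0 <= index < depth:
            raise ValueError(f"index {index} out of range for depth {depth}")
        row = [0] * depth
        row[index] = 1
        rows.append(row)
    return rows


# Three examples, each identifying a single species by its position.
species_batch = [24, 3, 24]
model_inputs = sparse_to_one_hot_batch(species_batch, NUM_SPECIES)

print(len(model_inputs), len(model_inputs[0]))  # 3 36
```

In TensorFlow, `tf.one_hot(indices, depth)` performs the same expansion on a tensor of indices.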

See Working with categorical data in Machine Learning Crash Course for more information.

Real-world uses

Created for this library

1. An ML team uses sparse representations such as TF-IDF vectors as a baseline before evaluating dense embedding alternatives.
2. A search team combines sparse, BM25-based retrieval with dense embeddings in a hybrid retrieval system.
3. A research team uses sparse representations to keep memory bounded when working with very large vocabularies.
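In text applications like these, a sparse vector usually stores each nonzero position together with its weight, because the nonzero values are no longer all 1. The following is a minimal Python sketch, assuming a toy corpus and one simple TF-IDF variant (raw term count multiplied by log(N / df)); the corpus and the helpers `tfidf_sparse` and `sparse_dot` are hypothetical and for illustration only. Each document is stored as a `{position: weight}` dictionary, so memory grows with the number of distinct terms in a document rather than with the vocabulary size.

```python
import math
from collections import Counter

# Toy corpus; a real vocabulary could contain millions of terms.
corpus = [
    "maple oak maple",
    "oak pine",
    "pine birch birch",
]

# Assign each distinct term a position in the vocabulary.
vocabulary: dict[str, int] = {}
for doc in corpus:
    for term in doc.split():
        vocabulary.setdefault(term, len(vocabulary))

# Document frequency: how many documents contain each term.
doc_freq = Counter(term for doc in corpus for term in set(doc.split()))
num_docs = len(corpus)


def tfidf_sparse(text: str) -> dict[int, float]:
    """Return {position: weight} for nonzero entries only.

    Uses one simple TF-IDF variant: raw count * log(N / df).
    Terms that are not in the vocabulary are skipped.
    """
    counts = Counter(t for t in text.split() if t in vocabulary)
    weights: dict[int, float] = {}
    for term, count in counts.items():
        weight = count * math.log(num_docs / doc_freq[term])
        if weight != 0.0:  # a term found in every document gets weight 0
            weights[vocabulary[term]] = weight
    return weights


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product that only visits positions stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(i, 0.0) for i, w in a.items())


doc_vectors = [tfidf_sparse(doc) for doc in corpus]
query = tfidf_sparse("maple")
scores = [sparse_dot(query, d) for d in doc_vectors]
print(scores.index(max(scores)))  # 0: only the first document mentions maple
```

BM25 uses a different weighting formula, but its score is likewise a sum over terms shared by the query and the document, so it can be computed over the same kind of sparse structure. In practice, scikit-learn's `TfidfVectorizer` returns its output as a SciPy sparse matrix.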
