Glossary term
Glossary term
Foundations
Very overloaded term whose meaning varies across different mathematical and scientific fields. Within machine learning, a vector has two properties:
Data type: Vectors in machine learning usually hold floating-point numbers.
Number of elements: This is the vector's length or its dimension.
For example, consider a feature vector that holds eight floating-point numbers. This feature vector has a length or dimension of eight. Note that machine learning vectors often have a huge number of dimensions.
You can represent many different kinds of information as a vector. For example:
Any position on the surface of Earth can be represented as a 2-dimensional vector, where one dimension is the latitude and the other is the longitude.
The current prices of each of 500 stocks can be represented as a 500-dimensional vector.
A probability distribution over a finite number of classes can be represented as a vector. For example, a multiclass classification system that predicts one of three output colors (red, green, or yellow) could output the vector (0.3, 0.2, 0.5) to mean P[red]=0.3, P[green]=0.2, P[yellow]=0.5.
Vectors can be concatenated; therefore, a variety of different media can be represented as a single vector. Some models operate directly on the concatenation of many one-hot encodings.
Specialized processors such as TPUs are optimized to perform mathematical operations on vectors.
For example, consider a feature vector that holds eight floating-point numbers. This feature vector has a length or dimension of eight. Note that machine learning vectors often have a huge number of dimensions.
You can represent many different kinds of information as a vector. For example:
Any position on the surface of Earth can be represented as a 2-dimensional vector, where one dimension is the latitude and the other is the longitude.
Created for this library
A search team stores a vector per document in its vector index so semantic retrieval can find relevant passages by similarity.
A retail recommender team stores a vector per product so the candidate generator can retrieve similar items efficiently.
A help-desk team stores a vector per past ticket so new tickets can be matched to similar prior issues for resolution suggestions.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License