Glossary term
Multimodal AI
Reducing a matrix (or matrices) created by an earlier convolutional layer to a smaller matrix. Pooling usually involves taking either the maximum or average value across the pooled area. For example, suppose we have a 3x3 matrix.
A pooling operation, just like a convolutional operation, divides that matrix into slices and then slides the pooling operation across the matrix by strides. For example, suppose the pooling operation divides the convolutional matrix into 2x2 slices with a 1x1 stride. On a 3x3 matrix, four pooling operations take place, one per 2x2 slice. Imagine that each pooling operation picks the maximum value of the four in that slice.
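The 2x2, stride-1 max pooling described above can be sketched in plain Python. The matrix values and the `max_pool_2d` helper are illustrative assumptions chosen for this sketch, not part of any particular library.

```python
def max_pool_2d(matrix, size=2, stride=1):
    """Max-pool a 2D list with a square window (hypothetical helper)."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    # Slide the window's top-left corner across the matrix by `stride`.
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the size x size slice that starts at (r, c).
            window = [matrix[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Illustrative 3x3 input (values assumed for this sketch).
feature_map = [
    [5, 3, 1],
    [8, 2, 5],
    [9, 4, 3],
]

pooled = max_pool_2d(feature_map, size=2, stride=1)
print(pooled)  # [[8, 5], [9, 5]]
```

The four 2x2 slices each contribute one maximum, so the 3x3 input shrinks to a 2x2 output. Replacing `max(window)` with `sum(window) / len(window)` would give average pooling instead.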
Pooling helps enforce translational invariance in the input matrix.
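One way to see this effect is to shift a single activation by one position and compare the pooled outputs. The sketch below assumes non-overlapping 2x2 max pooling (stride 2) and made-up input values; the invariance it shows is only local, holding while the shift stays inside one pooling window.

```python
def max_pool_2d(matrix, size=2, stride=2):
    """Max-pool a 2D list with a square window (hypothetical helper)."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


# A single strong activation in the top-left corner (illustrative values).
original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# The same activation shifted one step down and one step right.
shifted = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_original = max_pool_2d(original)
pooled_shifted = max_pool_2d(shifted)
print(pooled_original)  # [[9, 0], [0, 0]]
print(pooled_shifted)   # [[9, 0], [0, 0]]
```

Both inputs pool to the same output because the activation stays inside the same 2x2 window. A larger shift that crosses a window boundary would change the pooled result.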
Pooling for vision applications is known more formally as spatial pooling. Time-series applications usually refer to pooling as temporal pooling. Less formally, pooling is often called subsampling or downsampling.
Created for this library
A computer vision team uses max pooling in its CNN to downsample spatial dimensions while keeping the dominant activations.
An NLP team uses mean pooling on token embeddings to produce a single vector per document for downstream classification.
A medical imaging team uses global average pooling at the end of its CNN to produce a fixed-size representation for the classifier head.
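The mean pooling and global average pooling mentioned in these examples can be sketched without any framework. The helper names, the two-dimensional embeddings, and the feature-map values below are assumptions made for illustration; real models would typically use a library's built-in pooling layers.

```python
def mean_pool_tokens(token_embeddings):
    """Average token vectors element-wise into a single vector."""
    num_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [
        sum(token[d] for token in token_embeddings) / num_tokens
        for d in range(dim)
    ]


def global_average_pool(feature_maps):
    """Reduce each channel's 2D feature map to its mean (one value per channel)."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Three tokens with 2-dimensional embeddings (illustrative values).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
doc_vector = mean_pool_tokens(tokens)
print(doc_vector)  # [3.0, 4.0]

# Two channels, each a 2x2 feature map (illustrative values).
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
channel_vector = global_average_pool(maps)
print(channel_vector)  # [2.5, 2.0]
```

In both cases the output length depends only on the embedding size or channel count, not on the number of tokens or the spatial size, which is why these poolings yield a fixed-size representation for a classifier.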
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License