Glossary term
Multimodal AI
Size invariance: in an image classification problem, an algorithm's ability to classify images correctly even when the size of the image changes. For example, the algorithm can still identify a cat whether the cat occupies 2M pixels or 200K pixels. Note that even the best image classification algorithms still have practical limits on size invariance. For example, an algorithm (or human) is unlikely to correctly classify a cat image that occupies only 20 pixels.
See also translational invariance and rotational invariance.
See the Clustering course for more information.
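One way to probe size invariance is to classify the same image at several resolutions and compare the predictions. The sketch below is illustrative only and uses plain Python: `resize_nearest`, `toy_classify`, and `predictions_across_sizes` are hypothetical helpers, and the "classifier" is a stand-in that thresholds the fraction of bright pixels, not a real model.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2-D grid (list of rows) with nearest-neighbor sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def toy_classify(image):
    """Hypothetical classifier: label by the fraction of bright pixels,
    a feature that does not depend on the absolute image size."""
    total = sum(len(row) for row in image)
    bright = sum(pixel > 0.5 for row in image for pixel in row)
    return "cat" if bright / total > 0.2 else "background"


def predictions_across_sizes(image, sizes):
    """Classify the same image at several square resolutions."""
    return {
        size: toy_classify(resize_nearest(image, size, size))
        for size in sizes
    }


# A 64x64 image whose bright 32x32 square stands in for the "cat".
base = [
    [1.0 if 16 <= r < 48 and 16 <= c < 48 else 0.0 for c in range(64)]
    for r in range(64)
]
results = predictions_across_sizes(base, sizes=[128, 64, 16, 2, 1])
for size, label in results.items():
    print(f"{size}x{size}: {label}")
```

In this toy setup the label stays "cat" from 128x128 down to 2x2 and flips to "background" at 1x1, which mirrors the practical limit described above: below some resolution there is not enough signal left to classify.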
Created for this library
A retail analytics team augments product images with random rescales so its detector stays size-invariant whether products appear near or far on the shelf.
A medical imaging team uses multi-scale features in its model so size invariance is built into the architecture.
An autonomous-driving team augments training images with random crops and rescales so the detector is robust across object sizes in real driving footage.
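The random-rescale augmentation mentioned in these examples can be sketched as follows. This is a minimal illustration in plain Python, not any team's actual pipeline: `resize_nearest`, `random_rescale`, and `augment` are hypothetical helpers, and the scale range of 0.5 to 2.0 is an assumed setting.

```python
import random


def resize_nearest(image, new_h, new_w):
    """Resize a 2-D grid (list of rows) with nearest-neighbor sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def random_rescale(image, min_scale=0.5, max_scale=2.0, rng=None):
    """Return a copy of `image` resized by a random factor drawn
    uniformly from [min_scale, max_scale] (assumed range)."""
    if rng is None:
        rng = random.Random()
    scale = rng.uniform(min_scale, max_scale)
    new_h = max(1, round(len(image) * scale))
    new_w = max(1, round(len(image[0]) * scale))
    return resize_nearest(image, new_h, new_w)


def augment(images, copies_per_image=3, seed=0):
    """Produce several randomly rescaled copies of every input image."""
    rng = random.Random(seed)
    return [
        random_rescale(image, rng=rng)
        for image in images
        for _ in range(copies_per_image)
    ]


# An 8x8 checkerboard as a stand-in training image.
image = [[(r + c) % 2 for c in range(8)] for r in range(8)]
augmented = augment([image], copies_per_image=4)
for sample in augmented:
    print(f"{len(sample)}x{len(sample[0])}")
```

Training on copies like these exposes the model to the same content at many sizes. A real detection pipeline would also need to rescale any bounding-box labels by the same factor, and would typically use an image library's interpolation rather than this nearest-neighbor helper.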
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License