Glossary term
Multimodal AI
A computer vision task that assigns a class label (semantic segmentation) or an instance identifier (instance segmentation) to every pixel in an image.
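As a minimal sketch of per-pixel labeling, assume a model has already produced a score for each class at each pixel. The array values, the class names, and the `semantic_labels` helper below are hypothetical and chosen only for illustration; real models produce these scores from a neural network.

```python
import numpy as np


def semantic_labels(scores: np.ndarray) -> np.ndarray:
    """Turn (H, W, C) per-pixel class scores into an (H, W) map of class indices."""
    # Each pixel gets the index of its highest-scoring class.
    return scores.argmax(axis=-1)


# Hypothetical 2x3 image with 3 classes: 0 = background, 1 = person, 2 = sky.
scores = np.array([
    [[0.1, 0.2, 0.7], [0.1, 0.1, 0.8], [0.6, 0.3, 0.1]],
    [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.8, 0.1, 0.1]],
])

labels = semantic_labels(scores)  # one class index per pixel
person_mask = labels == 1         # boolean mask selecting the "person" pixels
```

The boolean mask is the piece that downstream features such as background removal or subject-aware editing would typically consume.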
Canva's background-removal feature uses Meta's Segment Anything Model (SAM) to separate foreground subjects from backgrounds in user-uploaded photos, processing 10 million images per day.
Mask2Former (Meta AI Research and UIUC) achieved state-of-the-art panoptic segmentation results on COCO and Cityscapes when it was published. It is used in autonomous driving perception stacks to distinguish drivable surfaces, pedestrians, and static obstacles.
Google's DeepLab v3+ is integrated into Google Photos to enable object-aware editing, letting users select and edit specific subjects (people, pets, sky) while leaving the rest of the image unchanged.