Glossary term
Multimodal AI
AI capability to parse, describe, and reason about temporal events in video content.
The Google Cloud Video Intelligence API is used by broadcasters to auto-generate time-coded transcripts and scene descriptions for news archives, processing 10,000+ hours of video per day for content discovery and compliance.
Gemini 1.5 Pro is used by a sports analytics company to analyse broadcast footage: the model watches a 90-minute football match and identifies key events, player actions, and tactical patterns from the video alone.
VideoLLaMA 2 (Alibaba DAMO Academy) is used for video-based customer feedback analysis: retail stores submit checkout camera recordings, and the model identifies friction points in the customer journey without human review.
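The examples above all rely on time-coded output: each detected event is tied to a position in the video. A minimal sketch of how such output might be represented and queried is shown below. The `VideoEvent` record and the `timecode` and `events_between` helpers are hypothetical names chosen for illustration; they are not the schema or API of any product named above.

```python
from dataclasses import dataclass


@dataclass
class VideoEvent:
    """Hypothetical record for one detected event; not any vendor's schema."""
    start_s: float      # event start, in seconds from the beginning of the video
    end_s: float        # event end, in seconds from the beginning of the video
    description: str    # free-text description produced by the model


def timecode(seconds: float) -> str:
    """Format a non-negative offset in seconds as HH:MM:SS (fractions dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def events_between(events, start_s, end_s):
    """Return events overlapping the window [start_s, end_s), ordered by start time."""
    hits = [e for e in events if e.start_s < end_s and e.end_s > start_s]
    return sorted(hits, key=lambda e: e.start_s)
```

With records like these, a content-discovery query such as "what happened in the first half?" reduces to a call like `events_between(events, 0, 45 * 60)`, with `timecode` used to display each hit.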