Glossary term
Multimodal AI
Speaker diarisation is the AI task of segmenting an audio recording by speaker identity, answering the question 'who spoke when'.
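As a rough illustration of what "who spoke when" means in practice, a diarisation result can be modelled as a list of time-stamped speaker turns. The sketch below is not tied to any particular library: the `Turn` class, the `speakers_at` helper, the `SPEAKER_00`-style labels, and the timings are all hypothetical, chosen only to show the shape of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    start: float   # turn start, in seconds
    end: float     # turn end, in seconds
    speaker: str   # anonymous label; diarisation does not know real names


def speakers_at(turns, t):
    """Return the sorted labels of every speaker active at time t (seconds)."""
    return sorted({turn.speaker for turn in turns if turn.start <= t < turn.end})


# Hypothetical diarisation output for a 12-second clip (illustrative values).
turns = [
    Turn(0.0, 4.2, "SPEAKER_00"),
    Turn(3.8, 9.0, "SPEAKER_01"),   # overlaps the end of the previous turn
    Turn(9.5, 12.0, "SPEAKER_00"),
]

print(speakers_at(turns, 1.0))   # one speaker is active
print(speakers_at(turns, 4.0))   # two speakers overlap
print(speakers_at(turns, 9.2))   # silence: no speaker is active
```

Note that the labels are anonymous. Mapping `SPEAKER_00` to a named person requires extra information, such as a user account or a voice enrolment, which diarisation alone does not provide.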
pyannote.audio is an open-source toolkit for speaker diarisation. Meeting transcription tools such as Otter.ai and Fireflies.ai depend on diarisation of this kind to attribute transcript segments to individual speakers.
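Attributing transcript segments to speakers can be sketched as an overlap problem: each transcribed segment is given to the speaker whose turns cover most of its duration. The example below is a minimal sketch under that assumption and does not describe how any named product works. The `overlap` and `attribute` functions, the tuple layouts, and the sample data are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute(transcript, turns):
    """Label each transcript segment with the speaker who overlaps it most.

    transcript: list of (start, end, text) tuples from a speech recogniser
    turns:      list of (start, end, speaker) tuples from a diariser
    """
    labelled = []
    for seg_start, seg_end, text in transcript:
        totals = {}
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # Segments that no speaker turn covers are marked as unknown.
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labelled.append((speaker, text))
    return labelled


# Illustrative data only.
turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
transcript = [
    (0.2, 3.5, "Shall we start?"),
    (3.6, 8.0, "Yes, here is the update."),
    (10.0, 11.0, "Thanks."),
]

for speaker, text in attribute(transcript, turns):
    print(f"{speaker}: {text}")
```

Maximum overlap is a simple heuristic. A segment that straddles a speaker change is assigned wholesale to one speaker, so real systems often align at the word level instead.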
Amazon Transcribe's speaker diarisation (also called speaker partitioning) is used by call centre analytics platforms to separate agent and customer speech, enabling per-speaker sentiment analysis and compliance monitoring on 10M+ calls per month.
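Once agent and customer speech are separated, per-speaker analytics reduce to grouping turns by speaker label. The sketch below computes talk time per role as one such statistic; per-speaker sentiment would follow the same grouping with a sentiment score in place of duration. The `spk_0`/`spk_1` labels, the `ROLES` mapping, and the call data are assumptions made for illustration, not output from a real service.

```python
from collections import defaultdict

# Hypothetical mapping from anonymous diarisation labels to call roles.
# In practice this mapping has to come from somewhere else, for example
# from knowing which audio channel belongs to the agent.
ROLES = {"spk_0": "agent", "spk_1": "customer"}


def talk_time_by_role(turns, roles):
    """Sum the duration in seconds of all turns for each role."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[roles.get(speaker, "unknown")] += end - start
    return dict(totals)


# Illustrative diarised call: (start, end, speaker label).
call = [
    (0.0, 5.0, "spk_0"),
    (5.0, 12.5, "spk_1"),
    (12.5, 15.0, "spk_0"),
]

totals = talk_time_by_role(call, ROLES)
agent_share = totals["agent"] / sum(totals.values())
print(totals)
print(f"agent talk share: {agent_share:.0%}")
```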
Google Meet's live transcription combines speaker diarisation with user identity to attribute captions to the correct speaker tile. When multiple speakers overlap, it uses beamforming-based audio separation to keep their speech apart.