Glossary term
Multimodal AI
AI technology that converts spoken audio to text.
OpenAI Whisper (large-v3) achieves near-human word-error rates on English audio - it is used by Otter.ai to transcribe 1 million+ meetings per day with speaker diarisation and timestamp alignment.
Google's Universal Speech Model (USM) powers Google Meet's live captioning and transcription across 125 languages - it processes 300 million minutes of calls monthly with a word-error rate below 5%.
Amazon Transcribe Medical (an AWS service) uses automatic speech recognition (ASR) fine-tuned on clinical vocabulary to transcribe doctor-patient conversations - it is used by 10,000+ healthcare organisations for ambient clinical documentation, reducing note-taking time by 45%.
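
The examples above quote word-error rate (WER) as their accuracy measure. As a rough illustration of what that number means, here is a minimal sketch that assumes the common definition: WER = (substitutions + deletions + insertions) / number of words in the reference transcript, computed with a word-level edit distance. The function name and the sample sentences are made up for illustration and do not come from any of the products named above. Real evaluations usually also normalise punctuation and number formats before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: lowercasing and whitespace splitting are enough normalisation.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: the ASR output dropped the word "a".
wer = word_error_rate(
    "the patient has a mild fever",
    "the patient has mild fever",
)
print(f"{wer:.1%}")  # 1 error over 6 reference words -> 16.7%
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.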