Glossary term
Multimodal AI
Whisper is OpenAI's open-source automatic speech recognition (ASR) model, trained on 680k hours of multilingual audio and approaching human-level transcription accuracy on English speech.
Whisper large-v3 is deployed by Substack to auto-transcribe podcast episodes for paid subscribers, processing 50k+ hours of audio per month and generating searchable transcripts with speaker attribution. Whisper itself emits only timestamped text, so speaker labels come from a separate diarization step.
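Speaker attribution and search can be layered on top of timestamped segments. The sketch below is illustrative only and does not describe any particular deployment: `Segment`, `Turn`, `attribute_speakers`, and `search` are hypothetical names, and the diarization turns are assumed to come from a separate tool. Each transcript segment is given the speaker whose turn overlaps it the most.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A timestamped piece of transcript text (hypothetical type)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization result: who spoke during [start, end) (hypothetical type)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text.strip()))
    return labeled


def search(labeled, query):
    """Case-insensitive substring search over the labeled transcript."""
    needle = query.lower()
    return [(who, text) for who, text in labeled if needle in text.lower()]
```

A production system would more likely index the text in a search engine; the substring scan here only shows the shape of the data.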
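faster-whisper (a CTranslate2-based reimplementation) runs up to 4x faster than the reference openai/whisper implementation at comparable accuracy, which makes CPU-only transcription practical. Court reporting services use it to provide same-day transcripts of hearings without expensive GPU infrastructure.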
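Speed figures like this depend heavily on model size, quantization, and hardware, so they are worth measuring locally. The following is a minimal sketch, assuming the `faster-whisper` package is installed; the helper names `realtime_factor` and `transcribe_on_cpu`, the `small` model size, and the `int8` compute type are illustrative choices, not settings from the source. It reports how many seconds of audio are processed per second of wall-clock time.

```python
import time


def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def transcribe_on_cpu(path: str, model_size: str = "small"):
    """Transcribe one file on CPU and return (text, real-time factor)."""
    # Imported lazily so the helper above can be used without the package.
    from faster_whisper import WhisperModel

    # int8 quantization is an assumed setting that is commonly used on CPU.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    started = time.perf_counter()
    segments, info = model.transcribe(path, beam_size=5)
    # `segments` is a generator: decoding happens while it is consumed,
    # so the timer has to stop after this join, not before it.
    text = " ".join(segment.text.strip() for segment in segments)
    elapsed = time.perf_counter() - started

    return text, realtime_factor(info.duration, elapsed)
```

For example, a one-hour hearing that takes 15 minutes to transcribe gives `realtime_factor(3600, 900)`, a factor of 4.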
Whisper is used in on-device voice assistants on macOS (via whisper.cpp) to transcribe user dictation without sending audio to the cloud, which helps meet privacy requirements in regulated industries.
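A local pipeline of this kind can be as simple as calling the whisper.cpp command-line tool on a recorded file. The sketch below is an assumption-laden illustration: `build_command` and `transcribe_locally` are hypothetical helpers, the binary path is passed in because its name differs between whisper.cpp versions, and the input is assumed to be a 16 kHz WAV file with a ggml model already downloaded. No network call is involved at any point.

```python
import subprocess


def build_command(binary, model, wav, language="en"):
    """Build the argument list for a whisper.cpp CLI run.

    -m  path to the ggml model file
    -f  path to the input audio file
    -l  spoken language code
    -nt print plain text without timestamps
    """
    return [str(binary), "-m", str(model), "-f", str(wav), "-l", language, "-nt"]


def transcribe_locally(binary, model, wav, language="en"):
    """Run whisper.cpp on a local file and return the transcript text."""
    result = subprocess.run(
        build_command(binary, model, wav, language),
        capture_output=True,  # the transcript is written to stdout
        text=True,
        check=True,           # raise if the tool exits with an error
    )
    return result.stdout.strip()
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.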