Glossary term
Multimodal AI
Text-to-speech (TTS): AI technology that synthesises natural-sounding speech from text input.
ElevenLabs' TTS API is used by audiobook publishers to convert manuscript text to narration in any of 29 languages with emotion control, reducing audiobook production cost from $5,000 to $200 per hour of content.
Microsoft Azure Neural TTS is used by e-learning platforms to voice thousands of hours of training content in 140 languages, enabling localisation without hiring voice actors for every language.
OpenAI TTS API (tts-1-hd) is used by Duolingo to generate natural-sounding pronunciation examples for 40 languages, replacing expensive studio recording sessions with dynamically generated audio.
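To make the idea concrete, here is a minimal sketch of how a text-to-speech request to a hosted API might look, using the tts-1-hd model named above. The endpoint URL, the "alloy" voice, the JSON field names, and the helper names build_speech_request and synthesise are assumptions chosen for illustration; check the provider's current API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint for OpenAI's speech API; verify against the current docs.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="tts-1-hd", voice="alloy"):
    """Build (but do not send) an HTTP request asking for synthesised speech.

    The payload field names ("model", "input", "voice") are assumptions
    based on the publicly documented request shape.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesise(text, api_key, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to out_path.

    Needs network access and a valid API key; the .mp3 extension assumes
    the service returns MP3 audio by default.
    """
    request = build_speech_request(text, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

Building the request separately from sending it keeps the text-in, audio-out shape easy to inspect: the input is a short JSON document, and the response body is raw audio that can be written straight to a file.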