Glossary term
Multimodal AI
The capability of AI models to synthesise music, sound effects, or general audio from text prompts or other conditioning signals.
MusicGen (Meta) generates music conditioned on a text description and, optionally, a reference melody. Game developers can use it to generate adaptive background music that responds to gameplay state, avoiding per-track licensing fees (subject to the model's own licence terms).
Suno AI generates full songs with vocals from a text prompt. Content creators use it to produce background music for YouTube videos and podcasts without needing musical skills; whether the output is royalty-free depends on the terms of the plan used.
Stability AI's Stable Audio uses latent diffusion for music and sound generation. Adobe Podcast applies related AI audio technology to remove background noise, enhance vocal quality, and generate room tone for audio post-production.
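As a rough illustration of the adaptive game music use case, the sketch below maps a gameplay state to a text description that a text-to-music model could be conditioned on. Everything in it is an assumption made for illustration: the GameplayState fields, the intensity thresholds, and the prompt wording are hypothetical and do not come from MusicGen or any other product. The model call itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameplayState:
    """Hypothetical snapshot of game state; field names are illustrative."""
    location: str      # e.g. "forest" or "dungeon"
    intensity: float   # 0.0 (calm exploration) to 1.0 (full combat)


def music_prompt(state: GameplayState) -> str:
    """Map a gameplay state to a text description for a text-to-music model.

    The thresholds and wording are arbitrary choices for this sketch.
    """
    if not 0.0 <= state.intensity <= 1.0:
        raise ValueError("intensity must be in [0.0, 1.0]")
    if state.intensity < 0.3:
        mood, tempo = "calm ambient", "slow tempo"
    elif state.intensity < 0.7:
        mood, tempo = "tense atmospheric", "medium tempo"
    else:
        mood, tempo = "intense orchestral", "fast tempo"
    return f"{mood} {state.location} theme, {tempo}, instrumental, loopable"


if __name__ == "__main__":
    print(music_prompt(GameplayState(location="forest", intensity=0.1)))
    print(music_prompt(GameplayState(location="dungeon", intensity=0.9)))
```

The returned string would then be passed as the text description to whichever generation model is in use. A game would typically regenerate or crossfade tracks only when the intensity crosses a threshold, rather than on every frame.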