Glossary term
Multimodal AI
An instruction-tuned model that can process input beyond text, such as images, video, and audio.
Created for this library
A SaaS team uses a multimodal instruction-tuned model so its assistant can follow instructions that refer to screenshots and structured text in the same request.
An insurance underwriting team uses a multimodal instruction-tuned model that handles claim photos and text descriptions in a single request.
A retail e-commerce team uses a multimodal instruction-tuned model so the assistant can answer questions about product images and descriptions.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License