Glossary term
Multimodal AI
AI applied to understanding, parsing, and extracting information from documents including PDFs, scans, and forms.
Microsoft Azure Document Intelligence is used by healthcare providers to extract structured data from clinical notes, insurance forms, and lab reports, reportedly reducing manual data entry time by 80%.
LlamaIndex's document loaders can use layout-aware PDF parsing (for example via LlamaParse) to preserve table structure, heading hierarchy, and figure placement when chunking documents for RAG pipelines.
The DocVQA benchmark (2021) standardised evaluation of document understanding models. Models such as LayoutLMv3 (Microsoft, 2022), which jointly model the text and visual layout of scanned business documents, report results on it.
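To make the layout-aware chunking idea above concrete, here is a minimal sketch in plain Python. It assumes a parser has already produced a flat list of layout blocks. The `Block` and `Chunk` classes and the `chunk_blocks` function are hypothetical names for illustration, not part of LlamaIndex or any other library. The sketch keeps each table in one piece and records the heading hierarchy each chunk sits under.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """One parsed layout element (hypothetical schema, not a library type)."""
    kind: str       # "heading", "paragraph", or "table"
    text: str
    level: int = 1  # heading depth; only used when kind == "heading"


@dataclass
class Chunk:
    heading_path: list[str]  # headings above this chunk, outermost first
    text: str


def chunk_blocks(blocks: list[Block], max_chars: int = 500) -> list[Chunk]:
    """Group blocks into chunks without splitting tables or crossing headings."""
    chunks: list[Chunk] = []
    path: list[str] = []    # current heading hierarchy
    buffer: list[str] = []  # paragraph texts waiting to be emitted

    def flush() -> None:
        # Emit buffered paragraphs as one chunk under the current headings.
        if buffer:
            chunks.append(Chunk(list(path), "\n".join(buffer)))
            buffer.clear()

    for block in blocks:
        if block.kind == "heading":
            flush()
            # Drop headings at the same or deeper level, then add this one.
            del path[max(block.level, 1) - 1:]
            path.append(block.text)
        elif block.kind == "table":
            # A table always becomes its own chunk, however long it is.
            flush()
            chunks.append(Chunk(list(path), block.text))
        else:
            buffered = sum(len(t) for t in buffer)
            if buffer and buffered + len(block.text) > max_chars:
                flush()
            buffer.append(block.text)

    flush()
    return chunks
```

Each resulting chunk carries its `heading_path`, so a retriever can show or index the section context alongside the text. A real pipeline would also need to handle figures, page breaks, and oversized tables, which this sketch leaves out.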