Glossary term
Multimodal AI
Optical character recognition (OCR): a computer vision technique for extracting text from images or documents.
Amazon Textract uses deep-learning-based OCR to extract text, tables, and form fields from scanned documents. Mortgage companies use it to process 10,000+ loan application documents per day with 99.5% field accuracy.
Google Cloud Document AI combines OCR with layout understanding to extract structured data from invoices, receipts, and identity documents. Uber Eats uses it to automatically process 50M+ restaurant menu PDFs.
Apple's Live Text feature uses on-device OCR (via the Vision framework) to detect and copy text in the camera viewfinder or in photos in real time. It is used 1.5 billion times per month across iPhone users.
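As a rough illustration of the Amazon Textract example above, the sketch below sends a local image to Textract's `detect_document_text` operation through boto3 and keeps only the detected lines of text. It assumes AWS credentials and a region are already configured. The names `extract_lines` and `ocr_image` and the `min_confidence` parameter are hypothetical, introduced here for illustration only.

```python
from typing import Any


def extract_lines(response: dict[str, Any], min_confidence: float = 0.0) -> list[str]:
    """Collect the text of LINE blocks from a Textract-style response dict."""
    lines = []
    for block in response.get("Blocks", []):
        # Textract also returns PAGE and WORD blocks; keep only whole lines.
        if block.get("BlockType") != "LINE":
            continue
        # Confidence is reported on a 0-100 scale.
        if block.get("Confidence", 0.0) >= min_confidence:
            lines.append(block.get("Text", ""))
    return lines


def ocr_image(path: str) -> list[str]:
    """Send a local image to Textract and return its detected lines."""
    # Imported lazily so extract_lines can be used without boto3 installed.
    import boto3

    client = boto3.client("textract")
    with open(path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return extract_lines(response)
```

`detect_document_text` returns plain text only. Extracting tables and form fields, as mentioned above, would go through `analyze_document` with `FeatureTypes=["TABLES", "FORMS"]` instead, and the resulting blocks need additional parsing that this sketch does not cover.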