Glossary term
Multimodal AI
AI capability to interpret and extract information from charts, graphs, and data visualisations.
ChartQA (Masry et al. 2022) benchmarks VLMs on answering questions about bar, line, and pie charts, where many questions require both reading values off the chart and numerical or logical reasoning - GPT-4V achieves about 78% (relaxed) accuracy.
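ChartQA scores numeric answers with "relaxed accuracy": a prediction counts as correct if it is within 5% of the gold value, while non-numeric answers must match exactly. The following is a simplified sketch of that metric. The helper names, the comma and percent-sign handling, and the case-insensitive fallback are assumptions for illustration, not the official evaluation script.

```python
def _to_float(text):
    """Parse a numeric answer, tolerating thousands separators and a trailing '%'.

    Assumption: this normalisation is illustrative, not ChartQA's exact rule.
    """
    cleaned = text.strip().replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None


def relaxed_match(prediction, target, tolerance=0.05):
    """Numeric answers match within a relative tolerance; others need an exact match."""
    pred_num, target_num = _to_float(prediction), _to_float(target)
    if pred_num is not None and target_num is not None:
        if target_num == 0:
            return pred_num == 0  # relative error is undefined for a zero target
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    # Non-numeric answers (labels, yes/no): case-insensitive exact match.
    return prediction.strip().lower() == target.strip().lower()


def relaxed_accuracy(predictions, targets, tolerance=0.05):
    """Fraction of (prediction, target) pairs that relaxed-match."""
    pairs = list(zip(predictions, targets))
    hits = sum(relaxed_match(p, t, tolerance) for p, t in pairs)
    return hits / len(pairs)


# Made-up predictions and gold answers, for illustration only.
preds = ["42.1", "Canada", "17", "Yes"]
golds = ["41", "canada", "20", "No"]
print(relaxed_accuracy(preds, golds))  # 0.5: "42.1" vs "41" and "Canada" vs "canada" match
```

The tolerance exists because a model reading a value off a bar or line cannot be expected to recover it to the exact decimal.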
DePlot (Google) converts a chart image into a linearised data table that an LLM can then reason over - used in financial research tools to extract trend data from annual report charts without manual digitisation.
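The second half of that pipeline, turning the model's text table back into numbers, can be sketched as follows. It assumes the decoded output separates cells with `|` and rows with either a newline or the literal `<0x0A>` token, optionally starting with a `TITLE` row. The image-to-table step itself (for example with the `google/deplot` checkpoint on Hugging Face) is not shown. The function names and the sample string are hypothetical, and the figures are made up.

```python
def parse_linearised_table(text):
    """Split a DePlot-style linearised table into a header row and data rows.

    Assumption: cells are separated by '|' and rows by a newline or the
    literal '<0x0A>' token; both conventions are handled here.
    """
    rows = [r for r in text.replace("<0x0A>", "\n").split("\n") if r.strip()]
    table = [[cell.strip() for cell in row.split("|")] for row in rows]
    # Drop an optional leading "TITLE | ..." row.
    if table and table[0][0].upper() == "TITLE":
        table = table[1:]
    header, body = table[0], table[1:]
    return header, body


def period_changes(header, body, value_column):
    """Return (row label, relative change vs. the previous row) for one numeric column."""
    col = header.index(value_column)
    labels = [row[0] for row in body]
    values = [float(row[col].replace(",", "")) for row in body]
    return [
        (labels[i], (values[i] - values[i - 1]) / values[i - 1])
        for i in range(1, len(values))
    ]


# Hypothetical model output for a revenue bar chart (made-up numbers).
sample = "TITLE | Revenue <0x0A> Year | Revenue <0x0A> 2021 | 100 <0x0A> 2022 | 120 <0x0A> 2023 | 90"
header, body = parse_linearised_table(sample)
for year, change in period_changes(header, body, "Revenue"):
    print(f"{year}: {change:+.1%}")  # prints "2022: +20.0%" then "2023: -25.0%"
```

Keeping the extraction as plain text and doing the arithmetic in code, rather than asking the LLM to compute the percentages, makes the trend figures checkable against the recovered table.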
Gemini 1.5 Pro is used by investment banks to process earnings presentation PDFs, extracting chart data, slide titles, and key metrics into structured summaries for equity research analysts.