Flamingo

A vision-language model (VLM) from DeepMind that interleaves gated cross-attention layers with the layers of a frozen LLM, enabling few-shot visual question answering and captioning from interleaved image-text sequences.
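The gating is what lets new vision layers be added to a frozen LLM without disturbing it. Below is a minimal NumPy sketch of one gated cross-attention block. It assumes single-head attention, no layer norm, a ReLU feed-forward, and visual tokens that already share the text embedding width. These are simplifications rather than Flamingo's exact configuration, and the names `gated_xattn_dense`, `init_params`, and `flamingo_style_forward` are hypothetical. The learnable scalars pass through tanh and start at zero, so before training each block returns its input unchanged and the frozen LM sees exactly the activations it was pretrained on.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x, visual, Wq, Wk, Wv, Wo):
    # Single-head cross-attention: text tokens (queries) attend to visual tokens (keys/values).
    q, k, v = x @ Wq, visual @ Wk, visual @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v @ Wo

def feed_forward(x, W1, W2):
    # Two-layer MLP; ReLU is an assumption made for this sketch.
    return np.maximum(x @ W1, 0.0) @ W2

def gated_xattn_dense(x, visual, p):
    # Each residual branch is scaled by tanh(alpha); with alpha = 0 the block is an identity map.
    x = x + np.tanh(p["alpha_attn"]) * cross_attention(x, visual, p["Wq"], p["Wk"], p["Wv"], p["Wo"])
    x = x + np.tanh(p["alpha_ff"]) * feed_forward(x, p["W1"], p["W2"])
    return x

def init_params(d_model, d_ff, rng):
    # Hypothetical initialisation: random projections, gates at zero.
    s = 1.0 / np.sqrt(d_model)
    return {
        "Wq": rng.normal(0.0, s, (d_model, d_model)),
        "Wk": rng.normal(0.0, s, (d_model, d_model)),
        "Wv": rng.normal(0.0, s, (d_model, d_model)),
        "Wo": rng.normal(0.0, s, (d_model, d_model)),
        "W1": rng.normal(0.0, s, (d_model, d_ff)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(d_ff), (d_ff, d_model)),
        "alpha_attn": 0.0,
        "alpha_ff": 0.0,
    }

def flamingo_style_forward(x, visual, gated_params, frozen_layers):
    # Interleave trainable gated blocks with frozen pretrained LM layers.
    for p, layer in zip(gated_params, frozen_layers):
        x = gated_xattn_dense(x, visual, p)  # newly added, trained
        x = layer(x)                         # pretrained, weights never updated
    return x
```

During training only the parameters in `gated_params` would receive gradient updates; as the gates open, visual information flows into the language stream gradually instead of abruptly overwriting the pretrained representations.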

Examples

1. Flamingo (Alayrac et al., DeepMind, 2022) achieves strong few-shot results across 16 image and video benchmarks, including visual question answering and captioning, by conditioning a frozen Chinchilla LLM on visual features through cross-attention, without updating the LLM weights.
2. Flamingo's architectural pattern of freezing the LLM and connecting vision through cross-attention layers directly inspired OpenFlamingo, IDEFICS, and other open-source VLMs, making it one of the standard approaches to parameter-efficient VLM training.
3. With 32-shot visual in-context learning, Flamingo surpasses prior zero- and few-shot results on VQAv2, and on several benchmarks its few-shot performance exceeds that of task-specific fine-tuned models, showing that a general-purpose VLM can compete with specialists without retraining.
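Few-shot prompting works by placing the support examples and the query in a single interleaved image-text sequence. The sketch below builds such a sequence for VQA and derives a cross-attention mask in which each text token sees only the visual tokens of the most recent preceding image, which is how Flamingo restricts cross-attention over interleaved inputs. The `<image>` and `<EOC>` markers, the Question/Answer template, whitespace tokenisation, and the function names are illustrative placeholders, not Flamingo's actual tokenizer or prompt format.

```python
from typing import List, Tuple
import numpy as np

IMAGE = "<image>"  # hypothetical marker for where an image's visual tokens belong
EOC = "<EOC>"      # hypothetical end-of-chunk marker closing each support example

def build_prompt(support: List[Tuple[str, str]], query_question: str) -> List[str]:
    # Each support example is (image, question, answer); the query leaves the answer open.
    tokens: List[str] = []
    for question, answer in support:
        tokens += [IMAGE] + f"Question: {question} Answer: {answer}".split() + [EOC]
    tokens += [IMAGE] + f"Question: {query_question} Answer:".split()
    return tokens

def preceding_image_index(tokens: List[str]) -> List[int]:
    # For each token, the index of the latest image at or before it (-1 if none yet).
    idx, current = [], -1
    for tok in tokens:
        if tok == IMAGE:
            current += 1
        idx.append(current)
    return idx

def cross_attention_mask(tokens: List[str], tokens_per_image: int) -> np.ndarray:
    # mask[t, v] is True when text token t may attend to visual token v.
    # Rows for tokens before the first image are all False; a real model must
    # handle those rows (e.g. by zeroing their cross-attention output).
    img_idx = np.array(preceding_image_index(tokens))
    num_images = int(img_idx.max()) + 1 if img_idx.size else 0
    owner = np.repeat(np.arange(num_images), tokens_per_image)  # image id of each visual token
    return img_idx[:, None] == owner[None, :]
```

Restricting each token to one image keeps the cross-attention cost independent of how many shots are in the prompt, while the frozen LM's ordinary self-attention still lets later tokens draw on earlier examples.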

Related terms
