Glossary term
Architecture
An extension of self-attention that applies the self-attention mechanism multiple times for each position in the input sequence.
Transformers introduced multi-head self-attention.
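In the standard Transformer formulation, each head has its own learned query, key, and value projections. The heads run scaled dot-product attention in parallel over the same sequence, and their per-position outputs are concatenated and passed through a final output projection. The pure-Python code below is a minimal sketch of that flow. It assumes small random weights in place of trained ones, and every function name in it (`self_attention`, `multi_head_self_attention`, `random_matrix`, and so on) is hypothetical rather than part of any library. It also returns each head's attention weights, which is the kind of per-head signal that researchers inspect when they study what individual heads learn.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """One head of scaled dot-product self-attention over the sequence x."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(w_q[0])
    # Every position scores every other position, scaled by sqrt(d_k).
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(x, heads, w_o):
    """Run each head independently, concatenate per position, then project with w_o."""
    head_outputs, head_weights = [], []
    for w_q, w_k, w_v in heads:
        out, weights = self_attention(x, w_q, w_k, w_v)
        head_outputs.append(out)
        head_weights.append(weights)
    # Concatenate the heads' output vectors for each position.
    concat = [sum((out[i] for out in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o), head_weights


def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for trained weights.
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    seq_len, d_model, num_heads = 3, 4, 2
    d_head = d_model // num_heads
    x = random_matrix(seq_len, d_model, rng)
    heads = [
        tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
        for _ in range(num_heads)
    ]
    w_o = random_matrix(num_heads * d_head, d_model, rng)
    output, weights = multi_head_self_attention(x, heads, w_o)
    print(len(output), len(output[0]))  # 3 4
    for h, w in enumerate(weights):
        print(f"head {h}:", [[round(p, 2) for p in row] for row in w])
```

Splitting `d_model` evenly across heads, as above, keeps the total cost close to that of a single full-width head. Each head can still learn a different attention pattern because it has separate projections.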
Examples (created for this library):
A document understanding team uses multi-head self-attention in its long-context contract reader to capture different types of dependencies between clauses.
An NLP research team analyzes the heads of multi-head self-attention to study which linguistic patterns each head learns.
A code-completion vendor uses multi-head self-attention to capture different types of code context, such as syntax and dataflow.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License