Glossary term
Glossary term
Architecture
A mechanism used in a neural network that indicates the importance of a particular word or part of a word. Attention compresses the amount of information a model needs to predict the next token/word. A typical attention mechanism might consist of a weighted sum over a set of inputs, where the weight for each input is computed by another part of the neural network.
Refer also to self-attention and multi-head self-attention, which are the building blocks of Transformers.
See LLMs: What's a large language model? in Machine Learning Crash Course for more information about self-attention.
Created for this library
A document understanding startup attributes its accuracy gains on long contracts to attention layers that let the model focus on relevant clauses regardless of position.
A translation vendor reports that switching to attention-based models cut its post-editing effort on enterprise contracts because long sentences were translated more faithfully.
A code completion tool uses attention to weight earlier function signatures in the file when predicting the next token, improving in-context relevance.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License