Positional Encoding

Mechanism for injecting sequence position information into token embeddings, needed because Transformers have no built-in awareness of token order.

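A technique for adding information about a token's position in a sequence to that token's embedding. Plain self-attention (without a causal mask) is permutation-equivariant: reordering the input tokens only reorders the outputs, so without positional information a Transformer cannot tell "dog bites man" from "man bites dog". Positional encoding supplies the order information the model needs to relate different parts of the sequence.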
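To make the order-blindness concrete, here is a minimal pure-Python sketch of single-head self-attention. It assumes identity query/key/value projections and made-up 2-d embeddings and position vectors (all hypothetical, chosen only for illustration). Without position information each token receives the same output whichever order the sentence is in; adding a different vector per position breaks that symmetry.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out

def same(u, v):
    # Tolerance absorbs float differences from summing in a different order.
    return all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(u, v))

# Hypothetical 2-d token embeddings.
emb = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [0.5, 0.5]}
order_a, order_b = ["dog", "bites", "man"], ["man", "bites", "dog"]

# Without position information: each token gets the same output in either order.
out_a = self_attention([emb[t] for t in order_a])
out_b = self_attention([emb[t] for t in order_b])
print(all(same(out_a[i], out_b[order_b.index(t)]) for i, t in enumerate(order_a)))  # True

# Adding a (hypothetical) vector per position breaks the symmetry.
pos = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2]]

def with_pos(order):
    return [[e + p for e, p in zip(emb[t], pos[i])] for i, t in enumerate(order)]

pa, pb = self_attention(with_pos(order_a)), self_attention(with_pos(order_b))
print(same(pa[0], pb[2]))  # False: "dog" now gets a different output
```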

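A common implementation of positional encoding uses sinusoidal functions. Each dimension of the encoding is the sine or cosine of the token's position multiplied by a frequency; the frequency depends on the dimension index (not on the position), the wavelengths form a geometric progression from 2π to 10000·2π, and the amplitude stays fixed. Because, for any fixed offset k, the encoding at position pos + k is a linear function of the encoding at pos, this design was chosen so that a Transformer could easily learn to attend to other tokens by their relative as well as absolute positions.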
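The formula from the original Transformer paper (Vaswani et al., 2017) is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal pure-Python sketch of that table follows; the function name sinusoidal_encoding and the small sizes are illustrative choices, not taken from any library.

```python
import math

def sinusoidal_encoding(num_positions: int, d_model: int, base: float = 10000.0) -> list[list[float]]:
    """Return a num_positions x d_model table of fixed sinusoidal encodings.

    Dimension 2i holds sin(pos / base**(2i/d_model)) and dimension 2i+1 holds
    the cosine of the same angle, following the original Transformer formulation.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model // 2):
            # The frequency depends only on the dimension pair i, not on pos.
            angle = pos / base ** (2 * i / d_model)
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table

pe = sinusoidal_encoding(num_positions=50, d_model=8)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Each row is added element-wise to the token embedding at that position. Within each (sin, cos) pair, shifting from pos to pos + k is a fixed 2-d rotation by k times that pair's frequency, which is why the encoding at pos + k is a linear function of the encoding at pos.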

Examples

1.
The original Transformer used sinusoidal positional encodings - fixed mathematical patterns that allowed the model to distinguish token positions without learned parameters.
2.
Learned absolute positional embeddings (used in BERT and GPT-2) add a trainable vector per position to the token embeddings - but the table has a fixed size (512 positions in BERT, 1,024 in GPT-2), so they do not generalise to sequences longer than those seen during training.
3.
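ALiBi (Attention with Linear Biases, Press et al. 2021) adds a penalty proportional to the query-key distance, with a fixed slope per attention head, directly to attention scores instead of adding position vectors to embeddings - it is designed to extrapolate to sequences longer than those seen in training and is used in MosaicML's MPT-7B and in BloombergGPT.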
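The ALiBi bias can be sketched in a few lines. This sketch assumes a causal (left-to-right) model and a power-of-two number of heads n, whose slopes form the geometric sequence 2^(-8/n), 2^(-16/n), ... as in the paper (1/2 down to 1/256 for 8 heads). The bias for query i and key j is -m·(i - j); future keys are masked with -inf. The function names are illustrative, not from any library.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Head-specific slopes 2**(-8/n), 2**(-16/n), ... for a power-of-two head count n."""
    if num_heads < 1 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2 ** (-8 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]

def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Causal bias: slope * (j - i), i.e. -slope * distance, for key j <= query i; -inf for future keys."""
    return [
        [slope * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]

slopes = alibi_slopes(8)
for row in alibi_bias(4, slopes[0]):
    print(row)
# [0.0, -inf, -inf, -inf]
# [-0.5, 0.0, -inf, -inf]
# [-1.0, -0.5, 0.0, -inf]
# [-1.5, -1.0, -0.5, 0.0]
```

Each head adds its matrix to its raw scores q·k/√d before the softmax, and no position vectors are added to the embeddings. Because the penalty depends only on distance, the same formula applies unchanged to sequences longer than any seen in training.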

Real-world uses

Created for this library

1.
An NLP team uses sinusoidal positional encoding in its transformer so token order is encoded explicitly in inputs.
2.
A research team experiments with learned positional encodings to see if they capture position better than fixed sinusoidal encodings for long documents.
3.
A code-completion team uses relative positional encodings in its transformer so the model focuses on relative token distances.
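Relative schemes make the position signal depend on the offset between two tokens rather than on their absolute indices. Below is a simplified sketch, assuming a scalar bias per clipped offset that is added to attention scores. Published methods differ (Shaw et al. 2018 learn relative vectors inside attention; T5 uses log-spaced distance buckets), and the bias values here are hypothetical stand-ins for learned parameters.

```python
def relative_bias(seq_len: int, bias_table: list[float], max_distance: int) -> list[list[float]]:
    """Scalar attention bias that depends only on the clipped offset j - i.

    bias_table[k] holds the (normally learned) bias for offset k - max_distance,
    so it must have 2 * max_distance + 1 entries.
    """
    if len(bias_table) != 2 * max_distance + 1:
        raise ValueError("bias_table must have 2 * max_distance + 1 entries")
    out = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Offsets beyond +/- max_distance share the outermost entry.
            offset = max(-max_distance, min(max_distance, j - i))
            row.append(bias_table[offset + max_distance])
        out.append(row)
    return out

# Hypothetical bias values for offsets -2, -1, 0, +1, +2.
table = [-1.0, -0.5, 0.0, -0.5, -1.0]
b = relative_bias(5, table, max_distance=2)
print(b[0][1], b[3][4])  # -0.5 -0.5 (same offset, same bias)
```

Because the bias depends only on j - i, token pairs the same distance apart get the same bias wherever they occur in the sequence, which is the property a code-completion model relies on when it focuses on relative token distances.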
