Glossary term
Architecture
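Positional encoding technique that encodes position by rotating query and key vectors, treating each pair of dimensions as a complex number, so that attention scores depend only on the relative offset between tokens, which helps length generalisation.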
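The rotation can be illustrated with a minimal NumPy sketch. The function name `rope_rotate`, the base of 10000, and the interleaved pairing of dimensions are assumptions chosen for illustration; real implementations differ in layout and operate on batched tensors.

```python
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Rotate a 1-D vector x (even length) as if it sat at `position`.

    Illustrative helper, not any library's API.
    """
    d = x.shape[-1]
    # One rotation frequency per pair of dimensions: base^(-2i/d)
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = position * freqs
    # Treat consecutive pairs (x0, x1), (x2, x3), ... as complex numbers
    z = (x[0::2] + 1j * x[1::2]) * np.exp(1j * angles)
    out = np.empty(d, dtype=float)
    out[0::2] = z.real
    out[1::2] = z.imag
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(8)  # stand-in for a query vector
k = rng.standard_normal(8)  # stand-in for a key vector

# Same offset (3) at different absolute positions
score_near = rope_rotate(q, 5) @ rope_rotate(k, 2)
score_far = rope_rotate(q, 105) @ rope_rotate(k, 102)
same_offset = bool(np.isclose(score_near, score_far))
```

Here `same_offset` checks that the attention score is unchanged when both positions shift by the same amount, which is the relative-position property the definition refers to.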
RoPE is used in Llama 2/3, Mistral, Falcon, and many other modern open-source LLMs. It gives better length generalisation than absolute positional embeddings and adds no learned parameters.
LongRoPE (Microsoft, 2024) extends the context window of pretrained models such as Llama 2 from 4k to roughly 2M tokens by non-uniformly rescaling RoPE frequencies. It is also used in Phi-3-mini-128k for long-document processing.
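YaRN (Yet another RoPE extensioN) rescales RoPE frequencies non-uniformly across dimensions and adjusts the attention temperature for long contexts. Its authors used it to fine-tune Llama 2 and Mistral 7B variants to 64k and 128k context windows with minimal perplexity degradation.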
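The idea of rescaling RoPE frequencies can be sketched in its simplest form, uniform position interpolation. This is not the LongRoPE or YaRN algorithm: both use non-uniform, per-dimension factors, and YaRN also adjusts attention temperature. The function name `rope_angles` and the lengths below are illustrative assumptions.

```python
import numpy as np

def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotation angles for one position; `scale` > 1 slows every rotation.

    Illustrative helper. `scale` may also be an array of length dim // 2
    to rescale each frequency differently.
    """
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    return position * freqs / scale

train_len = 4096     # assumed context length seen in training
target_len = 32768   # assumed context length wanted at inference
scale = target_len / train_len  # uniform factor of 8

# The last target position, rescaled, lands on the same angles as a
# (fractional) position inside the trained range.
scaled = rope_angles(target_len - 1, 64, scale=scale)
reference = rope_angles((target_len - 1) / scale, 64)
matches_trained_range = bool(np.allclose(scaled, reference))
inside_range = bool(scaled[0] < train_len)
```

Uniform scaling compresses all frequencies equally, which tends to blur nearby positions; that limitation is one motivation for the non-uniform schemes named above.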