Decoder-Only Model

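Transformer architecture that uses only the decoder stack (no encoder and no cross-attention), with causal (left-to-right) self-attention in which each token attends only to itself and earlier tokens; used mainly for generative tasks.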
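Causal self-attention can be illustrated with a minimal NumPy sketch, assuming a single attention head with random weights. The function name `causal_self_attention` and the weight names `w_q`, `w_k`, `w_v` are illustrative and not taken from any particular library. Real models add multiple heads, output projections, positional information, and feed-forward layers. The mask sets every score for a future position to negative infinity before the softmax, so those positions receive exactly zero weight.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over x of shape (T, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # each (T, d_head)
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (T, T) scaled dot products
    # Causal mask: position i may attend only to positions j <= i.
    T = x.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)                     # exp(-inf) == 0 for masked slots
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # (T, d_head)

rng = np.random.default_rng(0)
d_model, d_head, T = 8, 4, 5
x = rng.normal(size=(T, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)

# Perturbing the last token must not change outputs at earlier positions.
x2 = x.copy()
x2[-1] += 10.0
out2 = causal_self_attention(x2, w_q, w_k, w_v)
print(np.allclose(out[:-1], out2[:-1]))     # True
print(np.allclose(out[0], x[0] @ w_v))      # True: token 0 can see only itself
```

This left-to-right masking is what lets the same network be trained on every position of a sequence in parallel while still behaving like a next-token predictor at generation time.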

1. Llama 3 and Mistral are decoder-only models, and GPT-4, Claude, and Gemini are widely understood to be as well (their full architectures are not public). All are trained to predict the next token autoregressively, which makes them naturally suited to text generation and conversation.
2. GitHub Copilot was originally powered by OpenAI Codex, a decoder-only GPT-3 descendant fine-tuned on code. The causal architecture lets it predict the next token of a completion from the code that precedes the cursor.
3. Decoder-only models have dominated the LLM landscape since 2022. A commonly cited reason is that autoregressive next-token training scales smoothly and predictably with model size, data, and compute. The Chinchilla scaling laws, for example, were fitted on decoder-only autoregressive models, so they should not be assumed to carry over unchanged to masked-language-model objectives.
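The autoregressive loop shared by these models can be sketched as follows. This is a minimal sketch, assuming greedy decoding (production systems usually sample with settings such as temperature or top-p). `greedy_generate` is an illustrative name, not any library's API. The hypothetical `toy_logits` bigram table stands in for a trained network, and it looks only at the last token, whereas a real decoder-only model conditions on the whole prefix.

```python
from typing import Callable, List, Optional, Sequence

def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt: List[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Greedy autoregressive decoding: predict one token at a time from the prefix."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)                          # model sees only the prefix so far
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # greedy argmax
        tokens.append(next_id)                                      # feed the prediction back in
        if eos_id is not None and next_id == eos_id:
            break
    return tokens

# Hypothetical stand-in for a trained decoder: a bigram table over a tiny
# code vocabulary that scores the next token from the last token only.
VOCAB = ["<eos>", "def", "add", "("]
BIGRAM = {1: 2, 2: 3, 3: 0}

def toy_logits(prefix: Sequence[int]) -> List[float]:
    logits = [0.0] * len(VOCAB)
    logits[BIGRAM.get(prefix[-1], 0)] = 1.0
    return logits

out = greedy_generate(toy_logits, prompt=[1], max_new_tokens=10, eos_id=0)
print([VOCAB[t] for t in out])  # ['def', 'add', '(', '<eos>']
```

Each step runs the model on the full prefix and appends a single token, which is why generation cost grows with output length. Real implementations cache the attention keys and values of earlier tokens instead of recomputing them.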
