AWQ - Definition | Agentic AI Library

Activation-aware Weight Quantisation (AWQ) - a post-training, weight-only quantisation technique that uses activation magnitudes to identify the salient weight channels and protects them with per-channel scaling before quantisation.
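
As an illustration of the definition, here is a minimal NumPy sketch of the core idea, not the reference implementation. It assumes a single linear layer, synthetic calibration data with a few high-magnitude input channels, and a simple group-wise round-to-nearest quantiser. The function names `quantize_rtn` and `awq_search` are hypothetical. Per-channel scales are taken as the mean activation magnitude raised to a power `alpha`, and `alpha` is grid-searched to minimise output error.

```python
import numpy as np

def quantize_rtn(w, n_bits=4, group_size=32):
    """Group-wise asymmetric round-to-nearest; returns de-quantised weights."""
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    q_max = 2 ** n_bits - 1
    step = np.maximum(w_max - w_min, 1e-8) / q_max
    q = np.clip(np.round((g - w_min) / step), 0, q_max)
    return (q * step + w_min).reshape(out_f, in_f)

def awq_search(w, x, n_bits=4, group_size=32, n_grid=20):
    """Grid-search alpha; scale input channel j by mean|x_j| ** alpha."""
    act_mag = np.abs(x).mean(axis=0)      # per-input-channel activation magnitude
    ref = x @ w.T                         # full-precision layer output
    best_err, best_alpha, best_s = np.inf, None, None
    for i in range(n_grid):
        alpha = i / n_grid                # alpha = 0 means no scaling (plain RTN)
        s = act_mag ** alpha
        s = s / np.sqrt(s.max() * s.min())
        # Quantise the scaled weights, then fold the inverse scale back in.
        # This is equivalent to feeding x / s into the quantised layer.
        w_q = quantize_rtn(w * s, n_bits, group_size) / s
        err = np.mean((x @ w_q.T - ref) ** 2)
        if err < best_err:
            best_err, best_alpha, best_s = err, alpha, s
    return best_err, best_alpha, best_s

# Synthetic calibration data (an assumption for the demo):
# the first four input channels carry much larger activations.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 128))
x[:, :4] *= 20.0
w = rng.normal(size=(64, 128)) * 0.1

err_rtn = np.mean((x @ quantize_rtn(w).T - x @ w.T) ** 2)
err_awq, alpha, scales = awq_search(w, x)
print(f"RTN error: {err_rtn:.5f}  AWQ-style error: {err_awq:.5f}  alpha: {alpha}")
```

Because the grid includes `alpha = 0` (no scaling), the searched result can never be worse than plain round-to-nearest on the calibration data. How much it improves depends on how skewed the activation magnitudes are.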

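1. AWQ (Lin et al., 2023, MIT) reports lower perplexity than GPTQ at the same 4-bit precision. It scales the salient weight channels before quantisation, which shrinks their relative rounding error without keeping any weights in higher precision. It is used by TinyChat for edge deployment and is supported by LMDeploy.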
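2. Llama 3.1 70B quantised with AWQ to 3.8 bits/weight needs roughly 33 GB for its weights, so it fits on a dual-RTX-3090 machine (2 × 24 GB = 48 GB VRAM) with room left for the KV cache, and runs at a reported 25 tokens/sec - an option for startups that need frontier-scale models on budget hardware.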
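3. The open-source AutoAWQ library quantises supported Hugging Face models in a few lines of Python, and the resulting checkpoints load in Hugging Face Transformers and vLLM - used by teams to turn fine-tuned checkpoints into deployment-ready models. Ollama and llama.cpp instead rely on the GGUF format and its own quantisation schemes.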
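
The memory claim in item 2 can be checked with back-of-the-envelope arithmetic. The sketch below assumes a nominal 70 billion parameters, counts weights only (no KV cache, activations, or framework overhead), and uses decimal gigabytes. The helper `weight_memory_gb` is a hypothetical name.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal GB: parameters * bits / 8 bits-per-byte / 1e9."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 70e9        # nominal parameter count (assumption)
VRAM_GB = 2 * 24       # two RTX 3090 cards at 24 GB each

awq_gb = weight_memory_gb(N_PARAMS, 3.8)    # figure quoted in item 2
fp16_gb = weight_memory_gb(N_PARAMS, 16)    # unquantised half precision
headroom_gb = VRAM_GB - awq_gb              # left for KV cache and overhead

print(f"FP16 weights: {fp16_gb:.1f} GB")
print(f"AWQ weights:  {awq_gb:.2f} GB")
print(f"Headroom on {VRAM_GB} GB: {headroom_gb:.2f} GB")
```

Under these assumptions the half-precision weights alone far exceed 48 GB, while the quantised weights leave roughly 15 GB for the KV cache and runtime overhead. Usable context length on such a machine depends on how that remainder is spent.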
