QLoRA

Quantised LoRA - a fine-tuning technique that quantises the frozen base model to 4 bits and trains small full-precision LoRA adapters on top of it, reducing memory requirements.
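
The definition above can be illustrated with a toy forward pass. This is a minimal sketch, not the paper's implementation: it assumes plain uniform absmax quantisation to signed 4-bit codes (the QLoRA paper uses the NF4 data type instead), one quantisation block per weight row, and pure-Python lists in place of tensors. All names (quantise_block, forward, W_q and so on) are hypothetical.

```python
import random

LEVELS = 7  # signed 4-bit codes in [-7, 7] (uniform grid; an assumption, not NF4)

def quantise_block(weights):
    """Absmax-quantise one block of weights to 4-bit integer codes plus a scale."""
    scale = max(abs(w) for w in weights) or 1.0
    codes = [round(w / scale * LEVELS) for w in weights]
    return codes, scale

def dequantise_block(codes, scale):
    """Map 4-bit codes back to approximate full-precision weights."""
    return [c / LEVELS * scale for c in codes]

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

random.seed(0)
d_in, d_out, rank, alpha = 8, 4, 2, 4

# Frozen base weight: kept only as 4-bit codes plus one scale per row.
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
W_q = [quantise_block(row) for row in W]

# Trainable LoRA factors stay in full precision. B starts at zero,
# so the adapter contributes nothing before training.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]

def forward(x):
    """y = dequant(W_q) x + (alpha / rank) * B (A x)."""
    W_hat = [dequantise_block(codes, scale) for codes, scale in W_q]
    base = matvec(W_hat, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]

x = [1.0] * d_in
y = forward(x)
```

During training only A and B would receive gradient updates; the 4-bit codes in W_q never change, which is where the memory saving comes from.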

1.
QLoRA (Dettmers et al. 2023) enables fine-tuning of a 65B model on a single 48GB GPU - used by the research community to fine-tune models such as Llama 2 70B and Llama 3 70B on consumer and academic hardware.
2.
Guanaco (the QLoRA paper's companion model family) reached 99.3% of ChatGPT's performance on the Vicuna benchmark, as judged by GPT-4, after fine-tuning LLaMA 65B with QLoRA on a single GPU in 24 hours - validating the accessibility of fine-tuning at large scale.
3.
A legal-tech startup uses QLoRA to fine-tune Mistral 7B on 5,000 contract examples on a single RTX 4090 GPU - creating a specialised contract-review model for under $50 in compute.
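
The 65B-on-48GB figure in the first example can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weight storage only (no activations, adapters, or optimiser state) and assumes the block sizes reported in the QLoRA paper: one quantisation constant per 64 weights, and, with double quantisation, 8-bit constants with a second-level 32-bit constant per 256 blocks. The function and variable names are hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 65e9  # 65B-parameter base model
BLOCK = 64       # weights per quantisation block (assumed, per the paper)

# Overhead of storing quantisation constants, in bits per parameter.
overhead_plain = 32 / BLOCK                       # one 32-bit constant per block
overhead_double = 8 / BLOCK + 32 / (BLOCK * 256)  # double-quantised constants

fp16 = weight_memory_gb(N_PARAMS, 16)
nf4 = weight_memory_gb(N_PARAMS, 4 + overhead_plain)
nf4_dq = weight_memory_gb(N_PARAMS, 4 + overhead_double)

print(f"16-bit weights:          {fp16:.1f} GB")
print(f"4-bit weights:           {nf4:.1f} GB")
print(f"4-bit + double quant:    {nf4_dq:.1f} GB")
```

Under these assumptions the 16-bit weights alone exceed any single GPU's memory, while the 4-bit weights leave headroom on a 48GB card for the LoRA adapters, their optimiser state, and activations.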
