Reinforcement Learning from Human Feedback (RLHF)

Using feedback from human raters to improve the quality of a model's responses. For example, an RLHF mechanism can ask users to rate the quality of a model's response with a 👍 or 👎 emoji. The system can then adjust its future responses based on that feedback.
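The feedback loop described above can be sketched as a toy policy-gradient (REINFORCE) learner. Everything in it is a hypothetical illustration: the three candidate response styles (`STYLES`), the `simulated_rater` standing in for real human raters, and its 80%/30% approval rates are assumptions, not part of the definition. Production RLHF pipelines typically first train a separate reward model on human preference data and then fine-tune the language model against it with an RL algorithm such as PPO. This sketch skips the reward model and treats each 👍 (+1) or 👎 (-1) directly as the reward.

```python
import math
import random

# Hypothetical candidate response styles the toy "model" chooses between.
STYLES = ["terse", "friendly", "detailed"]


def softmax(logits):
    """Convert raw preference scores into selection probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_rater(style):
    """Hypothetical stand-in for a human rater: +1 for 👍, -1 for 👎.

    Assumption for illustration only: raters approve "friendly" replies
    80% of the time and every other style 30% of the time.
    """
    p_thumbs_up = 0.8 if style == "friendly" else 0.3
    return 1 if random.random() < p_thumbs_up else -1


def train(steps=2000, lr=0.1, seed=0):
    """Adjust the policy from thumbs-up/down feedback with REINFORCE."""
    random.seed(seed)
    logits = [0.0] * len(STYLES)
    baseline = 0.0  # exponential moving average of reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a response style from the current policy.
        i = random.choices(range(len(STYLES)), weights=probs)[0]
        reward = simulated_rater(STYLES[i])
        advantage = reward - baseline
        # Gradient of log softmax(logits)[i] w.r.t. logit j is 1[j == i] - probs[j].
        for j in range(len(STYLES)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
        baseline += 0.01 * (reward - baseline)
    return dict(zip(STYLES, softmax(logits)))


if __name__ == "__main__":
    for style, p in train().items():
        print(f"{style}: {p:.2f}")
```

Running the script prints one selection probability per style. Under the assumed rater, most of the probability should shift toward "friendly", because it earns the highest expected rating; the baseline only changes how noisy each update is, not which style is favored.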

Real-world uses

Created for this library

1. An LLM team uses RLHF to align a base model with human preferences for helpfulness and safety.
2. A customer support team uses RLHF on its assistant to fine-tune responses toward styles that human reviewers prefer.
3. A research lab uses RLHF on its instruction-tuned model to align responses with human judgments at scale.

Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License
