Constitutional AI

Anthropic's alignment technique using a set of principles to guide model self-critique and revision.

1.
Anthropic's Claude is trained using Constitutional AI - the model first generates a response, then critiques it against 16 constitutional principles (e.g., 'Choose the less harmful response'), then revises it iteratively.
2.
Constitutional AI reduced Anthropic's reliance on human-labelled harmful-content data by 90% - the model generates its own critique-revision pairs, scaling safety training without proportional human annotation cost.
3.
Researchers at DeepMind extended Constitutional AI principles to code generation - using a set of coding principles (correctness, security, efficiency) as a critic to iteratively improve generated code quality.

Loading…