CLIP Score

Metric that measures the semantic alignment between a generated image and its text prompt as the cosine similarity of their CLIP image and text embeddings, usually clipped at zero and scaled.
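
As a minimal sketch, assume the image and text embeddings have already been produced by a CLIP image encoder and text encoder. The score is then a clipped, scaled cosine similarity: score = w · max(cos(E_I, E_T), 0). Hessel et al. (2021) use w = 2.5. The default w = 100 below follows the convention of implementations such as torchmetrics and is an assumption. The helper name clip_score is hypothetical.

```python
import numpy as np


def clip_score(image_emb: np.ndarray, text_emb: np.ndarray, w: float = 100.0) -> float:
    """CLIP score from precomputed CLIP embeddings (hypothetical helper).

    image_emb, text_emb: 1-D vectors assumed to come from a CLIP image
    encoder and text encoder, computed elsewhere.
    w: scale factor; 100 follows torchmetrics, Hessel et al. (2021) use 2.5.
    """
    # L2-normalize both embeddings so their dot product is the cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_emb / np.linalg.norm(text_emb)
    cos = float(np.dot(img, txt))
    # Negative similarities are clipped to 0, then the result is scaled by w.
    return w * max(cos, 0.0)


# Toy 4-d vectors standing in for real CLIP embeddings (real ones have hundreds of dimensions).
image_emb = np.array([0.1, 0.3, 0.5, 0.2])
text_emb = np.array([0.1, 0.25, 0.55, 0.1])
print(clip_score(image_emb, text_emb))
```

Because both vectors are normalized, the score depends only on the angle between them, not on their magnitudes. With w = 100 it lies in [0, 100].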

1. DALL-E 3's evaluation reports CLIP scores alongside human preference ratings to measure prompt adherence; a high CLIP score indicates that the generated image is semantically aligned with the text description.
2. Text-to-image model comparison platforms (e.g., Artifisial) use CLIP score to rank model outputs for prompt adherence automatically and at scale, without requiring human raters for every image.
3. PickScore (Kirstain et al. 2023) fine-tunes CLIP on 584k human preference labels from the Pick-a-Pic dataset to produce a CLIP-based scoring function that better predicts human preferences; it is used as a reward model in RLHF pipelines for diffusion models.
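
The ranking and reward uses above can be sketched with two scoring rules, assuming precomputed embeddings for several candidate images and one prompt. The first rule ranks images by clipped, scaled CLIP score. The second is a PickScore-style rule: scaled similarities are softmaxed across the candidate images to give preference probabilities. In real PickScore the embeddings and scale come from the fine-tuned checkpoint rather than vanilla CLIP, so this shows only the arithmetic. The function names and the logit_scale = 100 default are hypothetical.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    # L2-normalize along the last axis so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def rank_by_clip_score(image_embs: np.ndarray, text_emb: np.ndarray,
                       w: float = 100.0) -> list[tuple[int, float]]:
    """Rank candidate images for one prompt by CLIP score (hypothetical helper).

    image_embs: (n_images, d) image embeddings; text_emb: (d,) text embedding.
    Returns (image_index, score) pairs, best first.
    """
    cos = normalize(image_embs) @ normalize(text_emb)
    scores = w * np.maximum(cos, 0.0)  # clip negatives at 0, then scale
    order = np.argsort(-scores)        # descending by score
    return [(int(i), float(scores[i])) for i in order]


def preference_probs(image_embs: np.ndarray, text_emb: np.ndarray,
                     logit_scale: float = 100.0) -> np.ndarray:
    """PickScore-style probabilities that each image is preferred for the prompt.

    logit_scale=100 is an assumed value; a fine-tuned model would use its own learned scale.
    """
    logits = logit_scale * (normalize(image_embs) @ normalize(text_emb))
    logits = logits - logits.max()     # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()             # softmax across candidate images


# Toy 2-d embeddings for three candidate images and one prompt.
image_embs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
text_emb = np.array([1.0, 0.0])
print(rank_by_clip_score(image_embs, text_emb))
print(preference_probs(image_embs, text_emb))
```

The softmax turns absolute similarities into relative preferences among the candidates. That comparative form matches how pairwise human preference labels are collected, which makes it usable as a reward signal.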
