SuperGLUE

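An ensemble of datasets for rating an LLM's overall ability to understand and generate text. The ensemble consists of the following datasets: Boolean Questions (BoolQ), CommitmentBank (CB), Choice of Plausible Alternatives (COPA), Multi-Sentence Reading Comprehension (MultiRC), Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD), Recognizing Textual Entailment (RTE), Words in Context (WiC), and Winograd Schema Challenge (WSC).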

For details, see SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.
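SuperGLUE reduces results on its tasks to a single overall score. Each task's metrics are first averaged into one task score (CB, for example, reports both F1 and accuracy), and the overall score is the unweighted mean of those task scores. The following is a minimal sketch assuming that aggregation rule. The function names, metric keys, and numbers are hypothetical, and only three tasks are shown for brevity.

```python
from statistics import mean


def task_score(metrics: dict[str, float]) -> float:
    """Average a task's metrics (e.g. F1 and accuracy for CB) into one task score."""
    return mean(metrics.values())


def superglue_score(results: dict[str, dict[str, float]]) -> float:
    """Unweighted mean of the per-task scores for every task in `results`."""
    return mean(task_score(metrics) for metrics in results.values())


# Hypothetical per-task results on a 0-100 scale (not real model scores).
results = {
    "BoolQ": {"accuracy": 80.0},
    "CB": {"f1": 70.0, "accuracy": 90.0},
    "MultiRC": {"f1a": 76.0, "em": 36.0},
}

print(superglue_score(results))  # → 72.0
```

Because every task counts equally, a task with two metrics is not weighted more heavily than a task with one.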

Example use cases (created for this library):

1. An LLM evaluation team includes SuperGLUE in its standard natural language understanding (NLU) benchmark suite for model release reviews.
2. A research lab reports SuperGLUE scores in model cards so that downstream users can compare NLU performance across model versions.
3. A model release team uses SuperGLUE as a baseline NLU benchmark suite that gates promotion to production.
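Comparing versions and gating a release both come down to comparing per-task and overall scores between two models. The following is a minimal sketch of a promotion gate under a hypothetical policy that is not part of SuperGLUE itself. The candidate must report every task the baseline does, must not lower the overall score, and must not regress any single task by more than a tolerance. The names `overall_score`, `promotion_gate`, and `max_task_drop`, and all scores shown, are illustrative assumptions.

```python
from statistics import mean


def overall_score(task_scores: dict[str, float]) -> float:
    """Unweighted mean of per-task scores (each already averaged over its metrics)."""
    return mean(task_scores.values())


def promotion_gate(
    candidate: dict[str, float],
    baseline: dict[str, float],
    max_task_drop: float = 1.0,
) -> tuple[bool, list[str]]:
    """Hypothetical release gate. Returns (passed, offending tasks).

    Fails if the candidate is missing any baseline task, if any task drops by
    more than `max_task_drop` points, or if the overall score goes down.
    """
    missing = sorted(set(baseline) - set(candidate))
    if missing:
        return False, missing
    regressed = sorted(
        task for task in baseline
        if baseline[task] - candidate[task] > max_task_drop
    )
    passed = overall_score(candidate) >= overall_score(baseline) and not regressed
    return passed, regressed


# Hypothetical task scores for two model versions (not real results).
baseline = {"BoolQ": 80.0, "CB": 85.0, "RTE": 70.0}
candidate = {"BoolQ": 82.0, "CB": 83.0, "RTE": 74.0}

print(promotion_gate(candidate, baseline))  # → (False, ['CB'])
```

Checking per-task regressions alongside the average matters because a macro-average can hide a sizable drop on one task behind gains elsewhere. In the example, the overall score rises while CB falls by 2 points, so the gate blocks promotion.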