Glossary term
Glossary term
Evaluation and Benchmarks
A dataset that contains a set of sentence beginnings that might contain toxic content. Use this dataset to evaluate an LLM's ability to generate non-toxic text to complete the sentence. Typically, you use the Perspective API to determine how well the LLM performed at this task.
See RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models for details.
Created for this library
An LLM safety team uses RealToxicityPrompts to evaluate how often its model generates toxic content under prompts known to elicit such content.
A research lab reports RealToxicityPrompts results in its model card to communicate safety performance to downstream users.
A model release team gates promotions on RealToxicityPrompts results to avoid regressing on toxic content generation.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License