Glossary term
Safety and Alignment
A model whose "reasoning" is impossible or difficult for humans to understand. That is, although humans can see how prompts affect responses, they can't determine exactly how a black box model arrives at a given response. In other words, a black box model lacks interpretability.
Most deep models and large language models are black boxes.
Created for this library
A bank's model risk team treats deep neural networks as black-box models and requires shadow (surrogate) models that can be used to explain their decisions to regulators.
An insurance underwriting team labels its gradient-boosted model a black-box model and adds SHAP-based explanations for adjusters reviewing edge cases.
A healthcare provider's governance group restricts black-box models in clinical decision support until interpretability tooling matches the use case's risk level.
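The surrogate-model idea mentioned in the examples above can be sketched in a few lines. This is only an illustration under stated assumptions: `black_box`, its `income` and `debt` inputs, and the `fit_stump` helper are hypothetical names invented here, and a one-split decision stump stands in for whatever interpretable model a real team would choose. The point is that the explainer only queries the opaque model and never reads its internals.

```python
import random


def black_box(income, debt):
    """Hypothetical opaque model: callers see only inputs and outputs."""
    ratio = debt / (income + 1.0)
    return 1 if (income > 40000 and ratio < 0.35) or income > 90000 else 0


def fit_stump(rows, labels, n_features):
    """Fit a one-split surrogate (decision stump) to the given labels.

    Returns (feature_index, threshold, polarity, fidelity), where
    polarity is the class predicted when value >= threshold and
    fidelity is the fraction of labels the stump reproduces.
    """
    best = None
    for f in range(n_features):
        for t in sorted({r[f] for r in rows}):
            for polarity in (0, 1):
                preds = [polarity if r[f] >= t else 1 - polarity for r in rows]
                fidelity = sum(p == y for p, y in zip(preds, labels)) / len(rows)
                if best is None or fidelity > best[3]:
                    best = (f, t, polarity, fidelity)
    return best


def demo(seed=0, n=200):
    """Probe the black box on random inputs, then fit the surrogate."""
    rng = random.Random(seed)
    rows = [(rng.uniform(10000, 120000), rng.uniform(0, 60000)) for _ in range(n)]
    labels = [black_box(income, debt) for income, debt in rows]
    return fit_stump(rows, labels, n_features=2)


if __name__ == "__main__":
    feature, threshold, polarity, fidelity = demo()
    name = ("income", "debt")[feature]
    print(f"surrogate: predict {polarity} when {name} >= {threshold:.0f}")
    print(f"fidelity to black box: {fidelity:.2%}")
```

The fidelity score measures how often the simple rule agrees with the black box on the sampled inputs. A low score is a signal that the surrogate's explanation should not be trusted as a description of the underlying model.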
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License