Glossary term
Architecture
BERT (Bidirectional Encoder Representations from Transformers) is a model architecture for text representation. A trained BERT model can act as part of a larger model for text classification or other ML tasks.
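To illustrate how an encoder can sit inside a larger classification model, here is a minimal sketch. It rests on loud assumptions: `toy_encoder` is a hypothetical stand-in that only mimics the shape of an encoder's output (text in, fixed-size vector out) and is not BERT, and `TextClassifier` is an illustrative name rather than any library's API. In practice the encoder slot would be filled by a trained BERT model.

```python
import math
from typing import Callable, List

Vector = List[float]


def toy_encoder(text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for a trained encoder (NOT real BERT).

    Buckets tokens by character codes and returns an L2-normalized vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = sum(ord(ch) for ch in token) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class TextClassifier:
    """Larger model: an encoder (text -> vector) plus a linear softmax head."""

    def __init__(self, encoder: Callable[[str], Vector], dim: int, num_classes: int) -> None:
        self.encoder = encoder
        # Task-specific parameters; these are what fine-tuning would learn.
        self.weights = [[0.0] * dim for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def predict_proba(self, text: str) -> Vector:
        features = self.encoder(text)  # text representation
        logits = [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


clf = TextClassifier(toy_encoder, dim=8, num_classes=2)
probs = clf.predict_proba("Supplier shall indemnify Customer")
```

The design point is the separation of roles: the encoder produces the representation, and only the small head is specific to the classification task.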
BERT has the following characteristics:
Uses the Transformer architecture, and therefore relies on self-attention.
Uses the encoder part of the Transformer. The encoder's job is to produce good text representations, rather than to perform a specific task like classification.
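Is bidirectional: the representation of each token draws on context both to its left and to its right.
Uses masking for unsupervised training: some input tokens are hidden, and the model learns to predict them from the surrounding context.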
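The masking idea can be sketched in a few lines. This is an illustrative sketch, not BERT's actual preprocessing code: the function name `mask_tokens` is hypothetical, and the 15% selection rate with an 80/10/10 split (replace with `[MASK]`, replace with a random token, leave unchanged) is taken from the original BERT paper rather than from this glossary entry. It also works on whole words, whereas real BERT operates on subword tokens.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_TOKEN = "[MASK]"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,  # assumed rate, from the original BERT paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (model inputs, labels); labels are None where nothing is predicted."""
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_TOKEN  # most selected tokens are hidden
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # some become a random token
        # otherwise the token is left unchanged
    return inputs, labels


sentence = "the cat sat on the mat".split()
masked_inputs, targets = mask_tokens(sentence, vocab=sentence, rng=random.Random(0))
```

Because the labels come from the text itself, no human annotation is needed, which is why this kind of training is described as unsupervised.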
BERT's variants include:
ALBERT, which is an acronym for A Lite BERT.
See "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing" for an overview of BERT.
Created for this library
A legal-tech vendor fine-tunes BERT on millions of contract clauses to power a clause classifier that flags non-standard indemnification language.
A bank's compliance team uses a BERT model to scan customer service transcripts for complaints that require escalation under regulator rules.
An e-commerce search team fine-tunes BERT on query and product pairs to improve semantic match between misspelled queries and catalog items.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License