Self-Training

A variant of semi-supervised learning that is particularly useful when both of the following conditions are true:

The ratio of unlabeled examples to labeled examples in the dataset is high.

This is a classification problem.

Self-training works by iterating over the following two steps until the model stops improving:

1. Use supervised machine learning to train a model on the labeled examples.

2. Use the model created in Step 1 to generate predictions (labels) on the unlabeled examples, then move each high-confidence prediction into the labeled examples with its predicted label.

Notice that each iteration of Step 2 adds more labeled examples for Step 1 to train on.
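
The two steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the nearest-centroid "model", the softmax-style confidence score, the 0.9 threshold, and every function name here are hypothetical choices made for the example. The loop also stops when no prediction clears the threshold, which is a simpler stand-in for "until the model stops improving"; checking actual improvement would need a held-out labeled validation set.

```python
import math

def fit_centroids(X, y):
    """Step 1 stand-in: 'train' by averaging the labeled points of each class."""
    sums, counts = {}, {}
    for point, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}

def predict_with_confidence(centroids, point):
    """Return (label, confidence) for one point.

    Confidence is exp(-distance) to the nearest centroid, normalized over
    all classes (an assumed scoring rule, chosen only for illustration).
    """
    scores = {}
    for label, centroid in centroids.items():
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, centroid)))
        scores[label] = math.exp(-distance)
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9, max_rounds=10):
    """Alternate Step 1 and Step 2 until no confident predictions remain."""
    X_l, y_l, pool = list(X_labeled), list(y_labeled), list(X_unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(X_l, y_l)            # Step 1: train on labeled data
        still_unlabeled, moved = [], 0
        for point in pool:                         # Step 2: pseudo-label the pool
            label, confidence = predict_with_confidence(model, point)
            if confidence >= threshold:
                X_l.append(point)                  # move into the labeled set
                y_l.append(label)                  # with the predicted label
                moved += 1
            else:
                still_unlabeled.append(point)
        pool = still_unlabeled
        if moved == 0:                             # nothing new to learn from
            break
    # Refit once more so the returned model reflects every pseudo-label.
    return fit_centroids(X_l, y_l), X_l, y_l, pool
```

Calling `self_train` with two labeled points, one per class, and a handful of unlabeled points returns the final model, the grown labeled set, and whatever stayed unlabeled. Points that sit near a class boundary never clear the threshold and remain in the pool, which is the intended behavior: a wrong pseudo-label would be trained on in every later round.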

Real-world uses

Created for this library

1. A document classification team uses self-training to label unlabeled tickets with the model and add high-confidence examples back to training.
2. A medical NLP team uses self-training to grow a labeled set by adding the model's confident predictions on unlabeled clinical notes.
3. A research team uses self-training to expand training data when human labels are scarce but unlabeled data is abundant.
