Glossary term
Training and Fine-Tuning
Co-training is a semi-supervised learning approach that is particularly useful when all of the following conditions are true:
The ratio of unlabeled examples to labeled examples in the dataset is high.
This is a classification problem (binary or multi-class).
The dataset contains two different sets of predictive features that are independent of each other and complementary.
Co-training essentially amplifies independent signals into a stronger signal. For example, consider a classification model that categorizes individual used cars as either Good or Bad. One set of predictive features might focus on aggregate characteristics such as the year, make, and model of the car; another set of predictive features might focus on the previous owner's driving record and the car's maintenance history.
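The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact procedure from the original paper: the nearest-centroid classifier is a stand-in for whatever model each view would really use, the function names (`centroid_fit`, `centroid_predict`, `co_train`) are hypothetical, and the one-number "views" of each car are made-up toy values. In each round, the classifier for each view labels the unlabeled examples it is most confident about and adds them to a shared labeled set, so the other view's classifier can learn from them.

```python
def centroid_fit(rows, labels):
    """Return {label: centroid} for a nearest-centroid classifier."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def centroid_predict(model, row):
    """Return (label, margin); a larger margin means a more confident prediction.

    Assumes the model holds at least two classes.
    """
    dists = sorted(
        (sum((x - c) ** 2 for x, c in zip(row, center)), label)
        for label, center in model.items()
    )
    (best_dist, best_label), (second_dist, _) = dists[0], dists[1]
    return best_label, second_dist - best_dist


def co_train(labeled, unlabeled, rounds=5, per_round=2):
    """Grow a labeled set using two feature views.

    labeled:   list of (view_a, view_b, label)
    unlabeled: list of (view_a, view_b)
    """
    labeled, pool = list(labeled), list(unlabeled)  # copies; inputs stay untouched
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            # Train a classifier that sees only this view's features.
            model = centroid_fit([ex[view] for ex in labeled],
                                 [ex[2] for ex in labeled])
            # Rank the pool by this view's confidence, most confident first.
            scored = sorted(
                ((centroid_predict(model, ex[view]), i) for i, ex in enumerate(pool)),
                key=lambda item: item[0][1],
                reverse=True,
            )
            chosen = scored[:per_round]
            # Pseudo-label the confident examples and share them with both views.
            for (label, _margin), i in chosen:
                labeled.append((pool[i][0], pool[i][1], label))
            chosen_idx = {i for _, i in chosen}
            pool = [ex for i, ex in enumerate(pool) if i not in chosen_idx]
    return labeled


# Toy data (assumed): view_a stands for a score from year/make/model,
# view_b for a score from owner and maintenance history.
seed = [((2.0,), (2.0,), "Good"), ((-2.0,), (-2.0,), "Bad")]
pool = []
for k in range(10):
    strong, weak = 1.0 + 0.2 * k, 2.8 - 0.2 * k
    pool.append(((strong,), (weak,)))    # a car that is actually Good
    pool.append(((-strong,), (-weak,)))  # a car that is actually Bad

expanded = co_train(seed, pool, rounds=3, per_round=2)
print(len(expanded))  # prints 14: 2 seed examples + 3 rounds * 2 views * 2 picks
```

In this toy pool, a car that looks only weakly Good in one view looks strongly Good in the other, so each view ends up confidently labeling examples the other view would have been unsure about. Real data is noisier, and a practical version would likely also need a confidence threshold and a held-out check on pseudo-label quality.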
The seminal paper on co-training is "Combining Labeled and Unlabeled Data with Co-Training" by Blum and Mitchell (1998).
Created for this library
A document classification team uses co-training with two views of each document, one based on text and one based on metadata, to label data efficiently.
A medical NLP team uses co-training across notes and structured codes to expand a small seed of labeled examples.
A web classification team uses co-training across page content and inbound links to grow its labeled training set with high-confidence predictions.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License