Glossary term
Foundations
A synthetic feature formed by "crossing" categorical or bucketed features.
For example, consider a "mood forecasting" model that represents temperature in one of the following four buckets:
freezing
chilly
temperate
warm
And represents wind speed in one of the following three buckets:
still
light
windy
Without feature crosses, the linear model trains independently on each of the preceding seven buckets. So, for example, the model trains on freezing independently of its training on windy.
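A minimal sketch of the uncrossed setup, assuming one-hot encoding (the glossary does not name an encoding). Each feature gets its own indicator columns: 4 for temperature and 3 for wind speed. A linear model then learns one weight per column. The names `TEMPERATURE_BUCKETS`, `WIND_BUCKETS`, and `one_hot_uncrossed` are hypothetical.

```python
# Hypothetical bucket vocabularies taken from the mood-forecasting example.
TEMPERATURE_BUCKETS = ["freezing", "chilly", "temperate", "warm"]
WIND_BUCKETS = ["still", "light", "windy"]


def one_hot_uncrossed(temperature: str, wind: str) -> list[int]:
    """Encode each feature separately: 4 temperature columns + 3 wind columns."""
    temp_part = [1 if b == temperature else 0 for b in TEMPERATURE_BUCKETS]
    wind_part = [1 if b == wind else 0 for b in WIND_BUCKETS]
    # Concatenated result has 7 columns; a linear model learns one weight
    # per column, so "freezing" and "windy" get independently learned weights.
    return temp_part + wind_part


print(one_hot_uncrossed("freezing", "windy"))  # → [1, 0, 0, 0, 0, 0, 1]
```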
Alternatively, you could create a feature cross of temperature and wind speed. This synthetic feature would have the following 12 possible values:
freezing-still
freezing-light
freezing-windy
chilly-still
chilly-light
chilly-windy
temperate-still
temperate-light
temperate-windy
warm-still
warm-light
warm-windy
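The 12 values above pair every temperature bucket with every wind-speed bucket. A minimal sketch using the standard library's `itertools.product` generates them in the same order and one-hot encodes a pair over the crossed vocabulary. `crossed_vocabulary` and `one_hot_crossed` are hypothetical names, and the hyphenated strings simply mirror the naming in the list.

```python
from itertools import product

TEMPERATURE_BUCKETS = ["freezing", "chilly", "temperate", "warm"]
WIND_BUCKETS = ["still", "light", "windy"]

# Every (temperature, wind) pair; product() varies the last iterable fastest,
# which reproduces the order of the list above.
crossed_vocabulary = [f"{t}-{w}" for t, w in product(TEMPERATURE_BUCKETS, WIND_BUCKETS)]


def one_hot_crossed(temperature: str, wind: str) -> list[int]:
    """Encode the pair as a single indicator over all 12 crossed values."""
    key = f"{temperature}-{wind}"
    return [1 if value == key else 0 for value in crossed_vocabulary]


print(len(crossed_vocabulary))                        # → 12
print(crossed_vocabulary[:3])                         # → ['freezing-still', 'freezing-light', 'freezing-windy']
print(one_hot_crossed("freezing", "windy").index(1))  # → 2
```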
Thanks to feature crosses, the model can learn mood differences between a freezing-windy day and a freezing-still day.
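A sketch of why the cross helps, using made-up weights for illustration only (nothing here is learned from data). In the uncrossed linear model, the score is one temperature weight plus one wind weight, so the gap between a windy day and a still day is the same at every temperature. With the cross, each combination has its own weight, so the freezing-windy versus freezing-still gap can differ from the warm-windy versus warm-still gap. All weights and function names are hypothetical.

```python
# Hypothetical weights for illustration; a real model would learn these.
temp_weight = {"freezing": -1.0, "chilly": -0.5, "temperate": 0.5, "warm": 1.0}
wind_weight = {"still": 0.2, "light": 0.0, "windy": -0.4}

# One weight per crossed value; only the four combinations used below are
# filled in. Here wind hurts mood much more when it is freezing.
cross_weight = {
    ("freezing", "still"): -0.5,
    ("freezing", "windy"): -2.0,
    ("warm", "still"): 1.0,
    ("warm", "windy"): 1.1,
}


def additive_score(temperature: str, wind: str) -> float:
    """Uncrossed linear model: independent per-feature weights are summed."""
    return temp_weight[temperature] + wind_weight[wind]


def crossed_score(temperature: str, wind: str) -> float:
    """Crossed linear model: one weight per (temperature, wind) combination."""
    return cross_weight[(temperature, wind)]


def windy_minus_still(score_fn, temperature: str) -> float:
    """Mood difference between a windy and a still day at one temperature."""
    return score_fn(temperature, "windy") - score_fn(temperature, "still")


for t in ("freezing", "warm"):
    print(t, round(windy_minus_still(additive_score, t), 2),
          round(windy_minus_still(crossed_score, t), 2))
```

The additive gap is identical for both temperatures, while the crossed gap differs. This is the interaction effect that the uncrossed model cannot express.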
If you cross two features that each have many buckets, the resulting feature cross will have a huge number of possible combinations. For example, if one feature has 1,000 buckets and the other feature has 2,000 buckets, the resulting feature cross has 2,000,000 buckets.
Formally, a cross is a Cartesian product.
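Because a cross is a Cartesian product, its number of possible values is the product of the input features' bucket counts. Here is a minimal sketch; `cross_cardinality` is a hypothetical helper built on the standard library's `math.prod` (Python 3.8+).

```python
import math


def cross_cardinality(*bucket_counts: int) -> int:
    """Number of possible values in a cross of features with these bucket counts."""
    return math.prod(bucket_counts)


print(cross_cardinality(4, 3))          # → 12 (temperature x wind speed)
print(cross_cardinality(1_000, 2_000))  # → 2000000
```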
Feature crosses are mostly used with linear models and are rarely used with neural networks.
See Categorical data: Feature crosses in Machine Learning Crash Course for more information.
Created for this library
A retail demand team adds a feature cross of region and product category to capture local taste preferences in its forecasting model.
An ad-tech team uses feature crosses between device type and ad placement to capture interaction effects in its click model.
A real-estate pricing team adds a feature cross of neighborhood and bucketed square footage to capture how location moderates the value of size.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License