One-Hot Encoding

Representing categorical data as a vector in which:

One element is set to 1.

All other elements are set to 0.

One-hot encoding is commonly used to represent strings or identifiers that have a finite set of possible values. For example, suppose a certain categorical feature named Scandinavia has five possible values:

"Denmark"

"Sweden"

"Norway"

"Finland"

"Iceland"

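One-hot encoding could represent each of the five values as follows:

"Denmark" is [1, 0, 0, 0, 0]

"Sweden" is [0, 1, 0, 0, 0]

"Norway" is [0, 0, 1, 0, 0]

"Finland" is [0, 0, 0, 1, 0]

"Iceland" is [0, 0, 0, 0, 1]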

Thanks to one-hot encoding, a model can learn a separate weight for each of the five countries, rather than treating them as points on a single numeric scale.
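As a rough illustration, the mapping from a category to its one-hot vector can be sketched in a few lines of plain Python. The names VOCABULARY and one_hot are hypothetical helpers chosen for this example, and the vocabulary order is assumed to follow the list of countries above.

```python
# Vocabulary for the "Scandinavia" feature, in the order listed in the text.
# (VOCABULARY and one_hot are illustrative names, not part of any library.)
VOCABULARY = ["Denmark", "Sweden", "Norway", "Finland", "Iceland"]


def one_hot(value, vocabulary=VOCABULARY):
    """Return a list with 1 at the position of `value` and 0 everywhere else."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    index = vocabulary.index(value)
    return [1 if i == index else 0 for i in range(len(vocabulary))]


# Print the one-hot vector for every value in the vocabulary.
for country in VOCABULARY:
    print(country, one_hot(country))
```

Each vector has the same length as the vocabulary, and exactly one element is set to 1. This sketch rejects values outside the vocabulary; other designs reserve an extra slot for unknown values instead.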

Representing a feature as numerical data is an alternative to one-hot encoding. Unfortunately, representing the Scandinavian countries numerically is not a good choice. For example, consider the following numeric representation:

"Denmark" is 0

"Sweden" is 1

"Norway" is 2

"Finland" is 3

"Iceland" is 4

With numeric encoding, a model would interpret the raw numbers mathematically and would try to train on those numbers. However, Iceland isn't actually twice as much (or half as much) of something as Norway, so the model would come to some strange conclusions.
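The difference can be made concrete with a small sketch. It assumes a simple linear model in which a feature's contribution is a weight multiplied by the feature value; the weights below are made-up numbers for illustration only, and the function names are hypothetical.

```python
COUNTRIES = ["Denmark", "Sweden", "Norway", "Finland", "Iceland"]

# Numeric encoding from the text: "Denmark" is 0 ... "Iceland" is 4.
integer_code = {name: i for i, name in enumerate(COUNTRIES)}


def contribution_numeric(country, weight):
    """One shared weight scales the raw integer code."""
    return weight * integer_code[country]


def contribution_one_hot(country, weights):
    """Each country has its own weight; the one-hot vector selects it."""
    vector = [1 if name == country else 0 for name in COUNTRIES]
    return sum(w * x for w, x in zip(weights, vector))


shared_weight = 0.5  # hypothetical learned weight
norway = contribution_numeric("Norway", shared_weight)
iceland = contribution_numeric("Iceland", shared_weight)
# With numeric encoding, Iceland is forced to contribute exactly twice as
# much as Norway, whatever value the shared weight takes.
print(norway, iceland)

per_country_weights = [0.3, -1.2, 0.8, 0.0, 0.1]  # hypothetical learned weights
# With one-hot encoding, the two contributions are independent of each other.
print(contribution_one_hot("Norway", per_country_weights),
      contribution_one_hot("Iceland", per_country_weights))
```

Under numeric encoding the ratio between any two countries is fixed by their codes. Under one-hot encoding each country's contribution is whatever its own weight says.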

See Categorical data: Vocabulary and one-hot encoding in Machine Learning Crash Course for more information.

Real-world uses

Created for this library

1. A retail analytics team uses one-hot encoding on payment method as a feature for its cart-abandonment model.
2. A churn team uses one-hot encoding on plan tier so the production model treats each tier as an independent feature.
3. A search-quality team uses one-hot encoding on country code as a feature in its ranker alongside continuous engagement signals.
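As a loose sketch of the third use, a one-hot encoded categorical feature can be concatenated with continuous features to form a single input vector. The country-code vocabulary, the signal values, and the function name build_feature_vector are all assumptions made for this example and do not describe any particular team's system.

```python
# Hypothetical vocabulary of country codes; a real system would derive it from data.
COUNTRY_CODES = ["DK", "SE", "NO", "FI", "IS"]


def build_feature_vector(country_code, continuous_signals):
    """Concatenate a one-hot country vector with continuous signals.

    In this sketch, a code outside the vocabulary yields an all-zero
    one-hot part rather than raising an error.
    """
    one_hot = [1.0 if code == country_code else 0.0 for code in COUNTRY_CODES]
    return one_hot + [float(x) for x in continuous_signals]


# Two made-up continuous engagement signals for one example.
features = build_feature_vector("NO", [0.42, 3.0])
print(features)
```

The first five elements identify the country and the remaining elements carry the continuous signals unchanged.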
