Information Gain

In decision forests, information gain is the difference between a node's entropy and the weighted sum of its child nodes' entropies. Each child's entropy is weighted by that child's share of the examples. A node's entropy is the entropy of the examples in that node.
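Written as a formula, the definition reads as follows. This sketch assumes Shannon entropy over the class labels, measured in bits. Here n is the number of examples in the parent node, n_i is the number in child i, and p_c is the fraction of examples in a node that have label c. The log base is a convention. Changing it scales every entropy by the same constant, so it does not change which split has the highest gain.

```latex
\mathrm{IG} = H(\text{parent}) - \sum_{i} \frac{n_i}{n}\, H(\text{child}_i),
\qquad
H(S) = -\sum_{c} p_c \log_2 p_c
```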

For example, consider the following entropy values:

entropy of parent node = 0.6

entropy of one child node with 16 relevant examples = 0.2

entropy of another child node with 24 relevant examples = 0.1

Together the child nodes hold 16 + 24 = 40 examples. That puts 40% (16/40) of the examples in one child node and 60% (24/40) in the other. Therefore:

weighted entropy sum of child nodes = (0.4 * 0.2) + (0.6 * 0.1) = 0.14

So, the information gain is:

information gain = entropy of parent node - weighted entropy sum of child nodes

information gain = 0.6 - 0.14 = 0.46
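The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch. The helper names `weighted_child_entropy` and `information_gain` are hypothetical. Each child node is described only by its example count and its already-computed entropy.

```python
def weighted_child_entropy(children):
    """children: list of (num_examples, entropy) pairs, one per child node."""
    total = sum(n for n, _ in children)
    # Weight each child's entropy by its share of the examples.
    return sum((n / total) * h for n, h in children)


def information_gain(parent_entropy, children):
    """Parent entropy minus the weighted entropy sum of the child nodes."""
    return parent_entropy - weighted_child_entropy(children)


# Values from the worked example: 16 examples at entropy 0.2, 24 at 0.1.
children = [(16, 0.2), (24, 0.1)]
print(round(weighted_child_entropy(children), 2))  # 0.14
print(round(information_gain(0.6, children), 2))   # 0.46
```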

Most splitters seek to create conditions that maximize information gain.
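The following minimal sketch shows how a splitter might use information gain. It runs an exhaustive threshold search over one numerical feature and makes several assumptions:

- conditions take the form `value >= threshold`;
- entropy is Shannon entropy in bits over the class labels;
- candidate thresholds sit at the midpoints between consecutive distinct feature values.

All function names are hypothetical. The search rescans the data for every candidate threshold. That keeps the code short but makes it quadratic in the number of examples.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the example-weighted entropies of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Return (threshold, gain) for the condition `value >= threshold` that
    maximizes information gain, or (None, 0.0) if no split improves on the parent."""
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Hypothetical candidate set: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(values, labels))  # (3.5, 1.0)
```

In this toy data, the threshold 3.5 separates the labels perfectly. Both children therefore have zero entropy, and the gain equals the parent's full entropy of 1 bit.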

Real-world uses

Created for this library

1. A decision-tree learner uses information gain to choose splits in a feature-rich tabular dataset for credit risk.
2. A churn modeling team uses information gain to rank candidate features before promoting any of them to the production model.
3. A research team uses information gain as a simple but effective feature selection signal when training tree-based baselines.
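The feature-ranking uses above could look roughly like the following sketch. It scores each categorical feature by the information gain of partitioning the examples on that feature's values, then sorts the features by score. The toy churn data, the feature names, and the helper functions are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def feature_information_gain(feature_values, labels):
    """Gain from partitioning the examples by each distinct feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def rank_features(features, labels):
    """features: dict mapping a feature name to its list of categorical values.
    Returns (name, gain) pairs sorted from highest to lowest gain."""
    scores = {name: feature_information_gain(vals, labels)
              for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy churn data: "plan" separates the labels, "region" does not.
labels = ["churn", "stay", "churn", "stay"]
features = {
    "plan": ["basic", "premium", "basic", "premium"],
    "region": ["north", "north", "south", "south"],
}
for name, gain in rank_features(features, labels):
    print(name, round(gain, 3))
# plan 1.0
# region 0.0
```

Plain information gain tends to favor features with many distinct values. A ranking like this is therefore best read as a rough signal rather than a final verdict.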
