Glossary term
Foundations
In decision forests, information gain is the difference between a node's entropy and the weighted (by number of examples) sum of the entropy of its child nodes. A node's entropy is the entropy of the examples in that node.
For example, consider the following entropy values:
entropy of parent node = 0.6
entropy of one child node with 16 relevant examples = 0.2
entropy of another child node with 24 relevant examples = 0.1
The two child nodes hold 16 + 24 = 40 examples in total, so 16/40 = 40% of the examples are in one child node and 24/40 = 60% are in the other. Therefore:
weighted entropy sum of child nodes = (0.4 * 0.2) + (0.6 * 0.1) = 0.14
So, the information gain is:
information gain = entropy of parent node - weighted entropy sum of child nodes
information gain = 0.6 - 0.14 = 0.46
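The arithmetic above can be reproduced with a short Python sketch. The function name `information_gain` and the `(entropy, example_count)` pair layout are assumptions made for illustration; they do not come from any particular library.

```python
def information_gain(parent_entropy, children):
    """Parent entropy minus the example-weighted sum of child entropies.

    `children` is a list of (entropy, example_count) pairs, one per child
    node. This layout is a hypothetical choice for this sketch.
    """
    total = sum(count for _, count in children)
    weighted_sum = sum(entropy * count / total for entropy, count in children)
    return parent_entropy - weighted_sum


# Values from the worked example: two children with 16 and 24 examples.
gain = information_gain(0.6, [(0.2, 16), (0.1, 24)])
print(round(gain, 2))
```

Rounding is applied only for display, since floating-point arithmetic may not produce the decimal result exactly.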
Most splitters seek to create conditions that maximize information gain.
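As a rough illustration of how a splitter might search for such a condition, the sketch below tries every candidate threshold on one numeric feature and keeps the one with the highest information gain. It assumes Shannon entropy in base 2 and a condition of the form `value >= threshold`. The names `entropy` and `best_threshold` are hypothetical, and real splitters generally use more efficient search strategies than this exhaustive loop.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2, an assumption) of the labels in a node."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the condition `value >= threshold`
    that yields the highest information gain, or (None, 0.0) if no
    candidate improves on the parent node."""
    parent = entropy(labels)
    n = len(labels)
    best = (None, 0.0)
    # Skip the smallest value: using it would leave one child empty.
    for t in sorted(set(values))[1:]:
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        weighted = (len(left) / n) * entropy(left) \
            + (len(right) / n) * entropy(right)
        gain = parent - weighted
        if gain > best[1]:
            best = (t, gain)
    return best


# Toy data: the labels separate perfectly at a threshold of 3.
threshold, gain = best_threshold([1, 2, 3, 4], ["no", "no", "yes", "yes"])
print(threshold, gain)
```

In this toy dataset the parent node is evenly mixed and the chosen split produces two pure children, so the gain equals the full parent entropy.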
Created for this library
A decision-tree learner uses information gain to choose splits in a feature-rich tabular dataset for credit risk.
A churn modeling team uses information gain to rank candidate features before promoting any of them to the production model.
A research team uses information gain as a simple but effective feature selection signal when training tree-based baselines.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License