Glossary term
Foundations
In information theory, a measure of how unpredictable a probability distribution is. Alternatively, entropy is defined as how much information each example contains. A distribution has the highest possible entropy when all values of a random variable are equally likely.
The entropy of a set with two possible values "0" and "1" (for example, the labels in a binary classification problem) has the following formula:
H = -p log p - q log q = -p log p - (1 - p) log (1 - p)
where:
H is the entropy.
p is the fraction of "1" examples.
q is the fraction of "0" examples. Note that q = (1 - p).
log is generally log2. In this case, the entropy unit is a bit.
For example, suppose the following:
100 examples contain the value "1"
300 examples contain the value "0"
Therefore, the entropy value is:
p = 100 / (100 + 300) = 0.25
q = 300 / (100 + 300) = 0.75
H = (-0.25)log2(0.25) - (0.75)log2(0.75) = 0.81 bits per example
A set that is perfectly balanced (for example, 200 "0"s and 200 "1"s) would have an entropy of 1.0 bit per example. As a set becomes more imbalanced, its entropy moves towards 0.0.
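The worked example above can be checked with a few lines of Python. This is a minimal sketch; the function name binary_entropy is an illustrative choice, not something defined by the glossary, and the edge case p = 0 or p = 1 follows the usual convention that 0 * log2(0) counts as 0.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-valued set, where p is the fraction of "1" examples."""
    if p in (0.0, 1.0):
        # Convention: 0 * log2(0) is treated as 0, so a pure set has zero entropy.
        return 0.0
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)


# 100 examples with value "1" and 300 with value "0".
h_imbalanced = binary_entropy(100 / 400)
# 200 examples of each value.
h_balanced = binary_entropy(200 / 400)

print(round(h_imbalanced, 2))  # 0.81
print(h_balanced)              # 1.0
```

Values of p closer to 0 or 1 give results closer to 0.0, matching the statement about imbalanced sets.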
In decision trees, entropy is used to compute information gain, which the splitter uses to select conditions while growing a classification decision tree.
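As a rough illustration of how entropy feeds into information gain, the sketch below uses the standard definition: the entropy of the parent set minus the size-weighted entropy of the two child sets produced by a candidate condition. The function names and the example splits are hypothetical and are not taken from any particular decision-tree library.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    )
    return entropy(parent) - weighted_children


# Hypothetical data: 100 examples labeled 1 and 300 labeled 0.
parent = [1] * 100 + [0] * 300

# A condition that separates the classes perfectly recovers all of the
# parent's entropy as gain.
perfect_gain = information_gain(parent, [1] * 100, [0] * 300)

# A condition whose children have the same class mix as the parent
# gains (approximately) nothing.
useless_gain = information_gain(
    parent, [1] * 25 + [0] * 75, [1] * 75 + [0] * 225
)
```

A splitter built on this idea would evaluate many candidate conditions and prefer the one with the largest gain.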
Compare entropy with:
cross-entropy loss function
Entropy is often called Shannon's entropy.
See Exact splitter for binary classification with numerical features in the Decision Forests course for more information.
Created for this library
A decision-tree learner uses entropy as the splitting criterion in a classification tree for an interpretable credit baseline.
An NLP team monitors the entropy of LLM next-token predictions to detect when the model is highly uncertain and may need to defer to retrieval.
A clustering team uses entropy of cluster assignments as a quality signal when comparing different choices of k in a customer segmentation.
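The use cases above involve more than two possible values, so the binary formula generalizes to a sum over every outcome. The sketch below assumes the input is a list of probabilities that sum to 1 (for example, next-token probabilities or the fraction of customers in each cluster); the function name and the example numbers are made up for illustration.

```python
import math


def distribution_entropy(probs):
    """Entropy in bits of a discrete distribution given as probabilities."""
    # Outcomes with zero probability contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Four equally likely outcomes: the highest possible entropy for four values.
uniform = [0.25, 0.25, 0.25, 0.25]
# One dominant outcome: a confident prediction, so low entropy.
peaked = [0.97, 0.01, 0.01, 0.01]

print(distribution_entropy(uniform))  # 2.0
print(distribution_entropy(peaked) < distribution_entropy(uniform))  # True
```

A monitoring job could compare such a value against a threshold chosen for the application; any specific threshold would be an assumption, not something the definition prescribes.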
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License