Mutual Information

Mutual information is a measure of the reduction in uncertainty for one variable given a known value of another variable:

\[I(X,Y)=H(X)-H(X|Y)\]

where \(I(X,Y)\) is the mutual information for \(X\) and \(Y\), \(H(X)\) is the entropy for \(X\), and \(H(X|Y)\) is the conditional entropy for \(X\) given \(Y\).
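
The definition can be evaluated directly for discrete variables. Below is a minimal sketch, assuming a small, made-up joint probability table `p_xy` for two binary variables (rows index \(X\), columns index \(Y\)); it computes \(H(X)\), \(H(X|Y)\), and their difference.

```python
import numpy as np

# Hypothetical joint probability table P(X, Y); rows index X, columns index Y.
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Marginal distributions.
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Conditional entropy H(X|Y) = sum over y of P(y) * H(X | Y=y).
h_x_given_y = sum(p_y[j] * entropy(p_xy[:, j] / p_y[j]) for j in range(len(p_y)))

# Mutual information I(X, Y) = H(X) - H(X|Y).
mi = entropy(p_x) - h_x_given_y
print(f'I(X, Y) = {mi:.4f} bits')
```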

Mutual information is symmetric:

\[I(X,Y)=I(Y,X)\]
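
The symmetry can be checked numerically. One option is `mutual_info_score` from scikit-learn, which estimates mutual information (in nats) from two sequences of discrete observations; the two hypothetical samples below are assumptions for illustration.

```python
from sklearn.metrics import mutual_info_score

# Hypothetical discrete observations of two variables.
x = [0, 0, 1, 1, 1, 0, 1, 0]
y = [0, 1, 1, 1, 0, 0, 1, 0]

# Swapping the arguments does not change the result.
print(mutual_info_score(x, y), mutual_info_score(y, x))
```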

Mutual information can also be calculated as the KL divergence between the joint distribution and the product of the marginal distributions of each variable:

\[I(X,Y)=KL(P(X,Y)||P(X)\times P(Y))\]
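Here is a sketch of the KL-divergence form, reusing the same hypothetical joint table as above; it should give the same value as the \(H(X)-H(X|Y)\) calculation, up to rounding.

```python
import numpy as np

# Hypothetical joint probability table P(X, Y); rows index X, columns index Y.
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])

# Product of the marginals P(X) * P(Y) as an outer product.
p_x_p_y = np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))

# KL(P(X, Y) || P(X) * P(Y)) in bits, summed over non-zero joint entries.
mask = p_xy > 0
mi = np.sum(p_xy[mask] * np.log2(p_xy[mask] / p_x_p_y[mask]))
print(f'I(X, Y) = {mi:.4f} bits')
```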

Mutual information and information gain are computed in the same way, and therefore mutual information is sometimes used as a synonym for information gain:

\[IG(S,a)=H(S)-H(S|a)\]
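
Here, \(S\) is a set of samples (such as the class labels of a dataset) and \(a\) is a variable used to split it (such as a feature in a decision tree). A minimal sketch, using made-up binary labels and a made-up binary feature, computes the same \(H(S)-H(S|a)\) quantity:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """IG(S, a) = H(S) - H(S|a) for class labels S split by feature a."""
    n = len(labels)
    h_s_given_a = 0.0
    for value in set(feature_values):
        subset = [l for l, v in zip(labels, feature_values) if v == value]
        h_s_given_a += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_s_given_a

# Hypothetical dataset: binary class labels and a binary feature.
labels  = [1, 1, 1, 0, 0, 0, 1, 0]
feature = [1, 1, 1, 1, 0, 0, 0, 0]
print(f'IG = {information_gain(labels, feature):.4f} bits')
```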