Recall that Shannon entropy is defined as

$$
H(X) = -\int p(x)\,\log p(x)\,dx.
$$
Recall also that conditional entropy is defined as

$$
H(X \mid Y) = -\iint p(x, y)\,\log p(x \mid y)\,dx\,dy.
$$
Then the difference between the two is the mutual information between $X$ and $Y$:

$$
I(X; Y) = H(X) - H(X \mid Y).
$$
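To make the definitions concrete, here’s a minimal sketch that computes all three quantities for a hypothetical discrete joint distribution (sums standing in for the integrals above; the numbers are made up purely for illustration):

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

# Shannon entropy: H(X) = -sum_x p(x) log p(x)
H_X = -np.sum(p_x * np.log(p_x))

# Conditional entropy: H(X|Y) = -sum_{x,y} p(x,y) log p(x|y),
# where p(x|y) = p(x,y) / p(y).
p_x_given_y = p_xy / p_y  # broadcasts the division across columns
H_X_given_Y = -np.sum(p_xy * np.log(p_x_given_y))

# Mutual information as the difference of the two.
I_XY = H_X - H_X_given_Y
print(f"H(X) = {H_X:.4f} nats, H(X|Y) = {H_X_given_Y:.4f} nats, I(X;Y) = {I_XY:.4f} nats")
```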
As illuminated by the alternative formulation below, mutual information is symmetric; that is,

$$
I(X; Y) = I(Y; X).
$$

I find it surprising and unintuitive that the average amount of information $Y$ encodes about $X$ is exactly the same as the average amount of information $X$ encodes about $Y$, but we can see it clearly in the following.
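Before getting to that formulation, one quick way to convince yourself of the symmetry is to take the chain rule for joint entropy, $H(X, Y) = H(Y) + H(X \mid Y)$, as given and substitute:

$$
I(X; Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y).
$$

The right-hand side is unchanged if we swap $X$ and $Y$, so it also equals $H(Y) - H(Y \mid X) = I(Y; X)$.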
## Another way to look at it
A more insightful form of this, suggested by Claude AI, is:

$$
I(X; Y) = \iint p(x, y)\,\log \frac{p(x, y)}{p(x)\,p(y)}\,dx\,dy.
$$
This draws attention to a few facts:
- It directly compares the joint distribution $p(x, y)$ to the product of the marginal distributions $p(x)\,p(y)$.
- It shows that mutual information is symmetric.
- It shows that it’s always non-negative (the integral is the KL divergence between $p(x, y)$ and $p(x)\,p(y)$, which is never negative).
- It shows that mutual information is zero iff $X$ and $Y$ are independent.
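As a quick sanity check of these facts, here’s a small numerical sketch using the same ratio form on hypothetical discrete joint distributions (again with sums in place of the integrals; the specific numbers are made up for illustration):

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information (in nats) of a discrete joint distribution,
    via sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )."""
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x) as a column
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y) as a row
    ratio = p_xy / (p_x * p_y)             # p(x,y) / (p(x) p(y))
    mask = p_xy > 0                        # entries with p(x,y) = 0 contribute 0
    return float(np.sum(p_xy[mask] * np.log(ratio[mask])))

# A dependent joint: knowing y changes the distribution of x.
p_dep = np.array([[0.30, 0.10],
                  [0.20, 0.40]])
# An independent joint: p(x, y) = p(x) p(y) exactly.
p_ind = np.outer([0.4, 0.6], [0.5, 0.5])

print(mutual_information(p_dep))    # strictly positive here (never negative)
print(mutual_information(p_dep.T))  # same value: transposing swaps X and Y
print(mutual_information(p_ind))    # ~0, since X and Y are independent
```

Transposing the joint swaps the roles of $X$ and $Y$, so the second value matching the first is the symmetry claim in action.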
## Derivation of Claude’s formulation
Let’s start with the original formulation:

$$
I(X; Y) = H(X) - H(X \mid Y).
$$
Plugging in the integrals above, we obtain

$$
I(X; Y) = \iint p(x, y)\,\log p(x \mid y)\,dx\,dy - \int p(x)\,\log p(x)\,dx.
$$
Let’s work on that logarithm in the first integral. Since $p(x \mid y) = \frac{p(x, y)}{p(y)}$, we can rewrite it as

$$
\log p(x \mid y) = \log \frac{p(x, y)}{p(y)}.
$$

Then we can use the logarithm property

$$
\log \frac{a}{b} = \log a - \log b.
$$

We put this back into our integral:

$$
I(X; Y) = \iint p(x, y)\,\bigl[\log p(x, y) - \log p(y)\bigr]\,dx\,dy - \int p(x)\,\log p(x)\,dx.
$$
Then we distribute $p(x, y)$ over the difference and split the first integral in two:

$$
I(X; Y) = \iint p(x, y)\,\log p(x, y)\,dx\,dy - \iint p(x, y)\,\log p(y)\,dx\,dy - \int p(x)\,\log p(x)\,dx.
$$