The Kullback-Leibler divergence of a true distribution $p$ from an approximating distribution $q$ is defined as

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x)\,\log\frac{p(x)}{q(x)} = H(p, q) - H(p),$$

i.e. the cross-entropy $H(p, q) = -\sum_{x} p(x)\log q(x)$ minus the Shannon entropy $H(p) = -\sum_{x} p(x)\log p(x)$. Hence, if we can show that the KL divergence is always greater than or equal to zero, we will have also shown that the cross-entropy is greater than or equal to the Shannon entropy of the true distribution, i.e.

$$H(p, q) \geq H(p).$$
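As a quick numerical sanity check of this identity (the distributions `p` and `q` below are arbitrary illustrative choices, and the use of NumPy is simply a convenience, not part of the argument), here is a minimal sketch that computes the Shannon entropy, the cross-entropy, and the KL divergence for a small discrete example:

```python
import numpy as np

# Illustrative discrete distributions (chosen arbitrarily for this sketch).
p = np.array([0.5, 0.3, 0.2])   # "true" distribution
q = np.array([0.4, 0.4, 0.2])   # approximating distribution

# Shannon entropy of p: H(p) = -sum_x p(x) log p(x)
shannon_entropy = -np.sum(p * np.log(p))

# Cross-entropy of q relative to p: H(p, q) = -sum_x p(x) log q(x)
cross_entropy = -np.sum(p * np.log(q))

# KL divergence: D_KL(p || q) = sum_x p(x) log(p(x) / q(x))
kl_divergence = np.sum(p * np.log(p / q))

# The KL divergence equals the cross-entropy minus the Shannon entropy.
assert np.isclose(kl_divergence, cross_entropy - shannon_entropy)
print(shannon_entropy, cross_entropy, kl_divergence)
```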
Jensen’s inequality states that for a random variable $X$ and a convex function $f$,

$$\mathbb{E}[f(X)] \geq f(\mathbb{E}[X]).$$
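As a quick aside (not part of the derivation that follows), taking the convex function $f(x) = x^{2}$ in Jensen’s inequality recovers a familiar fact:

$$\mathbb{E}[X^{2}] \geq \left(\mathbb{E}[X]\right)^{2} \quad\Longleftrightarrow\quad \operatorname{Var}(X) = \mathbb{E}[X^{2}] - \left(\mathbb{E}[X]\right)^{2} \geq 0,$$

i.e. the variance of any random variable is non-negative.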
Let

$$f(x) = -\log(x), \qquad f''(x) = \frac{1}{x^{2}},$$

which is always positive, so $f$ is convex. Now let

$$X = \frac{q(x)}{p(x)},$$

where $x$ is drawn from the true distribution $p$.
Remember that, for a convex $f$,

$$\mathbb{E}[f(X)] \geq f(\mathbb{E}[X]).$$

Substituting $f(x) = -\log(x)$ and $X = \frac{q(x)}{p(x)}$ gives

$$\mathbb{E}_{p}\!\left[-\log\frac{q(x)}{p(x)}\right] \geq -\log \mathbb{E}_{p}\!\left[\frac{q(x)}{p(x)}\right],$$

which simplifies to

$$\sum_{x} p(x)\left(-\log\frac{q(x)}{p(x)}\right) \geq -\log\left(\sum_{x} p(x)\,\frac{q(x)}{p(x)}\right).$$
So we evaluate the expected value on the right-hand side, $\sum_{x} p(x)\,\frac{q(x)}{p(x)} = \sum_{x} q(x) = 1$, and since $-\log(1) = 0$ we obtain

$$\sum_{x} p(x)\left(-\log\frac{q(x)}{p(x)}\right) \geq 0.$$

Now recall the logarithm property that

$$-\log\frac{a}{b} = \log\frac{b}{a},$$

which lets us rewrite the inequality as

$$\sum_{x} p(x)\,\log\frac{p(x)}{q(x)} \geq 0.$$
We recognize the left-hand side as the KL divergence $D_{\mathrm{KL}}(p \,\|\, q)$. Hence the KL divergence is never less than zero. Since the KL divergence is the difference between the cross-entropy and the Shannon entropy, we conclude that the cross-entropy can never be less than the Shannon entropy.
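To see this conclusion numerically, the following sketch (again just an illustrative check, not part of the proof; the sample size and dimension are arbitrary) draws random pairs of discrete distributions with NumPy and confirms that the KL divergence is non-negative and the cross-entropy is at least the Shannon entropy:

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    # Draw two random discrete distributions over 5 outcomes.
    p = rng.random(5)
    p /= p.sum()
    q = rng.random(5)
    q /= q.sum()

    shannon_entropy = -np.sum(p * np.log(p))
    cross_entropy = -np.sum(p * np.log(q))
    kl_divergence = np.sum(p * np.log(p / q))

    # The KL divergence is never negative (up to floating-point rounding),
    # so the cross-entropy never falls below the Shannon entropy.
    assert kl_divergence >= -1e-12
    assert cross_entropy >= shannon_entropy - 1e-12
```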