
Entropy of an ensemble

The entropy of an ensemble, $X = (x, \mathcal{A}_X, \mathcal{P}_X)$, is defined to be the average Shannon information content over all outcomes:

$$H(X) = \sum_{x \in \mathcal{A}_X} P(x) \log \frac{1}{P(x)}$$
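
As a quick, non-authoritative sketch of this definition, here is a small Python function that computes $H(X)$ directly from a list of outcome probabilities; the base-2 logarithm (entropy in bits) and the biased-coin example are illustrative assumptions, not part of the original note.

```python
import math

def entropy(probs):
    """H(X) = sum over x in A_X of P(x) * log2(1 / P(x)), in bits.

    `probs` holds the outcome probabilities P(x); outcomes with
    P(x) = 0 contribute nothing to the sum.
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Illustrative ensemble: a biased coin with P(heads) = 0.9
print(entropy([0.9, 0.1]))  # ~0.469 bits
```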

Properties of entropy:

  • $H(X) \geq 0$, with equality iff one outcome has probability 1.
  • Entropy is maximized when the distribution over outcomes is uniform.
  • The entropy is at most the log of the number of outcomes.

The last two points can be expressed as:

$H(X) \leq \log(|\mathcal{A}_X|)$, with equality iff all outcomes have probability $\frac{1}{|\mathcal{A}_X|}$.
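
A minimal sketch of this bound, assuming the same base-2 `entropy` helper as above and two made-up distributions over $|\mathcal{A}_X| = 4$ outcomes: the uniform one attains $\log_2 4 = 2$ bits exactly, while a skewed one falls below it.

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # every outcome has probability 1/|A_X|
skewed  = [0.70, 0.15, 0.10, 0.05]   # same outcome set, non-uniform

bound = math.log2(4)                 # log2(|A_X|)
print(entropy(uniform), bound)       # 2.0   2.0  -> equality at the uniform distribution
print(entropy(skewed), bound)        # ~1.32 2.0  -> strictly below the bound
```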

Proof on the back side.


Proof that $H(X) \leq \log(|\mathcal{A}_X|)$:

First note that:

$$E\left[\frac{1}{P(X)}\right] = \sum_{i \in \mathcal{A}_X} P(i)\,\frac{1}{P(i)} = \sum_{i \in \mathcal{A}_X} 1 = |\mathcal{A}_X|$$

Now then:

$$H(X) = \sum_{i \in \mathcal{A}_X} P(i) \log \frac{1}{P(i)} = \sum_{i \in \mathcal{A}_X} P(i)\, f\!\left(\frac{1}{P(i)}\right)$$

where $f(u) = \log(u)$ is a concave function. Then, using Jensen's inequality (for a concave $f$, $E[f(U)] \leq f(E[U])$), we can say:

$$H(X) = E\!\left[f\!\left(\frac{1}{P(X)}\right)\right] \leq f\!\left(E\!\left[\frac{1}{P(X)}\right]\right) = \log(|\mathcal{A}_X|)$$
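
To make the Jensen step concrete, here is a small numerical check under an assumed three-outcome distribution (not from the original card): the left-hand side is $E[f(1/P(X))] = H(X)$ and the right-hand side is $f(E[1/P(X)]) = \log_2|\mathcal{A}_X|$.

```python
import math

P = [0.5, 0.3, 0.2]  # hypothetical distribution over |A_X| = 3 outcomes

lhs = sum(p * math.log2(1.0 / p) for p in P)    # E[f(1/P(X))] = H(X)
rhs = math.log2(sum(p * (1.0 / p) for p in P))  # f(E[1/P(X)]) = log2(|A_X|)

print(lhs, rhs)    # ~1.485  ~1.585
print(lhs <= rhs)  # True, as Jensen's inequality requires
```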

Perspectives on entropy:

  • If $X$ is a random variable, we can create another random variable, say $Y$, by applying a function to the outcomes: $Y = f(X)$. What if this function depended not on the value of the outcome, but on its probability? Since there is a function $P$ that maps outcomes to probabilities, we could create the random variable $Y = P(X)$. For Shannon information content and entropy, we create the variable $Y = \log\frac{1}{P(X)}$. The entropy is the expected value of this random variable (see the sketch below).
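
A sketch of this perspective in Python, assuming a made-up three-outcome ensemble: the derived random variable $Y = \log_2\frac{1}{P(X)}$ depends on the outcome only through its probability, and the sample mean of $Y$ approaches $H(X)$.

```python
import math
import random

# Hypothetical ensemble: outcome set A_X and probabilities P_X
A_X = ["a", "b", "c"]
P_X = {"a": 0.5, "b": 0.3, "c": 0.2}

def Y(x):
    """Y = log2(1 / P(x)): the Shannon information content of outcome x.

    Note that Y depends on x only through its probability P(x)."""
    return math.log2(1.0 / P_X[x])

# Entropy is E[Y]; estimate it by sampling X and averaging Y
samples = random.choices(A_X, weights=[P_X[x] for x in A_X], k=100_000)
print(sum(Y(x) for x in samples) / len(samples))  # sample mean, close to H(X)
print(sum(P_X[x] * Y(x) for x in A_X))            # exact H(X) ~ 1.485 bits
```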