
Entropy of an ensemble

The entropy of an ensemble, $X = (x, \mathcal{A}_X, \mathcal{P}_X)$, is defined to be the average Shannon information content over all outcomes:

$$H(X) = \sum_{x \in \mathcal{A}_X} P(x) \log \frac{1}{P(x)}$$
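
As a quick, non-authoritative sketch of this definition, here is a small Python function that computes $H(X)$ directly from a list of outcome probabilities; the base-2 logarithm (entropy in bits) and the biased-coin example are illustrative assumptions, not part of the original note.

```python
import math

def entropy(probs):
    """H(X) = sum over x in A_X of P(x) * log2(1 / P(x)), in bits.

    `probs` holds the outcome probabilities P(x); outcomes with
    P(x) = 0 contribute nothing to the sum.
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Illustrative ensemble: a biased coin with P(heads) = 0.9
print(entropy([0.9, 0.1]))  # ~0.469 bits
```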

Properties of entropy:

  • $H(X) \geq 0$, with equality iff one outcome has probability 1.
  • Entropy is maximized when the distribution over outcomes is uniform.
  • The entropy is at most the log of the number of outcomes.

The last two points can be expressed as:

$H(X) \leq \log(|\mathcal{A}_X|)$, with equality iff all outcomes have probability $\frac{1}{|\mathcal{A}_X|}$.
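
A minimal sketch of this bound, assuming the same base-2 `entropy` helper as above and two made-up distributions over $|\mathcal{A}_X| = 4$ outcomes: the uniform one attains $\log_2 4 = 2$ bits exactly, while a skewed one falls below it.

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # every outcome has probability 1/|A_X|
skewed  = [0.70, 0.15, 0.10, 0.05]   # same outcome set, non-uniform

bound = math.log2(4)                 # log2(|A_X|)
print(entropy(uniform), bound)       # 2.0   2.0  -> equality at the uniform distribution
print(entropy(skewed), bound)        # ~1.32 2.0  -> strictly below the bound
```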

Proof on the back side.


Proof that $H(X) \leq \log(|\mathcal{A}_X|)$:

First note that:

$$E\left[\frac{1}{P(X)}\right] = \sum_{i \in \mathcal{A}_X} P(i)\,\frac{1}{P(i)} = \sum_{i \in \mathcal{A}_X} 1 = |\mathcal{A}_X|$$

Now then:

$$H(X) = \sum_{i \in \mathcal{A}_X} P(i) \log \frac{1}{P(i)} = \sum_{i \in \mathcal{A}_X} P(i)\, f\!\left(\frac{1}{P(i)}\right)$$

where $f(u) = \log(u)$ is a concave function. Then, using Jensen's inequality (for a concave $f$, $E[f(U)] \leq f(E[U])$), we can say:

$$H(X) = E\!\left[f\!\left(\frac{1}{P(X)}\right)\right] \leq f\!\left(E\!\left[\frac{1}{P(X)}\right]\right) = \log(|\mathcal{A}_X|)$$
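
To make the Jensen step concrete, here is a small numerical check under an assumed three-outcome distribution (not from the original card): the left-hand side is $E[f(1/P(X))] = H(X)$ and the right-hand side is $f(E[1/P(X)]) = \log_2|\mathcal{A}_X|$.

```python
import math

P = [0.5, 0.3, 0.2]  # hypothetical distribution over |A_X| = 3 outcomes

lhs = sum(p * math.log2(1.0 / p) for p in P)    # E[f(1/P(X))] = H(X)
rhs = math.log2(sum(p * (1.0 / p) for p in P))  # f(E[1/P(X)]) = log2(|A_X|)

print(lhs, rhs)    # ~1.485  ~1.585
print(lhs <= rhs)  # True, as Jensen's inequality requires
```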

Perspectives on entropy:

  • If $X$ is a random variable, we can create another random variable, say $Y$, by applying a function to the outcomes: $Y = f(X)$. What if this function depended not on the value of the outcome, but on its probability? Since there is a function $P$ that maps outcomes to probabilities, we could create the random variable $Y = P(X)$. For Shannon information content and entropy, we create the variable $Y = \log\frac{1}{P(X)}$. The entropy is the expected value of this random variable (see the sketch below).
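
A sketch of this perspective in Python, assuming a made-up three-outcome ensemble: the derived random variable $Y = \log_2\frac{1}{P(X)}$ depends on the outcome only through its probability, and the sample mean of $Y$ approaches $H(X)$.

```python
import math
import random

# Hypothetical ensemble: outcome set A_X and probabilities P_X
A_X = ["a", "b", "c"]
P_X = {"a": 0.5, "b": 0.3, "c": 0.2}

def Y(x):
    """Y = log2(1 / P(x)): the Shannon information content of outcome x.

    Note that Y depends on x only through its probability P(x)."""
    return math.log2(1.0 / P_X[x])

# Entropy is E[Y]; estimate it by sampling X and averaging Y
samples = random.choices(A_X, weights=[P_X[x] for x in A_X], k=100_000)
print(sum(Y(x) for x in samples) / len(samples))  # sample mean, close to H(X)
print(sum(P_X[x] * Y(x) for x in A_X))            # exact H(X) ~ 1.485 bits
```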