
Motivating the logistic and softmax output layers through a great exercise.








The mess: 
\[ 
p(x_1,\dots,x_n) = p(x_1 \mid x_2,\dots,x_n)\,p(x_2,\dots,x_n) = p(x_1 \mid x_2,\dots,x_n)\,p(x_2 \mid x_3,\dots,x_n)\,p(x_3,\dots,x_n) = p(x_n)\prod_{i=1}^{n-1} p(x_i \mid x_{i+1},\dots,x_n)
\]
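
As a quick sanity check on this factorization, here is a small numeric sketch (the joint distribution is random and every name is purely illustrative) verifying the chain rule on three binary variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random joint distribution p(x1, x2, x3) over three binary variables.
joint = rng.random((2, 2, 2))
joint /= joint.sum()

# Marginals and conditionals needed by the factorization
# p(x1, x2, x3) = p(x1 | x2, x3) * p(x2 | x3) * p(x3).
p_x3 = joint.sum(axis=(0, 1))           # p(x3)
p_x2x3 = joint.sum(axis=0)              # p(x2, x3)
p_x2_given_x3 = p_x2x3 / p_x3           # p(x2 | x3)
p_x1_given_x2x3 = joint / p_x2x3        # p(x1 | x2, x3)

reconstructed = p_x1_given_x2x3 * p_x2_given_x3 * p_x3
assert np.allclose(reconstructed, joint)  # the chain-rule factorization holds
```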

The basic idea is that Bayes's rule can be rearranged as follows:

\[
P(s=2 \mid x) = \frac{P(x \mid s=2)\,P(s=2)}{\sum_{i\in\{2,3\}} P(x \mid s=i)\,P(s=i)} = \frac{P(x \mid s=2)\,P(s=2)}{P(x \mid s=2)\,P(s=2) + P(x \mid s=3)\,P(s=3)} = \frac{1}{1 + \frac{P(x \mid s=3)\,P(s=3)}{P(x \mid s=2)\,P(s=2)}}
\]
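
To make the last step concrete, here is a tiny sketch with made-up likelihoods and priors (the numbers are purely illustrative) showing that the normalized form and the $1/(1+\text{ratio})$ form give the same posterior:

```python
# Made-up likelihoods P(x | s=i) and priors P(s=i) for the two classes.
lik = {2: 0.30, 3: 0.05}     # P(x | s=2), P(x | s=3)
prior = {2: 0.5, 3: 0.5}     # P(s=2), P(s=3)

# Direct Bayes' rule: normalize over both classes.
posterior_2 = lik[2] * prior[2] / (lik[2] * prior[2] + lik[3] * prior[3])

# Equivalent form obtained by dividing through by the numerator.
ratio = (lik[3] * prior[3]) / (lik[2] * prior[2])
posterior_2_alt = 1.0 / (1.0 + ratio)

assert abs(posterior_2 - posterior_2_alt) < 1e-12
print(posterior_2)   # ~0.857 for these made-up numbers
```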

We can represent $P(x = x_1 \mid s=2)$ as a function of the dot product of two vectors. For example:

Let

\[
x_1 = \begin{bmatrix} 1 & -1 & -1 & -1 & -1 & -1 & -1 \end{bmatrix}^{\top},
\]

which represents the LED display with only the 1st LED on ($+1$ for on, $-1$ for off), and we define the vector

\[
s_2 = \begin{bmatrix} 1 & 1 & -1 & 1 & 1 & -1 & 1 \end{bmatrix}^{\top},
\]

the ideal $\pm 1$ pattern for the digit 2 (one particular segment ordering; the exact ordering doesn't matter for the argument). Then $x_1 \cdot s_2$ can be related to the number of matching LEDs like so:

\[
C = \text{correctCount} = \frac{x_1 \cdot s_2 + 7}{2}
\]
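
A quick numeric check of this identity; the particular pattern used for $s_2$ below is just the illustrative one from above:

```python
import numpy as np

# +1 = LED on, -1 = LED off.
x1 = np.array([1, -1, -1, -1, -1, -1, -1])   # only the 1st LED on
s2 = np.array([1, 1, -1, 1, 1, -1, 1])       # illustrative pattern for the digit 2

matches = np.sum(x1 == s2)       # positions where the two vectors agree
C = (x1 @ s2 + 7) / 2            # the dot-product formula

assert C == matches
print(C)   # 3 matching LEDs for this particular pair
```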

The missing step here is the likelihood model. A natural one for this exercise: assume each LED independently shows the correct state with probability $p$ and the flipped state with probability $1-p$, so that

\[
P(x \mid s=2) = p^{C}(1-p)^{7-C} = (1-p)^{7}\left(\frac{p}{1-p}\right)^{\frac{x \cdot s_2 + 7}{2}},
\]

which is an exponential function of the dot product $x \cdot s_2$.

Thus we see how the probability can be molded into an exponential form, and how Bayes's rule allows for the $\frac{1}{1+e^{f(x)}}$ form. With more than two classes, it's easier to express the probability without dividing the numerator and denominator by the numerator, so we are left with the softmax form as shown in the question. In this light, it is easy to see how the softmax and the logistic output units are essentially the same thing: relative probabilities, expressed as exponentials.
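
The two-class case of this equivalence is easy to check numerically. The sketch below (with arbitrary, purely illustrative scores) shows that a two-class softmax is exactly the logistic function applied to the difference of the scores:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))    # shift for numerical stability
    return e / e.sum()

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

logits = np.array([1.3, -0.4])   # arbitrary scores for the two classes

p_softmax = softmax(logits)[0]               # P(class 0) via softmax
p_logistic = sigmoid(logits[0] - logits[1])  # P(class 0) via the logistic form

assert np.isclose(p_softmax, p_logistic)
print(p_softmax)   # ~0.8455
```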

A generality

If a random variable has two possible values and the two event probabilities are proportional to $\alpha^{x_1}$ and $\alpha^{x_2}$, then you can arrive at an exponential (logistic) formulation like so:

\[
\frac{\alpha^{x_1}}{\alpha^{x_1} + \alpha^{x_2}} = \frac{1}{1 + \alpha^{(x_2 - x_1)}} = \frac{1}{1 + e^{\beta x}}, \quad \text{where } x = x_2 - x_1 \text{ and } \beta = \ln\alpha.
\]
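
And a quick numeric check of this identity, with arbitrary values for $\alpha$, $x_1$, and $x_2$:

```python
import numpy as np

alpha, x1, x2 = 2.5, 0.7, 1.9    # arbitrary positive base and exponents

lhs = alpha**x1 / (alpha**x1 + alpha**x2)

beta = np.log(alpha)             # beta = ln(alpha)
x = x2 - x1
rhs = 1.0 / (1.0 + np.exp(beta * x))

assert np.isclose(lhs, rhs)
print(lhs)   # ~0.25 for these values
```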