Motivating the logistic and softmax output layers through a great exercise.
The form we want is the softmax from the question:
\[
p(C_k \mid \mathbf{x}) = \frac{e^{a_k}}{\sum_j e^{a_j}}
\]
The basic idea is that Bayes's rule can be manipulated as follows:
\[
p(C_k \mid \mathbf{x})
= \frac{p(\mathbf{x} \mid C_k)\,p(C_k)}{\sum_j p(\mathbf{x} \mid C_j)\,p(C_j)}
= \frac{e^{a_k}}{\sum_j e^{a_j}},
\qquad
a_k = \ln\bigl(p(\mathbf{x} \mid C_k)\,p(C_k)\bigr),
\]
and $a_k$ can in turn be represented as a function of the dot product of two vectors. For example, let
\[
\mathbf{x} = (1, -1, \dots, -1)^\top,
\]
which represents the LED set with only the 1st LED on (coding on as $+1$ and off as $-1$), and define the vector
\[
\mathbf{w}_k = (w_{k1}, \dots, w_{kn})^\top, \qquad w_{ki} \in \{+1, -1\},
\]
the true on/off pattern of the $n$ LEDs for digit $k$. Then $\mathbf{x}^\top \mathbf{w}_k$ can be related to the number of correctly displayed LEDs like so:
\[
\mathbf{x}^\top \mathbf{w}_k = n_c - (n - n_c) = 2 n_c - n,
\qquad\text{i.e.}\qquad
n_c = \frac{n + \mathbf{x}^\top \mathbf{w}_k}{2},
\]
where $n_c$ is the number of LEDs whose observed state matches digit $k$'s true pattern. If each LED independently shows its correct state with probability $1 - \epsilon$ (say), then
\[
p(\mathbf{x} \mid C_k)
= (1 - \epsilon)^{n_c}\,\epsilon^{\,n - n_c}
= \exp\!\Bigl(n_c \ln\tfrac{1 - \epsilon}{\epsilon} + n \ln \epsilon\Bigr),
\]
an exponential whose argument is linear in $\mathbf{x}^\top \mathbf{w}_k$.
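As a quick numerical sanity check of the algebra above (the seven-LED display, $\epsilon = 0.1$, and the dot product value are made up for illustration; they are not from the exercise):
\[
n = 7,\quad \mathbf{x}^\top \mathbf{w}_k = 5
\;\Longrightarrow\;
n_c = \frac{7 + 5}{2} = 6,
\qquad
p(\mathbf{x} \mid C_k) = (1 - 0.1)^{6} \times 0.1 = 0.9^{6} \times 0.1 \approx 0.053.
\]
Six of the seven LEDs agree with digit $k$'s true pattern, and the likelihood is exponential in that count, which is exactly what feeds into $a_k$.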
Thus we see how the probability can be molded into an exponential form, and how Bayes's rule naturally gives rise to that form. With more than two classes, it is easier to express the probability without dividing the numerator and denominator by the numerator, so we are left with the softmax form shown in the question. Thus it is easy to see that the softmax and the logistic output units are essentially the same thing: relative probabilities expressed as exponentials.
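To make that equivalence concrete, here is the two-class case worked out (a standard manipulation, using the same $a_k$ as above): dividing the numerator and denominator of the softmax by the numerator $e^{a_1}$ gives
\[
p(C_1 \mid \mathbf{x})
= \frac{e^{a_1}}{e^{a_1} + e^{a_2}}
= \frac{1}{1 + e^{-(a_1 - a_2)}}
= \sigma(a_1 - a_2),
\]
so a two-way softmax is exactly a logistic unit applied to the difference of the two scores.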
If a random variable has two possible values and you can express the event probabilities as $p$ and $1 - p$, then you can arrive at an exponential formulation like so:
\[
p = \frac{1}{1 + e^{-a}},
\]
where $a = \ln\frac{p}{1 - p}$.
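As a check that this choice of $a$ is consistent (pure substitution, no new assumptions), plugging the log-odds back in recovers $p$:
\[
\frac{1}{1 + e^{-a}}
= \frac{1}{1 + e^{-\ln\frac{p}{1-p}}}
= \frac{1}{1 + \frac{1-p}{p}}
= \frac{p}{p + (1 - p)}
= p.
\]
Note that $a = \ln\frac{p}{1-p}$ is just $a_1 - a_2$ from the two-class softmax, since the log of a ratio of posteriors is the difference of the corresponding scores.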