The derivative of the cost function is needed for back-propagation.
When the output layer uses an MLE/cross-entropy cost and a softmax activation, the derivative of the cost function combined with that of the activation function simplifies to a very simple expression. This note covers this computation.
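As a preview, here is a minimal numpy sketch of such an output layer (the function and variable names are my own, and it assumes a single example whose correct category is given as an integer index); the entire backward pass is the one-line expression derived below:

```python
import numpy as np

def softmax_xent(logits, correct):
    """Loss and gradient for a softmax + cross-entropy output layer (sketch)."""
    # Shift by the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    probs = np.exp(z) / np.sum(np.exp(z))
    loss = -np.log(probs[correct])
    # The combined derivative simplifies to p - y (y is the one-hot target).
    grad = probs.copy()
    grad[correct] -= 1.0
    return loss, grad
```

Here the gradient is written directly as $p - y$; the rest of the note derives why this combined form is correct.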
Consider the last layer of a neural network: logits $z_1, \dots, z_n$ are passed through a softmax to produce probabilities

$$p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}.$$

For the given training example with correct category $c$, the cross-entropy loss is $L = -\log p_c$.

While only the probability for the correct category is used to calculate the loss for the example, the other outputs still affect it through the softmax denominator, so every logit receives a gradient.

The partial derivatives will be:

$$\frac{\partial L}{\partial z_i} = p_i - y_i =
\begin{cases}
p_i & \text{if } i \neq c \\
p_i - 1 & \text{if } i = c
\end{cases}$$

where $y$ is the one-hot target vector.
So if the correct category is the third one and the output probabilities are [0.1, 0.3, 0.6], then the derivative of the loss with respect to the last layer's outputs before applying the softmax activation (the logits) will be [0.1, 0.3, -0.4].
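A quick numpy check of these numbers (a sketch; the one-hot target encodes the assumption that the third category is the correct one):

```python
import numpy as np

# Softmax probabilities for one training example.
probs = np.array([0.1, 0.3, 0.6])

# One-hot target: the third category is the correct one.
target = np.array([0.0, 0.0, 1.0])

# Gradient of the loss with respect to the pre-softmax logits: p - y.
print(probs - target)  # [ 0.1  0.3 -0.4]
```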
Derivations
Formulating the derivative of the loss with respect to different variables is simplified if the log and the exponential are canceled at the beginning:

$$L = -\log p_c = -\log \frac{e^{z_c}}{\sum_j e^{z_j}} = -z_c + \log \sum_j e^{z_j}$$
From here, the partial derivatives are straightforward. For example:

$$\frac{\partial L}{\partial z_i} = -\,\mathbb{1}[i = c] + \frac{e^{z_i}}{\sum_j e^{z_j}} = p_i - y_i$$
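To double-check the algebra, the sketch below (the names and the sample logits are my own) compares $p_i - y_i$ against a central-difference estimate of the derivative, using the canceled form of the loss from above:

```python
import numpy as np

def loss(logits, correct):
    # L = -z_c + log(sum_j exp(z_j)), the form with log and exp canceled.
    return -logits[correct] + np.log(np.sum(np.exp(logits)))

logits = np.array([1.0, -0.5, 2.0])  # arbitrary sample logits
correct = 2                          # index of the correct category

# Analytic gradient: p - y.
probs = np.exp(logits) / np.sum(np.exp(logits))
analytic = probs - np.eye(len(logits))[correct]

# Central-difference estimate of each partial derivative.
eps = 1e-6
numeric = np.zeros_like(logits)
for i in range(len(logits)):
    step = np.zeros_like(logits)
    step[i] = eps
    numeric[i] = (loss(logits + step, correct) - loss(logits - step, correct)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # tiny, e.g. ~1e-10
```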
The image below is part of a separate card, but I'm including it here as it is quite relevant.