
Experiment 1. Summary.

A high-level overview of progress in experiment 1.


Investigate to what extent popular image classification neural networks can “see” related colors.

Define “see”

Good question! For now, this is limited to testing whether a neural network keeps or loses, as data passes through it, the information sufficient to distinguish related colors.

Our minds create a “color” experience that depends not only on the light from an object, but also on the light surrounding the object. Colors that can only be experienced when an object is seen as less bright than its surroundings are called related colors.

The importance of related colors becomes apparent when you realize that some related colors cannot be observed as light sources. Why? Light sources appear as the brightest local object, so they always establish themselves as the local maximum brightness. However, some color experiences are only possible if the object is less bright than the local maximum. A famous example is brown; Fairchild describes in Color Appearance Models how Guinness tried and failed to create a brown neon light. Another example I have noticed is the Fukutoshin subway line in Tokyo: its designated color is brown, but signs for the line often appear orange and sometimes even purple (I wonder if making it more purple was a deliberate choice after failing to make a distinguishable brown light).

Orange vs. brown

Orange and brown are related colors: an area of a screen that appears orange can be made to appear brown by increasing the brightness of the surrounding pixels. The reason I have chosen to focus on orange and brown is that the human visual system considers the distinction important enough to consider them different colors in the sense that they have different labels. Someone who works with colors may have a sufficient vocabulary to identify many such color pairs; however, most people do not have a vocabulary for color that enables them to distinguish such pairs by name. White-gray, green-olive, red-maroon, blue-navy are some other pairs that I think can be switched between by changing only the brightness of the surrounding pixels.
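To make the surround effect concrete, here is a minimal sketch that draws the same circle color on a dark and on a bright background. The function names, image size, and color values are my own illustrative assumptions, not the setup used for the dataset; the output format (plain-text PPM) was chosen only because it needs no libraries.

```python
def render_stimulus(circle_rgb, background_rgb, size=64, radius=16):
    """Return a size x size grid of RGB tuples: a filled circle on a flat background."""
    center = (size - 1) / 2
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row.append(tuple(circle_rgb) if inside else tuple(background_rgb))
        rows.append(row)
    return rows


def to_ppm(pixels):
    """Encode the grid as a plain-text PPM (P3) image with 8-bit channels."""
    height, width = len(pixels), len(pixels[0])
    lines = ["P3", f"{width} {height}", "255"]
    for row in pixels:
        lines.append(" ".join(str(round(c * 255)) for px in row for c in px))
    return "\n".join(lines) + "\n"


# Same circle color, two surrounds (illustrative values, not taken from the dataset).
circle = (0.8, 0.4, 0.1)
dark_surround = render_stimulus(circle, (0.1, 0.1, 0.1))
bright_surround = render_stimulus(circle, (0.95, 0.95, 0.95))
```

Writing `to_ppm(dark_surround)` and `to_ppm(bright_surround)` to two `.ppm` files and viewing them side by side is one way to check whether the circle reads as orange in one and brown in the other; the pixel values inside the circle are identical in both.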


Manually created dataset, from the following setup:

Experiments 1

List of experiments so far.

Experiment 1.1

Here I collected the data and confirmed that humans (or me, at least) require relative brightness in order to distinguish orange and brown.

The data looks like:

Answer  Circle Color        Background Color
0       [0.81, 0.75, 0.28]  [0.25, 0.25, 0.25]
3       [0.12, 0.15, 0.34]  [0.50, 0.50, 0.50]
1       [0.54, 0.23, 0.10]  [0.91, 0.91, 0.91]
2       [0.84, 0.19, 0.29]  [0.85, 0.85, 0.85]
1       [0.87, 0.34, 0.20]  [0.94, 0.94, 0.94]
0       [0.43, 0.43, 0.72]  [0.31, 0.31, 0.31]

The values in the “Answer” column represent my choice as follows:

Answer  Meaning
0       orange
1       brown
2       both
3       neither
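The rows and the answer codes above could be represented in code roughly as follows. This is a sketch under assumptions: the `Answer`, `Sample`, and `parse_row` names are hypothetical, and the text layout being parsed is simply the one displayed in this post, which may not match how the data is actually stored.

```python
from dataclasses import dataclass
from enum import IntEnum


class Answer(IntEnum):
    """Answer codes as listed in the table above."""
    ORANGE = 0
    BROWN = 1
    BOTH = 2
    NEITHER = 3


@dataclass(frozen=True)
class Sample:
    answer: Answer
    circle_rgb: tuple       # (r, g, b), each in [0, 1]
    background_rgb: tuple   # (r, g, b), each in [0, 1]


def parse_row(line):
    """Parse a row such as '1 [0.54, 0.23, 0.10] [0.91, 0.91, 0.91]'."""
    answer_text, rest = line.split(" ", 1)
    circle_text, background_text = rest.split("] [")
    circle = tuple(float(v) for v in circle_text.strip("[] ").split(","))
    background = tuple(float(v) for v in background_text.strip("[] ").split(","))
    return Sample(Answer(int(answer_text)), circle, background)


sample = parse_row("1 [0.54, 0.23, 0.10] [0.91, 0.91, 0.91]")
print(sample.answer.name, sample.circle_rgb, sample.background_rgb)
```

Using an `IntEnum` keeps the integer codes from the table while making comparisons such as `sample.answer == Answer.BROWN` readable.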

Experiment 1.2

I noticed that the ImageNet dataset has one pair of classes (lemon, orange) whose classification, if done by humans, would probably rely very heavily on color. I downloaded a picture of a lemon, photoshopped it to look orange, and asked a pre-trained network to classify it.
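The recoloring itself was done by hand in an image editor. As a rough programmatic stand-in, one could rotate the hue of each pixel; the sketch below does this for a single RGB value using only the standard library. The `shift_hue` helper and the 30 degree shift are assumptions for illustration, not the edit that was actually applied.

```python
import colorsys


def shift_hue(rgb, degrees):
    """Rotate the hue of one RGB pixel (channels in [0, 1]) by the given angle."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + degrees / 360.0) % 1.0  # hue is stored in [0, 1) and wraps around
    return colorsys.hsv_to_rgb(h, s, v)


# Pure yellow sits at a hue of 60 degrees; moving 30 degrees toward red gives an orange.
lemon_yellow = (1.0, 1.0, 0.0)
orange_like = shift_hue(lemon_yellow, -30)
print(orange_like)
```

Applying the same function to every pixel of a lemon photo changes hue while leaving saturation and value, and therefore shape and shading cues, untouched.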

Lemon to orange

Original lemon image:

Edited with orange color:


The pre-trained network rarely (~10%) misclassified the lemon as an orange. This suggests that the network doesn’t rely too heavily on color data for this example.

It’s interesting that the neural net classifier doesn’t rely heavily on color in a situation where many humans would rely heavily on color.

Experiment 1.3

Use data from experiment 1.1 to train a neural network.

In this experiment, a pre-trained ResNet is used; the intent was for only the last layer to be trainable.

1.3.1 and 1.3.2

Two sub-experiments were done.

  • 1.3.1, allow all parameters to be trained (this was by mistake)
  • 1.3.2, fix all parameters except the last layer
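The difference between the two sub-experiments comes down to which parameters have gradients enabled. Below is a minimal PyTorch sketch of the 1.3.2 setting. `TinyNet` is a hypothetical stand-in so the example runs without downloading weights, and `freeze_all_but_last` is a helper name I made up; the only detail borrowed from the real thing is that torchvision's ResNet models name their final linear layer `fc`.

```python
import torch.nn as nn


def freeze_all_but_last(model, last_layer_name="fc"):
    """Disable gradients everywhere except the named final layer."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(last_layer_name + ".")
    # Return the parameters an optimizer should still update.
    return [p for p in model.parameters() if p.requires_grad]


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained backbone plus a classifier head."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))


model = TinyNet()
trainable = freeze_all_but_last(model)
print(len(trainable))  # weight and bias of the final layer
```

Skipping the `freeze_all_but_last` call and handing `model.parameters()` to the optimizer corresponds to the accidental 1.3.1 setting, where every parameter is updated.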


The network had a high-ish (~85%) classification rate.

Experiment 1.4

In this experiment, we repeat experiment 1.3, but chop off the end of the network (the classification side) to expose the 512x7x7 activations. Experiment 1.3 used the standard 512x1x1 activations output by ResNet.
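The two activation shapes are related by global average pooling: averaging each 7x7 channel gives one value per channel. The sketch below shows that relationship and the resulting input sizes for a classifier head, in plain Python. Treating the 512x7x7 activations as a flattened vector is an assumption on my part about how a head would consume them; the function names are hypothetical.

```python
def global_average_pool(feature_map):
    """Collapse a C x H x W nested list to C values by averaging each channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def flattened_size(channels, height, width):
    """Number of inputs a linear layer sees if the activations are flattened."""
    return channels * height * width


# Tiny 2 x 2 x 2 example standing in for a 512 x 7 x 7 activation tensor.
toy_activations = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
pooled = global_average_pool(toy_activations)

print(pooled)
print(flattened_size(512, 7, 7), flattened_size(512, 1, 1))
```

The unpooled activations keep spatial layout (where in the image a feature fired) at the cost of 49 times as many inputs to the final layer.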


The network had a low (~70%) classification rate. This is quite poor given that there are only 3 classes to choose from. Furthermore, one of the classes (neither) is typically quite unrelated to orange or brown, so identifying “neither” is expected to be easy. If that’s not enough, the “neither” class is also the most numerous.

If we ignore the samples misclassified as “neither”, we get about 75% accuracy for the binary classification task of deciding between orange and brown. Quite poor.
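One way to compute such a restricted accuracy is sketched below. It reflects one reading of “ignoring” the neither class: drop every sample whose true or predicted label is “neither” and score the rest. The function name is hypothetical and the labels are toy values chosen to exercise the code, not results from the experiment.

```python
def binary_accuracy_ignoring(true_labels, predicted_labels, ignored="neither"):
    """Accuracy over samples where neither the truth nor the prediction is `ignored`."""
    kept = [
        (t, p)
        for t, p in zip(true_labels, predicted_labels)
        if t != ignored and p != ignored
    ]
    if not kept:
        raise ValueError("no samples left after dropping the ignored class")
    return sum(t == p for t, p in kept) / len(kept)


# Toy labels for illustration only.
truth = ["orange", "orange", "brown", "brown", "neither"]
preds = ["orange", "brown", "brown", "neither", "neither"]
accuracy = binary_accuracy_ignoring(truth, preds)
print(accuracy)
```

Note that this number can look better than the full multi-class accuracy simply because the hard cases that were pushed into “neither” are no longer counted.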