Concept
This page is a scrapbook of ideas for the project.
Color and neural networks
The most general question I want to ask is:
What can be said about the relationship between neural networks for vision and human color perception?
In many ways, this question has issues. Human color perception is itself the subject of deep unanswered questions. Color is a property of our mind, an appearance in consciousness, and I am currently of the belief that we won’t get a satisfying answer to the question “what is color?” without knowing something deep about consciousness. So, it is tenuous to take the idea of color perception and ask how it relates to neural networks. The question might be useless for other reasons too: the two ideas might be so different that comparing them tells us nothing. Sure, the idea of pain is as nebulous a topic as color, but I’m reasonably confident that the question “what can be said about the relationship between pain and NAND logic gates?” is pretty useless.
At this point I can start to mount a defence of the question. To me, it doesn’t seem obvious that color and neural networks are so vastly different as to be unrelatable. If the answer to the question is “No, nothing much can be said about the relationship between neural networks for vision and color”, then this conclusion would be an interesting and valuable insight in itself.
We can shift the inquiry slightly by avoiding the word color and instead talking about light and reflectance; this switch helps avoid the interesting but nebulous questions of consciousness. Having said that, it is difficult to explain the ideas without reaching for the word color, and I don’t make an attempt to avoid it.
Project map
The following mind map lays out some of the project ideas.
Paper ideas
A brainstorming activity. If the following were paper titles, would they make sense, would they be interesting, and could supporting evidence ever be found?
- CNNs trained on ImageNet are colorblind.
- ResNet trained on ImageNet is colorblind.
- ImageNet classification can be achieved in grayscale.
- CNNs trained on ImageNet develop a 2D colorspace.
- CNNs trained on ImageNet are [not] aware of related colors.
- CNNs trained on ImageNet are [not] invariant to illumination changes.
These questions can be reworded for different network types, different vision tasks and different datasets.
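One way to make a title like “ResNet trained on ImageNet is colorblind” concrete is to measure how much a pretrained network’s predictions change when color is removed. Below is a minimal sketch of such a check, assuming PyTorch and a recent torchvision (with the weights API) are installed; the validation-folder path is a placeholder, and top-1 agreement is just one possible metric.

```python
# Sketch: does a pretrained ResNet-50 predict the same classes when color is removed?
# Assumes torchvision >= 0.13; "path/to/imagenet/val" is a placeholder.
import torch
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()

# Standard ImageNet preprocessing, with and without a grayscale step.
base = weights.transforms()  # resize, crop, to-tensor, normalize
to_gray = transforms.Compose([transforms.Grayscale(num_output_channels=3), base])

color_ds = datasets.ImageFolder("path/to/imagenet/val", transform=base)
gray_ds = datasets.ImageFolder("path/to/imagenet/val", transform=to_gray)

agree, total = 0, 0
with torch.no_grad():
    for (x_c, _), (x_g, _) in zip(DataLoader(color_ds, batch_size=32),
                                  DataLoader(gray_ds, batch_size=32)):
        pred_c = model(x_c).argmax(dim=1)  # predictions on the color images
        pred_g = model(x_g).argmax(dim=1)  # predictions on the grayscale copies
        agree += (pred_c == pred_g).sum().item()
        total += pred_c.numel()

print(f"top-1 agreement, color vs. grayscale: {agree / total:.3f}")
```

If the agreement (and the accuracy of the grayscale pass) stays high, that is at least weak evidence for the “colorblind” framing; a fuller study would also look at per-class behavior, other architectures and other colorspaces.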
The dataset that trained human vision
The evolution that gave us human vision can be thought of as having been run on a dataset that is now impossible to recreate. Even if we fix a period, say 40,000-50,000 years ago, and ask what the distribution of scenes humans lived in was, or what the distribution of light they saw was, the question seems impossible to answer. But this is the sort of data that is needed in order to understand the directions human vision was pushed in by evolution. Even if we were to obtain this type of data, we would need a good idea of the state of human vision at that point, as evolution does not produce optimal systems, but reworks existing ones while being ignorant of any global optima. For example, our understanding of the three human cone types cannot ignore the loss of two cone types experienced by mammals during the time of the dinosaurs and the later duplication of the red cone.1
The need for human-like vision
Tasks like object recognition don’t require human-like solutions. Tasks that ask questions about human perception, however, inherently require modeling of the human visual system: training a system to determine how a human will interpret a scene requires it to learn some model of human vision. For example, it may be useful to ask how a human perceives the distance between two objects in a scene, or what color a human would assign to an object.
For another example, consider an AI system that takes text instructions like “Change the warrior’s boots to look more brown.” and edits an image or video to achieve the desired effect. For such a system to succeed, it seems important that it be aware that brown is a related color, and that humans will therefore perceive a stronger “brown” sensation based not only on the light seeming to come from the boots, but also on the light seeming to come from the rest of the scene. An alternative would be to make the system highly aware of light and material physics and to instead direct it with instructions like “Change the warrior’s boots to be made of worn undyed leather”. The system would then somehow use its knowledge of leather reflectance properties to create the desired effect.
The desire for interpretable and explainable models also motivates an inquiry into how human color perception might relate to machine vision.