This page is a scrapbook of ideas for the project.

Color and neural networks

The most general question I want to ask is:

What can be said about the relationship between neural networks for vision and human color perception?

In many ways, this question has issues. Human color perception is itself the subject of deep unanswered questions. Color is a property of our mind, an appearance in consciousness, and I currently believe that we won't get a satisfying answer to the question "what is color?" without knowing something deep about consciousness. So it is tenuous to take the idea of color perception and ask how it relates to neural networks. It might be a useless question for other reasons too: the two ideas might be so different that comparing them is pointless. Sure, the idea of pain is as nebulous a topic as color, but I'm reasonably confident that the question "what can be said about the relationship between pain and NAND logic gates?" is pretty useless.

At this point I can start to mount a defence of the question. To me, it doesn't seem obvious that color and neural networks are so vastly different as to be unrelatable. If the answer to the question is "No, nothing much can be said about the relationship between neural networks for vision and color", then this conclusion would be an interesting and valuable insight in itself.

We can shift the inquiry slightly by avoiding the word color and instead talking about light and reflectance; this switch helps avoid the interesting but nebulous questions of consciousness. Having said that, it is difficult to explain the ideas without reaching for the word color, and I don't make an attempt to avoid it.

Project map

The following mind map lays out some of the project ideas.


Paper ideas

A brainstorming activity. If the following were paper titles, would they make sense, would they be interesting and could supporting evidence ever be found?

These questions can be reworded with different network types, different vision tasks and different datasets.

The dataset that trained human vision

The evolution that gave us human vision can be thought of as having been run on a dataset that is now impossible to recreate. Even if we fix a period, say 40,000-50,000 years ago, and ask what the distribution of scenes humans lived in was, or what the distribution of light they saw was, the question seems impossible to answer. But this is the sort of data that is needed in order to understand the directions human vision was pushed in by evolution. Even if we were to obtain this type of data, we would need a good idea of the state of human vision at that point, as evolution does not produce optimal systems; it reworks existing ones while being ignorant of any global optima. For example, our understanding of the three human cone types cannot ignore the loss of two cone types experienced by mammals during the time of the dinosaurs and the later duplication of the red cone [@badenRetinalBasis2019].

The need for human-like vision

Tasks like object recognition don’t require human-like solutions. Tasks that ask questions about human perception, however, inherently require modeling of the human visual system. Training a system to determine how a human will interpret a scene requires the system to learn some model of the human visual system. For example, it may be useful to ask how a human perceives the distance between two objects in a scene, or what color a human would assign to an object.

For another example, consider an AI system that takes text instructions like “Change the warrior’s boots to look more brown.” and edits an image or video to achieve the desired effect. For such a system to succeed, it seems important that it be aware that brown is a related color: how strongly humans perceive “brown” depends not only on the light seeming to come from the boots, but also on the light seeming to come from the rest of the scene. An alternative would be to make the system highly aware of light and material physics and to direct it with instructions like “Change the warrior’s boots to be made of worn undyed leather”. The system would then somehow use its knowledge of leather reflectance properties to create the desired effect.

The desire for interpretable and explainable models also motivates an inquiry into how human color perception might relate to machine vision.

Color science techniques applied to neural networks

Can the approach taken in the paper "Could a Neuroscientist Understand a Microprocessor?" be taken in the space of neural networks and color science? The paper explains itself clearly in its abstract:

There is a popular belief in neuroscience that we are primarily data limited, and that producing large, multimodal, and complex datasets will, with the help of advanced data analysis algorithms, lead to fundamental insights into the way the brain processes information. These datasets do not yet exist, and if they did we would have no way of evaluating whether or not the algorithmically-generated insights were sufficient or even correct. To address this, here we take a classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information.

The authors arrived at the conclusion that the analytic approaches in neuroscience fall short of expectations.

The abstract method of this paper is to take known techniques that are used to investigate an unknown system and apply them to a known system. A priori, there is confidence as to what concepts are necessary and sufficient to understand the known system. After applying the techniques to the known system, we can investigate to what extent the techniques reveal these concepts. Difficulty in arriving at the expected necessary concepts is evidence that the techniques are lacking. Conversely, if the techniques reveal other concepts, these may refute the claim of insufficiency by proposing an alternative description of the system.

When considering color and neural networks, is there a mapping between tools and systems that would make this type of investigation interesting? Techniques such as color matching have had success laying the foundations of colorimetry and later color appearance models. What is it about human vision that made these techniques useful? Can they be used to investigate artificial vision systems, and if not, what is missing that prevents the application of the techniques? Going one step further, what are the minimum properties a vision system must have for the techniques to be applicable?
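
To make the idea of a color matching experiment concrete, here is a minimal sketch in Python. It is not taken from any of the cited work. It assumes a purely linear system with three sensor channels, and the Gaussian sensitivity curves, the three primaries and the names `sensors`, `primaries`, `respond` and `match` are all made up for illustration. Under those assumptions, matching a test light reduces to solving a 3x3 linear system.

```python
import numpy as np

# Wavelength grid in nanometres (visible range, 5 nm steps).
wavelengths = np.arange(400, 701, 5, dtype=float)

def gaussian(peak, width):
    """Toy spectral curve: a Gaussian centred at `peak` nm."""
    return np.exp(-0.5 * ((wavelengths - peak) / width) ** 2)

# ASSUMPTION: three made-up sensitivity curves standing in for cones
# (or for any three-channel front end of a vision system).
sensors = np.stack([gaussian(440, 25), gaussian(540, 35), gaussian(570, 40)])

# ASSUMPTION: three narrow-band primaries that the observer can adjust.
primaries = np.stack([gaussian(450, 10), gaussian(530, 10), gaussian(620, 10)])

def respond(spectrum):
    """Sensor responses to a light spectrum (linear system assumed)."""
    return sensors @ spectrum

def match(test_spectrum):
    """Primary intensities whose mixture gives the same sensor responses."""
    # Column j holds the sensor responses to one unit of primary j.
    response_to_primaries = sensors @ primaries.T
    return np.linalg.solve(response_to_primaries, respond(test_spectrum))

test_light = gaussian(500, 30)
weights = match(test_light)      # entries may be negative
mixture = weights @ primaries    # a different spectrum from the test light

print("primary weights:", weights)
print("responses equal:", np.allclose(respond(mixture), respond(test_light)))
```

The mixture and the test light are different spectra that produce identical sensor responses. A negative weight corresponds to adding that primary to the test light instead of to the mixture. Whether anything like `respond` can be identified inside a trained network is exactly the open question above.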

Coming from the reverse direction, there are many papers that claim to train neural networks to approach the behaviour of human vision networks. Tom Baden's lab and other labs are making progress uncovering the circuits of the retina. If these retinal circuits are taken as the "known system", do the neural network approximations hold up on comparison?

3D prior

My general sentiment is that many neural network models used for vision tasks such as classification do not develop a single 3D model representation. There is too much flexibility for the network to develop multiple parallel expedient representations. If networks are designed with constraints that force them to develop a sort of 3D interface, then maybe this is a space in which it is possible to find scene representations, including surface reflectance properties, that can be compared to human experience.

Literature, resources

Image statistics

"Natural Image Statistics and Neural Representation" by Simoncelli and Olshausen is a 2001 paper about approaches that try to infer retinal mechanisms by looking at spectral distributions of scenes (think "information theory tries to find itself in the retina") [@simoncelliNaturalImageStatistics2001]. It might be worth checking to see if any of these ideas can be reinvestigated. One thought that comes to mind: it might be possible to create a sort of unit test for their theories by switching out the retina for neural networks trained on a vision task. Given the flexibility and power of neural networks, there should be no problem with neural networks approaching in some way the optimal encoding pattern that these authors hypothesized would emerge. There is an appeal to taking up this work in that it's a possibly winnable battle in the war against hypotheses that balloon in a discussion.
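
As a rough sketch of what such a unit test could look like, the Python below assumes the hypothesis under test is decorrelation: an efficient code should have less redundancy between units than its input has between pixels. Everything in it is a stand-in. The patches are smoothed noise instead of natural images, PCA whitening plays the role of the trained network, and `mean_abs_offdiag_corr` is a hypothetical helper, not a measure taken from the paper.

```python
import numpy as np

def mean_abs_offdiag_corr(responses):
    """Average |correlation| between distinct units; rows are samples."""
    corr = np.corrcoef(responses, rowvar=False)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diagonal)))

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic 16-pixel "patches" with correlated neighbouring
# pixels, made by smoothing white noise. Real image patches would go here.
noise = rng.standard_normal((2000, 16))
kernel = np.array([0.25, 0.5, 0.25])
patches = np.apply_along_axis(
    lambda row: np.convolve(row, kernel, mode="same"), 1, noise
)

# ASSUMPTION: PCA whitening stands in for the early layers of a trained
# network; in the real test, recorded activations would replace `encoded`.
centered = patches - patches.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
encoded = centered @ eigvecs / np.sqrt(eigvals)

input_redundancy = mean_abs_offdiag_corr(patches)
code_redundancy = mean_abs_offdiag_corr(encoded)

print("input redundancy:", input_redundancy)
print("code redundancy: ", code_redundancy)
```

The interesting version of this test replaces `encoded` with activations from a network trained on a vision task and asks whether the redundancy drops without being told to.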

Retinal networks

Everything by Tom Baden is excellent. His lab is working out the retinal networks of many animals. He gave a talk (accessible on YouTube) that summarizes a lot of the current work being done in his lab.

More resources

Paper list

People list

Philipp Henzler
3rd year PhD student (as of Nov 2021), working on neural texture representation and 3D reconstruction. Wrote the 2021 paper "Generative Modelling of BRDF Textures from Flash Images".