Abstract
The contrast sensitivity function (CSF) is a fundamental signature of the visual system and has been measured extensively in numerous species. It is defined by measuring the visibility threshold for sinusoidal gratings across all spatial frequencies. Here, we measured the CSF in artificial visual systems, namely deep neural networks (DNNs), using the same contrast discrimination paradigm as in human psychophysical experiments. During training, the networks are exposed exclusively to natural images, and the task is to identify which of two input images has the higher contrast. The contrast-discriminator networks learn a linear classifier over the frozen features of pretrained DNNs. At the testing stage, we measured each network's CSF by presenting sinusoidal gratings of different orientations and spatial frequencies. Our results demonstrate that the pretrained DNNs exhibit the band-limited, inverted-U-shaped CSF characteristic of human vision. The exact shape of a DNN's CSF appears to be task-dependent: the human CSF is captured better by scene-segmentation DNNs than by image-classification ones (tested under identical settings, such as architecture and computational complexity). When a network was trained from scratch to discriminate contrast in natural images (i.e., with no previous training task), a relatively flat CSF emerged, dissimilar to the human CSF. The visual environment of the pretrained DNNs proved influential as well. Networks with a diet of natural images similar to human experience (e.g., objects or faces) obtain a more human-like CSF, whereas networks pretrained on aerial pictures, such as a bird flying in the sky would see, obtain a CSF shifted towards higher spatial frequencies, similar to that of an eagle. In conclusion, the CSFs derived from DNNs result from efficient processing of the natural world around us, and different visual environments have distinct effects on the shape of the CSF.
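The sinusoidal-grating stimuli used to probe the CSF can be sketched as below. This is an illustrative reconstruction, not the authors' actual code: the function name and parameter defaults are hypothetical, and contrast is taken to be Michelson contrast, the standard definition in grating psychophysics.

```python
import numpy as np

def sinusoidal_grating(size=224, cycles_per_image=8, orientation_deg=0.0,
                       contrast=0.5, mean_luminance=0.5):
    """Generate a sinusoidal grating (hypothetical parameterization).

    `contrast` is the Michelson contrast (Lmax - Lmin) / (Lmax + Lmin),
    so luminance oscillates between mean*(1 - c) and mean*(1 + c).
    """
    theta = np.deg2rad(orientation_deg)
    # Normalized pixel coordinates in [0, 1).
    y, x = np.mgrid[0:size, 0:size] / size
    # Rotate the coordinate frame to set the grating's orientation.
    u = x * np.cos(theta) + y * np.sin(theta)
    # Sinusoid around the mean luminance, scaled by contrast.
    return mean_luminance * (1.0 + contrast *
                             np.sin(2 * np.pi * cycles_per_image * u))

# A low-contrast, low-spatial-frequency grating.
img = sinusoidal_grating(size=128, cycles_per_image=4, contrast=0.2)
```

Sweeping `cycles_per_image` and `contrast` while querying the discriminator yields a threshold contrast per spatial frequency; the CSF is the reciprocal of those thresholds.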