Abstract
According to the efficient coding hypothesis (Barlow, 1959), neural coding serves to reduce the statistical dependencies present in visual images, eliminating redundancy and thus yielding more compact perceptual representations. It has been shown (Kersten, 1987) that humans can invert these representations to generate accurate perceptual predictions of information missing from an image.
Like the brain, generative adversarial networks (GANs) learn a hierarchical representation that can be inverted to make predictions in the image domain. Here, we explore the degree to which GANs can serve as a model for the progression toward statistical independence seen in the brain. We do this by assessing how well observers can estimate missing information at each layer of a Wasserstein GAN (wGAN). If the GAN’s encoding of the image parallels our neural encoding, observers’ ability to predict unit activations should decline from shallow layers near the image domain to deeper layers within the wGAN.
Method: A wGAN was trained on CIFAR-10. Images were generated by randomly sampling values in the latent layer and propagating the resulting activations through the network to the image. On each trial, a target unit in one of five layers of the wGAN was randomly selected for analysis. Observer estimates of target activations were obtained with a nested adjustment task (Bethge, 2007) involving a telescoping sequence of image triplets. Each triplet was generated using three different values for the target unit; observers selected the image that appeared most natural, and the chosen value narrowed the range probed by the next triplet.
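For concreteness, the triplet-generation step could be implemented as in the minimal PyTorch sketch below. This is not the authors' code: it assumes the generator's forward pass can be split into two hypothetical sub-networks, g_pre (latent to target layer) and g_post (target layer to image), and the latent size and unit indexing scheme are likewise assumptions.

```python
import torch

def generate_triplet(g_pre, g_post, z, unit_index, values):
    """Render three images that differ only in one target unit's activation."""
    with torch.no_grad():
        h = g_pre(z)                        # activations at the target layer
        flat = h.flatten(start_dim=1)       # flat view so units index uniformly
        images = []
        for v in values:                    # three candidate activation values
            flat_v = flat.clone()
            flat_v[:, unit_index] = v       # clamp only the target unit
            images.append(g_post(flat_v.view_as(h)))
        return images

z = torch.randn(1, 128)  # latent dimensionality of 128 is an assumption
# triplet = generate_triplet(g_pre, g_post, z, unit_index=42, values=(-1.0, 0.0, 1.0))
```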
Results: We observed a systematic decrease in the Pearson correlation between true and estimated target-unit activations as we moved from shallow layers near the image domain to deep layers, reflecting an increase in perceptual independence as a function of depth. Thus, wGANs may offer a reasonable model for the progressive elimination of perceptual redundancies in human visual coding.
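A minimal sketch of the reported per-layer analysis, assuming true and observer-estimated activations are collected separately for each layer; the variable names, layer labels, and placeholder data below are illustrative, not from the study.

```python
import numpy as np
from scipy.stats import pearsonr

def layer_correlations(true_acts, est_acts):
    """Pearson r per layer between true and observer-estimated activations."""
    return {layer: pearsonr(np.asarray(true_acts[layer]),
                            np.asarray(est_acts[layer]))[0]
            for layer in true_acts}

# Placeholder data for two layers: estimates track truth in a shallow layer,
# but are nearly independent of truth in a deep layer.
rng = np.random.default_rng(0)
true_acts = {"layer1": rng.normal(size=50), "layer5": rng.normal(size=50)}
est_acts = {"layer1": true_acts["layer1"] + 0.3 * rng.normal(size=50),
            "layer5": rng.normal(size=50)}
print(layer_correlations(true_acts, est_acts))
```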