Abstract
A slew of recent studies have shown that deep neural networks (DNNs) optimized for visual tasks make effective models of neural response patterns in the ventral visual stream. Analogous results have been found in audition, where optimizing DNNs for speech-recognition tasks has produced quantitatively accurate models of neural response patterns in auditory cortex. The existence of computational models within the same architectural class for two apparently very different sensory representations raises several intriguing questions: (1) To what extent do visual models predict auditory response patterns, and to what extent do auditory models predict visual response patterns? (2) In what ways are the visual and auditory models similar, and in what ways do they diverge? (3) What do the answers to the above questions tell us about the relationships between the natural statistics of these two sensory modalities and the underlying generative processes behind them? I'll describe several quantitative and qualitative modeling results, involving electrophysiology data from macaques and fMRI data from humans, that shed some initial light on these questions.
Meeting abstract presented at VSS 2016