Abstract
A key challenge for cognitive neuroscience is to build and apply models based on theoretical knowledge about the human visual system. Computational models inspired by machine learning and computer vision are increasingly being compared and tested against brain responses. These models are highly complex: Convolutional Neural Networks (CNNs) have 7 (up to 22) layers, and HMAX and Bag of Words (BoW) likewise comprise several computational layers. To properly understand how the different layers compare against the brain, and how the models differ from one another, we need to determine the unique variance each contributes to explaining brain responses. In our study we determined the amount of unique variance in brain responses explained by the layers of several hierarchical vision models. We acquired BOLD fMRI data from 20 subjects who watched an 11-minute natural movie and employed variance partitioning to explain local BOLD variation, using dissimilarity matrices computed at different layers of representation for three models: HMAX, BoW, and CNN. We found that low-level representations such as SIFT and Gabor features contributed uniquely to explaining BOLD activity, suggesting that they capture different representations in the brain. At intermediate levels, most of the variance explained by HMAX was shared with BoW, while BoW explained additional BOLD activity. Beyond the unique variance of HMAX and BoW, CNN layers uniquely explained BOLD variation in higher brain areas. Within models, the higher layers of HMAX and BoW added unique variance relative to their respective low-level features, whereas for more complex models such as the CNN, certain layers did not add any unique variance. Overall, our results suggest that analyzing computational models of object recognition on the basis of their unique variance provides a different perspective on how these models capture visual representations in the human brain.
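To make the variance-partitioning logic concrete, the following minimal Python sketch shows how the unique and shared variance of two model dissimilarity matrices (RDMs) in explaining a brain RDM can be derived by comparing the R² of nested regression models. This is an illustrative sketch of the general technique, not the study's actual analysis pipeline; all function and variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def lower_triangle(rdm):
    """Vectorize the lower triangle of a square dissimilarity matrix."""
    i, j = np.tril_indices(rdm.shape[0], k=-1)
    return rdm[i, j]

def r2(predictors, y):
    """R^2 of an ordinary least-squares fit of y on the given predictors."""
    X = np.column_stack(predictors)
    return LinearRegression().fit(X, y).score(X, y)

def partition_variance(model_a_rdm, model_b_rdm, brain_rdm):
    """Split variance in a brain RDM explained by two model RDMs into
    unique and shared components (two-predictor commonality analysis).
    """
    a = lower_triangle(model_a_rdm)
    b = lower_triangle(model_b_rdm)
    y = lower_triangle(brain_rdm)

    r2_a = r2([a], y)        # model A alone
    r2_b = r2([b], y)        # model B alone
    r2_ab = r2([a, b], y)    # both models together

    unique_a = r2_ab - r2_b  # variance only model A accounts for
    unique_b = r2_ab - r2_a  # variance only model B accounts for
    shared = r2_ab - unique_a - unique_b
    return unique_a, unique_b, shared
```

In this scheme, a model layer "adds unique variance" when the full model's R² exceeds the R² obtained with the other predictors alone; the same decomposition extends to more than two predictors by fitting all subsets of the predictor set.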
Meeting abstract presented at VSS 2016