September 2016
Volume 16, Issue 12
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2016
Visual features versus categories: Explaining object representations in primate IT and deep neural networks with weighted representational modeling
Author Affiliations
  • Kamila Jozwik
    University of Cambridge
  • Nikolaus Kriegeskorte
    Medical Research Council Brain and Cognition Unit
  • Radoslaw Cichy
    Free University Berlin, Department of Education and Psychology
  • Marieke Mur
    Medical Research Council Brain and Cognition Unit
Journal of Vision September 2016, Vol. 16, 511.
      © ARVO (1962-2015); The Authors (2016-present)


Visual features and category membership are both reflected in the object representation in inferior temporal (IT) cortex. However, the explanatory power of features and categories has not been directly compared. We test whether the IT object representation, in humans and monkeys, is better explained by a feature-based or by a categorical model. We apply the same test to the object representations in a deep convolutional neural network (CNN). Previous work has shown that late layers of the network outperform early layers in explaining the human IT object representation (Khaligh-Razavi & Kriegeskorte, 2014; Güçlü & van Gerven, 2015). We asked human observers to generate category labels (e.g., face, animal) and feature labels (e.g., eye, circular) for a set of 96 real-world object images. This yielded rich models (> 100 dimensions), which we fitted to the brain and CNN representations using non-negative least squares. The brain representations of the 96 images had previously been measured using fMRI (humans) and cell recordings (monkeys). Model performance was estimated on held-out images not used in fitting, and compared using representational similarity analysis. In both humans and monkeys, the feature-based and the categorical model explain significant, and similar, amounts of IT variance (Fig. 1AB). Combining the two models does not explain significant additional variance, indicating that the variance shared between the two models (features correlated with categories, and vice versa) is what best explains primate IT. Consistent with previous findings, late layers of the deep neural network show a pattern of results similar to IT, while early layers are dominated by visual features (Fig. 1C). These results suggest that primate IT, as well as a deep neural network trained on image categorization, uses visual features, including object parts that have stereotyped shapes and are strongly associated with particular categories, as stepping stones toward semantics.
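The weighted representational modeling described above can be sketched in a toy simulation (this is not the authors' code; the dimension count, the binary label matrix, the simulated "brain" RDM, and the 64/32 image split are illustrative assumptions). Each labeled dimension yields one component representational dissimilarity matrix (RDM); non-negative least squares finds the weights that best fit the brain RDM on training images, and performance is assessed on pairs of held-out images:

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, n_dims = 96, 20  # the abstract's models had > 100 dimensions; 20 keeps the toy small

# Hypothetical binary label matrix: images x feature/category dimensions
labels = rng.integers(0, 2, size=(n_images, n_dims)).astype(float)

# One component RDM per dimension: dissimilarity 1 where two images differ on that label
comps = np.abs(labels[:, None, :] - labels[None, :, :])  # shape (n_images, n_images, n_dims)

# Simulated "brain" RDM: noisy non-negative weighted sum of a few dimensions (toy ground truth)
true_w = np.zeros(n_dims)
true_w[:5] = rng.uniform(0.5, 1.5, 5)
brain_rdm = comps @ true_w + 0.1 * rng.standard_normal((n_images, n_images))
brain_rdm = (brain_rdm + brain_rdm.T) / 2  # RDMs are symmetric

# Split IMAGES (not pairs), so every test pair involves only held-out images
train_mask = np.arange(n_images) < 64

def upper_pairs(mask):
    """Return index pairs (i, j), i < j, among the images selected by mask."""
    idx = np.where(mask)[0]
    iu = np.triu_indices(len(idx), k=1)
    return idx[iu[0]], idx[iu[1]]

ti, tj = upper_pairs(train_mask)
X = comps[ti, tj, :]            # (n_train_pairs, n_dims) design matrix
y = brain_rdm[ti, tj]           # observed dissimilarities
w, _ = nnls(X, y)               # non-negative least-squares weights

hi, hj = upper_pairs(~train_mask)
pred = comps[hi, hj, :] @ w     # predicted dissimilarities for held-out pairs
rho, _ = spearmanr(pred, brain_rdm[hi, hj])
print(f"held-out Spearman rho = {rho:.2f}")
```

Splitting at the level of images, rather than dissimilarity entries, matters: it ensures the held-out pairs share no images with the fitted pairs, which is the sense in which the abstract's model performance is cross-validated.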

Meeting abstract presented at VSS 2016

