September 2017
Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2017
Evidence for Configural Superiority Effects in Convolutional Neural Networks
Author Affiliations
  • Shaiyan Keshvari
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
  • Ruth Rosenholtz
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
    Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
Journal of Vision August 2017, Vol.17, 169. doi:
      Shaiyan Keshvari, Ruth Rosenholtz; Evidence for Configural Superiority Effects in Convolutional Neural Networks. Journal of Vision 2017;17(10):169.


Finding a left-tilted line among right-tilted lines becomes easier when an "L" is added to each item, transforming the task into finding a triangle among arrows. This configural superiority effect occurs for a wide array of stimuli and is thought to result from vision utilizing "emergent" features (EFs), such as closure, in the composite case. A more computational interpretation can be couched in the idea of a visual processing hierarchy, in which higher-level representations support complex tasks at the expense of other, possibly less ecologically relevant tasks. Detecting the oddball might be inherently easier in the composite condition given the representation at some level of the hierarchy. To test this, we used the VGG-16 (Simonyan & Zisserman, 2015) convolutional neural network (CNN), trained to recognize objects using the ImageNet dataset, as a stand-in for the hierarchical visual encoding. Such CNNs perform well on object recognition, as well as on tasks for which they were not trained, and their feature vectors at different layers correlate with responses of various brain areas (Hong et al., 2015). We tested five EF stimuli in a 4AFC oddball localization task (Pomerantz & Cragin, 2013). We trained a multi-class SVM on the outputs of the last fully connected layer and evaluated it with K-fold cross-validation. Two EFs (orthogonality and roundness) showed better performance (by 33 and 53 percentage points, respectively) in the composite than in the base case. One (closure) showed no effect (< 1 pp), and two (parallelism and 3D) showed worse performance in the composite (by 23 and 21 pp). A pilot behavioral experiment (200 ms presentation) confirmed that observers (N = 2) are better with composite stimuli for all five EFs (44 +/- 0.06 pp). This suggests that some EFs are better represented than their base features by the highest layers of the network, but it is not the complete story.
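The decoding pipeline the abstract describes (a multi-class SVM trained on last-fully-connected-layer activations, scored with K-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' code: random vectors of fc-layer size stand in for VGG-16 activations (extracting real ones would require the network and ImageNet weights), the trial counts and K = 5 are assumptions, and labels encode which of the four quadrants holds the oddball.

```python
# Sketch of the abstract's decoding analysis: SVM on CNN features + K-fold CV.
# Synthetic features stand in for VGG-16 last-FC-layer activations.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_per_class, n_features = 20, 4096          # 4096 matches VGG-16's fc layers
X = rng.normal(size=(4 * n_per_class, n_features))  # stand-in feature vectors
y = np.repeat(np.arange(4), n_per_class)    # 4AFC: oddball quadrant label

# Multi-class SVM (one-vs-rest), evaluated with K = 5 stratified folds.
clf = SVC(kernel="linear", decision_function_shape="ovr")
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f} (chance = 0.25)")
```

With random features, accuracy hovers near the 25% chance level; the abstract's comparison is the gap in such accuracies between composite and base stimuli.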

Meeting abstract presented at VSS 2017

