Vision Sciences Society Annual Meeting Abstract  |   December 2022
Isolating Global Form Processing Using Shape Metamers
Author Affiliations & Notes
  • George Alvarez
    Harvard University
  • Talia Konkle
    Harvard University
  • Acknowledgements
    NSF PAC COMP-COG 1946308 to GAA; NSF CAREER BCS-1942438 to TK
Journal of Vision December 2022, Vol. 22, 4082. https://doi.org/10.1167/jov.22.14.4082
Abstract

A critical disconnect between human and machine vision is that people have a remarkable ability to represent and understand the shapes of objects, whereas deep convolutional neural networks (CNNs) operate more as local texture analyzers, with no propensity to perceive global form. To isolate and study the representation of global form, we developed a novel stimulus manipulation that probes global form similarity in a way that is completely separated from local similarity. Specifically, our method renders natural images as compositions of Gabor elements at different orientations and scales. From this basis, we generate shape metamers: image pairs that differ at each local Gabor but share the same global form information, and that are nearly perceptually indistinguishable to humans. We also generate anti-metamers: image pairs that differ by the same amount locally but disrupt global form information, and that look nothing alike to humans. Leveraging these stimuli as a litmus test for global form perception, we find that CNNs trained on object categorization show little sensitivity to global form in their feature spaces, encoding shape metamers and anti-metamers as equally similar. In contrast, both vision transformer models with self-attention layers and our simpler custom models with hand-designed "association fields" can learn longer-range relationships between local features and show increased sensitivity to global form. These findings highlight that the CNN architecture lacks sufficient inductive biases to learn global form, and that self-attention and association-field mechanisms may serve as key precursor operations that amplify relevant local features. We propose that this multiplicative operation is critical for enabling downstream mechanisms to encode relationships among the local features that primarily define shape, en route to a more explicit global shape representation.
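The abstract does not give implementation details, but the logic of the manipulation can be illustrated with a toy sketch in Python. Here a circular orientation flow stands in for an image's global form; a "metamer" re-draws each element's phase so every local Gabor differs while the flow is preserved, and an "anti-metamer" rotates every element by a matched ±45°, disrupting the flow. The function names, grid size, and jitter magnitude are all illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def gabor(size, theta, phase, freq=0.25, sigma=None):
    """Render one Gabor element: an oriented grating in a Gaussian envelope."""
    sigma = sigma or size / 5.0
    half = size / 2.0
    y, x = np.mgrid[-half:half, -half:half]
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the grating
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr + phase)

def render(orientations, phases, patch=16):
    """Tile one Gabor element per grid cell from per-element parameters."""
    n = orientations.shape[0]
    img = np.zeros((n * patch, n * patch))
    for i in range(n):
        for j in range(n):
            img[i*patch:(i+1)*patch, j*patch:(j+1)*patch] = \
                gabor(patch, orientations[i, j], phases[i, j])
    return img

rng = np.random.default_rng(0)
n = 12
yy, xx = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
orient = np.arctan2(yy, xx) + np.pi / 2    # toy "global form": circular flow

base    = render(orient, rng.uniform(0, 2*np.pi, (n, n)))
# Metamer: same orientation flow, every local element's phase re-drawn,
# so each local Gabor differs while global form is preserved.
metamer = render(orient, rng.uniform(0, 2*np.pi, (n, n)))
# Anti-metamer: every element rotated by a matched +/-45 deg, which
# scrambles the orientation flow that carries global form.
jitter  = rng.choice([-np.pi/4, np.pi/4], size=(n, n))
anti    = render(orient + jitter, rng.uniform(0, 2*np.pi, (n, n)))
```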
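A minimal version of the litmus test would then compare feature-space similarity for the two pair types. The sketch below, again only an assumed setup, uses a pretrained torchvision ResNet-50's penultimate features and cosine similarity, and takes the base/metamer/anti arrays from the sketch above; the abstract does not specify which networks, layers, or similarity measures were actually used.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

def to_batch(img):
    """Grayscale numpy array -> normalized 1x3x224x224 tensor."""
    t = torch.tensor(img, dtype=torch.float32)
    t = (t - t.mean()) / (t.std() + 1e-8)
    t = t.expand(3, -1, -1).unsqueeze(0)       # replicate gray to 3 channels
    return F.interpolate(t, size=224, mode="bilinear", align_corners=False)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Identity()                 # expose penultimate features
model.eval()

with torch.no_grad():
    f_base, f_met, f_anti = (model(to_batch(x)) for x in (base, metamer, anti))

# A form-sensitive model should score the metamer pair as far more similar
# than the anti-metamer pair; per the abstract, CNN features score them alike.
print("metamer similarity:     ", F.cosine_similarity(f_base, f_met).item())
print("anti-metamer similarity:", F.cosine_similarity(f_base, f_anti).item())
```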
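Finally, the abstract characterizes association fields as a multiplicative operation that amplifies local features supported by their neighbors. One speculative reading, assuming nothing beyond that description, is a gain term that grows when neighboring elements are co-aligned:

```python
import numpy as np

def association_field_gain(orient):
    """Toy association field: each element's multiplicative gain grows with
    the orientation alignment of its 8 neighbors, so elements lying along a
    smooth contour are amplified and isolated elements are left alone.
    A sketch of the 'amplify relevant local features' idea only."""
    n_rows, n_cols = orient.shape
    gain = np.ones_like(orient)
    for i in range(n_rows):
        for j in range(n_cols):
            support = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if (di, dj) == (0, 0):
                        continue
                    if 0 <= ii < n_rows and 0 <= jj < n_cols:
                        # cos(2*delta) is +1 for parallel neighbors and -1 for
                        # orthogonal ones (orientation is 180-deg periodic).
                        support += np.cos(2 * (orient[ii, jj] - orient[i, j]))
            gain[i, j] = 1.0 + max(support, 0.0) / 8.0
    return gain

# Multiplying local feature responses by this gain boosts contour-aligned
# elements before any downstream relational (e.g., self-attention) stage.
```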
