September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2024
Differential sensitivity of humans and deep networks to the amplitude and phase of shape features
Author Affiliations & Notes
  • Nicholas Baker
    Loyola University of Chicago
  • John Wilder
    Northeastern University
  • James Elder
    York University
  • Footnotes
    Acknowledgements: The York University Research Chair program, VISTA, NSERC
Journal of Vision September 2024, Vol.24, 1281. doi:https://doi.org/10.1167/jov.24.10.1281
© ARVO (1962-2015); The Authors (2016-present)

      ×
  • Supplements
Abstract

Background: While humans are highly sensitive to global shape information, deep neural network models (DNNs) trained on ImageNet seem to favor local shape features. In the Fourier descriptor (shape frequency) domain, this manifests as much higher human sensitivity to low shape frequencies. Here we ask how this differential sensitivity depends upon the amplitude versus phase structure of these Fourier shape components.

Methods: Human observers (n=68) classified animal silhouettes into nine categories. The shapes were lowpass filtered in the shape frequency domain, over a range of frequency cutoffs, using two filtering methods. In method 1, Fourier components beyond the cutoff were zeroed. In method 2, the phases of components beyond the cutoff were randomized but their amplitudes were preserved. We compared human performance against three representative networks: a convolutional model (ResNet-50) and two transformer models (ViT, SWIN).

Results: While switching from filtering method 1 to method 2 resulted in a slight decline in human performance, it led to a significant improvement for the networks. What could explain this improvement? One possibility is that the networks were simply confused by the smooth shapes produced by method 1. To assess this possibility, we retested the networks using a third filtering method in which phases were randomized and amplitudes were set to normative, uninformative values. Performance improved for these more realistic shape stimuli, but for the two transformer models (ViT and SWIN) it remained below the levels seen with method 2. This indicates that these networks, unlike humans, are able to make effective use of the amplitude structure of low shape frequency components, even when phases are randomized.

Conclusions: While humans use low-frequency shape information more effectively than DNNs, they depend critically on the phase structure of these low-frequency shape components. In contrast, transformer networks exploit the texture-like amplitude structure of these components even when phase is randomized.
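
The three filtering methods described in the Methods and Results can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a closed contour sampled as complex points x + iy and a plain discrete Fourier transform as the shape-frequency representation, whereas the abstract does not say which Fourier descriptor variant or sampling scheme was used. The names `dft`, `inverse_dft`, `shape_frequency`, `lowpass`, and the `normative_amplitude` callback are hypothetical, and the abstract does not specify how the normative amplitudes of the third method were chosen, so the caller must supply them here.

```python
import cmath
import math
import random


def dft(points):
    """Naive DFT of a closed contour given as a list of complex points."""
    n = len(points)
    return [
        sum(z * cmath.exp(-2j * math.pi * k * m / n) for m, z in enumerate(points)) / n
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse of dft(): rebuild contour points from Fourier coefficients."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * math.pi * k * m / n) for k, c in enumerate(coeffs))
        for m in range(n)
    ]


def shape_frequency(k, n):
    """Unsigned frequency of DFT bin k (bins above n/2 hold negative frequencies)."""
    return min(k, n - k)


def lowpass(points, cutoff, method, rng=None, normative_amplitude=None):
    """Filter components above `cutoff` using one of three assumed schemes.

    method="zero":         set the components to zero (method 1).
    method="phase_random": keep amplitude, draw a random phase (method 2).
    method="normative":    random phase, amplitude from the caller-supplied
                           normative_amplitude(frequency) (third method).
    """
    if rng is None:
        rng = random.Random(0)
    n = len(points)
    filtered = []
    for k, coeff in enumerate(dft(points)):
        freq = shape_frequency(k, n)
        if freq <= cutoff:
            filtered.append(coeff)  # components at or below the cutoff are untouched
        elif method == "zero":
            filtered.append(0j)
        elif method == "phase_random":
            filtered.append(cmath.rect(abs(coeff), rng.uniform(0.0, 2.0 * math.pi)))
        elif method == "normative":
            amplitude = normative_amplitude(freq)
            filtered.append(cmath.rect(amplitude, rng.uniform(0.0, 2.0 * math.pi)))
        else:
            raise ValueError(f"unknown method: {method}")
    return inverse_dft(filtered)
```

Under these assumptions, `lowpass(contour, cutoff, "zero")` yields the smooth shapes of method 1, while `"phase_random"` returns a contour whose amplitude spectrum matches the original but whose components above the cutoff carry no phase information. The sketch does not check whether the filtered contour self-intersects, which a real stimulus pipeline would likely need to handle.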