August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Spatial-frequency channels for object recognition by neural networks are twice as wide as those of humans
Author Affiliations
  • Ajay Subramanian
    New York University
  • Elena Sizikova
    New York University
  • Najib J. Majaj
    New York University
  • Denis G. Pelli
    New York University
Journal of Vision August 2023, Vol. 23, 5032.
      Ajay Subramanian, Elena Sizikova, Najib J. Majaj, Denis G. Pelli; Spatial-frequency channels for object recognition by neural networks are twice as wide as those of humans. Journal of Vision 2023;23(9):5032.

      © ARVO (1962-2015); The Authors (2016-present)

We compare how well human observers and neural networks recognize ImageNet images in filtered noise using the critical-band masking paradigm (Fletcher, 1940). We assess the spatial-frequency tuning of object recognition by measuring its noise tolerance at various frequencies. Sixteen observers and four neural networks performed 16-way categorization of 1,050 ImageNet images perturbed by band-limited Gaussian noise of five strengths centered at seven spatial frequencies. We find that the noise sensitivity of object recognition is an inverted-U-shaped function of spatial frequency. Human performance is severely impaired over an octave-wide band. Such octave-wide selectivity appears frequently in the vision literature, where it is called a "channel" (Solomon & Pelli, 1994). Here, this channel is revealed in heatmaps of human and network performance. A demo of this channel presents images perturbed with noise at various frequencies and strengths. Together with previous work (Majaj et al., 2002), this shows that the bandwidth of the human channel is conserved across diverse kinds of target: real-world objects, gratings, and letters. We find the classic octave-wide frequency band for humans and a two-octave-wide band for machines, both centered at ~28 cycles per image. At the peak, humans tolerate four times as much noise variance as networks do. When recognizing objects, networks are known to rely on texture cues, whereas humans rely on shape (Geirhos et al., 2018). Data augmentation helps bridge this gap (Hermann et al., 2020; Geirhos et al., 2021; Muttenthaler et al., 2022). Our results suggest that the popular notion of texture-vs-shape bias may simply reflect the width of the spatial-frequency channel.
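The stimulus manipulation described above — band-limited Gaussian noise centered at a given spatial frequency — can be sketched as follows. This is a minimal illustration, not the study's actual code: the function name, the ideal (hard-edged) band-pass filter, and all parameter values (image size, center frequency in cycles per image, one-octave bandwidth, RMS noise strength) are assumptions chosen for clarity.

```python
import numpy as np

def bandlimited_noise(size, center_cpi, bandwidth_octaves=1.0, sigma=0.1, rng=None):
    """Gaussian noise band-pass filtered around center_cpi (cycles per image).

    Hypothetical sketch: an ideal annular filter in the Fourier domain,
    one octave wide by default; the study's exact filter may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, 1.0, (size, size))
    # Radial frequency grid in cycles per image.
    f = np.fft.fftfreq(size) * size
    fx, fy = np.meshgrid(f, f)
    radius = np.hypot(fx, fy)
    # Keep frequencies within +/- half the octave bandwidth of the center.
    lo = center_cpi / 2 ** (bandwidth_octaves / 2)
    hi = center_cpi * 2 ** (bandwidth_octaves / 2)
    mask = (radius >= lo) & (radius <= hi)
    filtered = np.real(np.fft.ifft2(np.fft.fft2(noise) * mask))
    # Rescale to the requested RMS contrast (the "noise strength").
    filtered *= sigma / filtered.std()
    return filtered

# Perturb a (toy) gray image; clip to keep pixel values in [0, 1].
img = np.full((224, 224), 0.5)
noisy = np.clip(img + bandlimited_noise(224, center_cpi=28, sigma=0.2), 0.0, 1.0)
```

Sweeping `center_cpi` over the seven frequencies and `sigma` over the five strengths, then measuring categorization accuracy on the perturbed images, yields the frequency-by-strength performance heatmaps the abstract refers to.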

