August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Processing of coarse and fine shape features by humans and deep networks: A shape frequency analysis
Author Affiliations & Notes
  • James Elder
    York University, Canada
  • Nicholas Baker
    Loyola University
  • John Wilder
    Northeastern University
  • Tenzin Chosang
    York University, Canada
  • Footnotes
    Acknowledgements  We thank the NSERC Discovery, York Research Chair and VISTA programs for their support.
Journal of Vision August 2023, Vol.23, 5958. doi:https://doi.org/10.1167/jov.23.9.5958
      © ARVO (1962-2015); The Authors (2016-present)

Abstract

BACKGROUND. The statistics of coarse and fine shape features can be analyzed using a Fourier descriptor projection. Natural shapes are known to be low-pass: coarse shape features (low shape frequencies) typically have higher amplitudes than fine shape features (high shape frequencies). Prior work suggests that human shape sensitivity is even more biased toward low shape frequencies than is optimal for natural low-pass shapes; however, this was demonstrated only for a simple binary shape discrimination task within a linear classification framework. Deep networks are reported to be more sensitive to local shape features, suggesting a high-frequency bias, but these demonstrations have primarily used simple artificial stimuli. Here we employ a novel Fourier method to assess the processing of coarse and fine shape features of natural shapes by humans and deep networks in a more realistic object classification task.

METHOD. Human observers (n = 11) classified frequency-filtered animal silhouettes into one of nine animal categories. To assess sensitivity to shape frequencies, the stimuli were high-pass filtered to progressively remove the lowest shape frequencies, with cutoffs ranging from the 2nd to the 8th harmonic. Two representative deep networks were also evaluated on the same stimuli: a convolutional network (ResNet-50) and a transformer network (ViT).

RESULTS. Both human and deep network performance declined rapidly as low shape frequencies were progressively eliminated. Trial-by-trial analysis revealed that ViT is more predictive of human responses than ResNet-50. Interestingly, the proportion of explainable human variance accounted for by ViT increased from 29% to 57% as more of the low frequencies were eliminated. This suggests that while this transformer model captures some aspects of human selectivity for higher shape frequencies, it struggles to account for human processing of the lower shape frequencies that largely determine human shape judgements.
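As a rough illustration of what high-pass filtering in the shape frequency domain can look like, the sketch below applies a standard Fourier descriptor filter to a closed contour. It is not the novel Fourier method used in the study, whose details the abstract does not give. The function name `highpass_contour`, the complex-valued contour encoding, and the choice to retain the mean position and the first harmonic (so that the filtered curve stays a closed outline of roughly the original size) are all assumptions made for illustration.

```python
import numpy as np


def highpass_contour(z, cutoff):
    """High-pass filter a closed contour given as complex samples x + 1j*y.

    Assumption (illustrative only): harmonics 2 .. cutoff-1 are zeroed, while
    harmonic 0 (mean position) and harmonic 1 (base ellipse) are retained.
    """
    z = np.asarray(z, dtype=complex)
    n = z.size
    coeffs = np.fft.fft(z)  # Fourier descriptors of the contour
    # Signed integer harmonic index of each FFT bin (0, 1, ..., -2, -1).
    k = np.rint(np.fft.fftfreq(n) * n).astype(int)
    harmonic = np.abs(k)
    remove = (harmonic >= 2) & (harmonic < cutoff)
    coeffs[remove] = 0.0
    return np.fft.ifft(coeffs)


# Synthetic test shape: a unit circle plus a coarse (3rd harmonic)
# and a fine (10th harmonic) perturbation.
theta = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
shape = (np.exp(1j * theta)
         + 0.3 * np.exp(3j * theta)
         + 0.1 * np.exp(10j * theta))

# With a cutoff at the 5th harmonic, the 3rd-harmonic component is removed
# and the 10th-harmonic component is left untouched.
filtered = highpass_contour(shape, cutoff=5)
```

Under these assumptions a cutoff of 2 leaves the contour unchanged, and raising the cutoff strips progressively more of the coarse structure while leaving fine detail intact.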
