September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Quantifying the Quality of Shape and Texture Representations in Deep Neural Network Models
Author Affiliations & Notes
  • Fenil R. Doshi
    Harvard University
    Kempner Institute for the Study of Natural and Artificial Intelligence
  • Talia Konkle
    Harvard University
    Kempner Institute for the Study of Natural and Artificial Intelligence
  • George A. Alvarez
    Harvard University
    Kempner Institute for the Study of Natural and Artificial Intelligence
  • Footnotes
    Acknowledgements  NSF PAC COMP-COG 1946308 to GAA, NSF CAREER BCS-1942438 to TK, Kempner Institute Graduate Fellowship to FRD
Journal of Vision September 2024, Vol.24, 1263. doi:https://doi.org/10.1167/jov.24.10.1263
Citation: Fenil R. Doshi, Talia Konkle, George A. Alvarez; Quantifying the Quality of Shape and Texture Representations in Deep Neural Network Models. Journal of Vision 2024;24(10):1263. https://doi.org/10.1167/jov.24.10.1263.


© ARVO (1962-2015); The Authors (2016-present)

Abstract

Deep Neural Networks (DNNs) have emerged as leading models of high-level visual processing, but key disparities remain between DNNs and human vision, in particular the models' substantially greater reliance on texture than on shape in object recognition. This bias has been quantified with a shape-bias score (Geirhos et al., 2019): models are presented with images containing conflicting shape and texture cues, and the number of correct shape decisions is divided by the total number of correct decisions (shape correct + texture correct). This shape-bias metric varies substantially across a broad range of vision models, with more recent models showing a more human-like shape bias. However, the metric can be misleading because it ignores a model's absolute performance; for example, a model making just a single correct shape decision, and no correct texture decisions, would have a 100% shape bias, which may incorrectly suggest that the model has a strong shape representation. To address this limitation, we propose a revised metric, the accuracy-corrected shape-bias: the square root of the product of the original shape-bias score and the shape-dependent accuracy (the proportion of all trials decided correctly on the basis of shape cues). We show that a randomly initialized AlexNet model (5.75% shape-dependent accuracy) has a high original shape-bias (0.466) but a low accuracy-corrected shape-bias (0.164), better capturing the fact that such models have impoverished shape representations. Moreover, across over 100 trained models, we find that increases in shape-bias are driven by shape enhancement combined with equal or greater texture suppression, and that none of the models examined have "strong" shape representations (none exceed 52.5% shape-dependent accuracy).
Overall, we find that the gulf between human and DNN shape representations remains much larger than bias scores alone suggest, and that there has been little improvement in shape quality beyond early AlexNet models.
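The two metrics described in the abstract can be sketched in a few lines of code. This is an illustrative reconstruction from the definitions given above, not the authors' implementation; the function names and the trial counts in the example are assumptions. Shape-bias is the fraction of correct cue-conflict decisions that followed the shape cue, and the accuracy-corrected shape-bias is the geometric mean of that score and the shape-dependent accuracy (correct shape decisions over all trials).

```python
import math

def shape_bias(shape_correct: int, texture_correct: int) -> float:
    """Original shape-bias (Geirhos et al., 2019): of the cue-conflict
    trials answered correctly, the fraction decided by the shape cue."""
    total_correct = shape_correct + texture_correct
    return shape_correct / total_correct if total_correct else 0.0

def accuracy_corrected_shape_bias(shape_correct: int,
                                  texture_correct: int,
                                  n_trials: int) -> float:
    """Revised metric: geometric mean of the original shape-bias and the
    shape-dependent accuracy (correct shape decisions over all trials)."""
    bias = shape_bias(shape_correct, texture_correct)
    shape_dependent_accuracy = shape_correct / n_trials
    return math.sqrt(bias * shape_dependent_accuracy)

# Degenerate case from the abstract: one correct shape decision, no correct
# texture decisions (n_trials = 1000 is an assumed trial count).
print(shape_bias(1, 0))                           # 1.0, i.e. "100% shape bias"
print(accuracy_corrected_shape_bias(1, 0, 1000))  # ~0.032, near floor

# Sanity check against the reported untrained-AlexNet numbers:
# sqrt(0.466 * 0.0575) ~= 0.164
print(math.sqrt(0.466 * 0.0575))
```

The square root keeps the corrected score on the same 0-to-1 scale as its two factors, so a model scores high only when it is both shape-biased and accurate because of shape.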
