Vision Sciences Society Annual Meeting Abstract  |  December 2022
Volume 22, Issue 14
Open Access
Efficiently-generated object similarity scores predicted from human feature ratings and deep neural network activations
Author Affiliations & Notes
  • Martin N Hebart
    Max Planck Institute for Human Cognitive & Brain Sciences
  • Philipp Kaniuth
    Max Planck Institute for Human Cognitive & Brain Sciences
  • Jonas Perkuhn
    Max Planck Institute for Human Cognitive & Brain Sciences
  • Footnotes
    Acknowledgements: This work was supported by a research group grant of the Max Planck Society awarded to M.N.H.
Journal of Vision December 2022, Vol. 22, 4057. doi: https://doi.org/10.1167/jov.22.14.4057
Abstract

A central aim in vision science is to elucidate the structure of human mental representations of objects. A key ingredient in this endeavor is the assessment of psychological similarities between objects. A challenge of current methods for exhaustively sampling similarities is their high resource demand, which grows non-linearly with the number of stimuli. To overcome this challenge, here we introduce an efficient method for generating similarity scores for real-world object images, using a combination of deep neural network activations and human feature ratings. Rather than directly predicting similarity for pairs of images, our method first predicts each image’s values on a set of 49 previously established representational dimensions (Hebart et al., 2020). These values are then used to generate similarities for arbitrary pairs of images. We evaluated the performance of this method using dimension predictions derived from the neural network architecture CLIP-ViT as well as direct human ratings of object dimensions collected through online crowdsourcing (n = 25 per dimension). Human ratings were collected for a set of 200 images, and generated similarity was evaluated on two separate sets of 48 images. CLIP-ViT performed very well at predicting global similarity for 1,854 objects (r = 0.89). When applied to three existing neuroimaging datasets, CLIP-ViT predictions rivaled, and often outperformed, existing behavioral similarity datasets. For the two 48-image sets, both humans and CLIP-ViT provided good predictions of image dimension values across several datasets, leading to very good predictions of similarity (humans: R2 = 74-77%, CLIP-ViT: R2 = 76-82% of explainable variance). Combining dimension predictions across humans and CLIP-ViT yielded a strong additional increase in performance (R2 = 84-87%). Together, our method offers a powerful and efficient approach for generating similarity judgments and opens up the possibility of extending research using image similarity to large stimulus sets.
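To make the two-stage logic of the method concrete, the sketch below illustrates it in Python. Stage 1 predicts each image’s values on the 49 representational dimensions; stage 2 derives pairwise similarities from those dimension vectors. The dot-product proximity and the softmax triplet choice model follow the SPoSE framework of Hebart et al. (2020); the linear readout from CLIP-ViT activations and all function and variable names are illustrative assumptions, since the abstract does not specify the exact predictor used.

```python
import numpy as np

N_DIMS = 49  # number of representational dimensions (Hebart et al., 2020)


def predict_dimensions(features, weights):
    """Stage 1 (illustrative): map image features, e.g. CLIP-ViT
    activations or averaged human ratings, onto non-negative values
    on the 49 dimensions via a hypothetical linear readout."""
    return np.clip(features @ weights, 0.0, None)  # (n_images, N_DIMS)


def dot_product_similarity(dims):
    """Stage 2: proximity for arbitrary image pairs as the dot product
    of their dimension vectors, as in the SPoSE model."""
    return dims @ dims.T


def triplet_choice_similarity(dims):
    """Context-averaged alternative: the probability that images i and j
    are picked as the most similar pair in a triplet (i, j, k), averaged
    over all third images k, using a softmax over the three dot products."""
    n = dims.shape[0]
    s = dims @ dims.T
    sim = np.eye(n)  # self-similarity fixed at 1
    for i in range(n):
        for j in range(i + 1, n):
            ks = np.array([k for k in range(n) if k != i and k != j])
            e_ij = np.exp(s[i, j])
            p = e_ij / (e_ij + np.exp(s[i, ks]) + np.exp(s[j, ks]))
            sim[i, j] = sim[j, i] = p.mean()
    return sim


# Toy usage with random stand-ins for CLIP-ViT features and readout weights.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 512))            # 5 images, 512-d features
weights = rng.normal(size=(512, N_DIMS)) * 0.01 # hypothetical readout
dims = predict_dimensions(features, weights)
print(dot_product_similarity(dims).shape)       # (5, 5)
print(triplet_choice_similarity(dims).shape)    # (5, 5)
```

In practice, the readout weights would be fit by regressing from network activations (or crowdsourced ratings) to known dimension values on a training set, such as the 200 rated images mentioned above; the random weights here merely keep the sketch self-contained.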
