August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Deep network representation of art style similarity judgments
Author Affiliations
  • Anna Bruns
    New York University
  • Ming Gao
    New York University
  • Abhishek Dendukuri
    New York University
  • Jenna Eubank
    New York University
Journal of Vision August 2023, Vol. 23, 5976.
Anna Bruns, Ming Gao, Abhishek Dendukuri, Jenna Eubank; Deep network representation of art style similarity judgments. Journal of Vision 2023;23(9):5976.

© ARVO (1962-2015); The Authors (2016-present)
This study investigates how humans and machines judge the visual similarity of paintings drawn from the WikiArt dataset (in terms of style, subject matter, and overall appearance) by conducting a behavioral study with human participants, training deep neural networks, and examining the behavioral data in tandem with the model results. In the behavioral study, participants rated pairs of paintings on their style, subject matter, and overall visual similarity. To extract similarity judgments from neural networks, we first compared three models: two pre-trained on ImageNet, VGG-16 (Simonyan & Zisserman, 2014) and AlexNet (Krizhevsky, Sutskever, & Hinton, 2012), with 16 and 8 layers respectively, and a basic five-layer CNN. These CNNs were trained to classify four art styles, with the best model achieving an accuracy of 48.95%. We then used each network's final layer to compute cosine similarity scores for the same pairs of paintings shown to participants. Overall, we found that the best-performing CNNs modeled human similarity judgments well, provided we constrained the set of image pairs considered. For pairs matching in both subject and style, AlexNet and VGG-16 both correlated with human similarity scores at 0.72. For pairs with matching subject but different style, the models also aligned reasonably well with human judgments, with VGG-16 achieving a correlation of 0.48 and AlexNet 0.46. Without restricting the set of image pairs, AlexNet and VGG-16 achieved correlations of just 0.31 and 0.37, respectively. This suggests that neural networks better model human style similarity judgments when the paintings' subject matter is eliminated as a factor in the similarity judgment.
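The comparison described above (final-layer features, cosine similarity per pair, then correlation with human ratings) can be sketched as follows. This is a minimal illustration, not the study's code: the embeddings and human ratings below are random placeholders standing in for VGG-16/AlexNet features and participant data, and the helper name is ours.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two final-layer feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder final-layer embeddings for three painting pairs
# (in the study these would be extracted from VGG-16 or AlexNet).
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=64), rng.normal(size=64)) for _ in range(3)]

# One model similarity score per painting pair.
model_scores = np.array([cosine_similarity(a, b) for a, b in pairs])

# Placeholder human similarity ratings for the same three pairs.
human_scores = np.array([0.8, 0.3, 0.5])

# Pearson correlation between model and human similarity scores,
# analogous to the 0.31-0.72 correlations reported in the abstract.
r = np.corrcoef(model_scores, human_scores)[0, 1]
print(f"correlation: {r:.2f}")
```

Restricting which pairs enter `pairs` (e.g. only pairs with matching subject matter) is what produces the different correlation values reported in the abstract.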

