October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract | October 2020
Controversial stimuli: adjudicating between deep neural network models of biological vision with synthetic images
Author Affiliations
  • Tal Golan
    Columbia University
  • Prashant C. Raju
    Columbia University
  • Nikolaus Kriegeskorte
    Columbia University
Journal of Vision October 2020, Vol.20, 947. doi:https://doi.org/10.1167/jov.20.11.947

      Tal Golan, Prashant C. Raju, Nikolaus Kriegeskorte; Controversial stimuli: adjudicating between deep neural network models of biological vision with synthetic images. Journal of Vision 2020;20(11):947. https://doi.org/10.1167/jov.20.11.947.


      © ARVO (1962-2015); The Authors (2016-present)


Deep neural networks (DNNs) provide the leading model of biological object recognition, but their power and flexibility come at a price: different DNN models often make very similar predictions. To iteratively test and improve DNNs as scientific hypotheses about biological vision, we need efficient ways to adjudicate between candidate models. Here, we propose synthesizing controversial stimuli to achieve this aim. Controversial stimuli are inputs (e.g., images) whose classifications by two (or more) models are incompatible. Because human perceptual judgments of a controversial stimulus cannot be consistent with both models, such judgments are guaranteed to provide evidence against at least one of them. Controversial stimuli therefore make it possible to contrast the validity of different models efficiently. To demonstrate the approach, we assembled a diverse set of nine models, all trained on MNIST, whose different solutions cannot all be good models of how humans recognize digits. For each pair of models, we sampled random noise images and optimized them to increase the incompatibility of the two models' classifications. This yielded a stimulus set that induces disagreement among the models. We tested these stimuli on 30 human observers, who rated the presence of each of the ten digits in each image on a scale from 0% to 100%. Comparing each model's outputs with the judgments of each observer revealed that the two generative models we tested, a shallow Gaussian kernel-density estimate and the deep VAE-based Analysis-by-Synthesis model, predicted the human responses considerably and significantly better than all of the discriminative DNNs we tested. However, no model fully accounted for the explainable variability in the human responses. We discuss controversial stimuli as an efficient experimental paradigm for comparing DNN models of vision and as a practical and conceptual generalization of adversarial examples.
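The abstract does not specify the optimization objective or procedure used to make noise images controversial, so the following is only a minimal sketch under stated assumptions. Two hypothetical toy linear-softmax classifiers (`LinearSoftmax`) stand in for the trained models, and the controversiality score is assumed to be the smaller of log p_A(y_a|x) and log p_B(y_b|x). That score is high only when model A favors class y_a and model B favors a different class y_b for the same image. Starting from uniform random noise, the hypothetical `synthesize` function runs projected gradient ascent on this score, keeping pixel values in [0, 1].

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


class LinearSoftmax:
    """Hypothetical toy classifier standing in for a trained model: p(y|x) = softmax(W x)."""

    def __init__(self, n_classes, n_pixels, seed):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
                  for _ in range(n_classes)]

    def log_probs(self, x):
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        return log_softmax(logits)

    def grad_log_prob(self, x, y):
        """Gradient of log p(y|x) with respect to x: W[y] - sum_k p(k|x) W[k]."""
        p = [math.exp(lp) for lp in self.log_probs(x)]
        return [self.W[y][i] - sum(p[k] * self.W[k][i] for k in range(len(p)))
                for i in range(len(x))]


def controversiality(model_a, model_b, x, y_a, y_b):
    """Assumed score: high only if model A favors y_a AND model B favors y_b."""
    return min(model_a.log_probs(x)[y_a], model_b.log_probs(x)[y_b])


def synthesize(model_a, model_b, y_a, y_b, n_pixels, steps=200, lr=0.05, seed=0):
    """Projected gradient ascent on the assumed score, starting from uniform noise.

    Returns the best image found and its score.
    """
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_pixels)]  # random-noise starting image
    best_x, best_score = x, controversiality(model_a, model_b, x, y_a, y_b)
    for _ in range(steps):
        la = model_a.log_probs(x)[y_a]
        lb = model_b.log_probs(x)[y_b]
        # A min() is raised by improving whichever term is currently smaller.
        if la <= lb:
            g = model_a.grad_log_prob(x, y_a)
        else:
            g = model_b.grad_log_prob(x, y_b)
        # Gradient step, then clip pixel values back into [0, 1].
        x = [min(1.0, max(0.0, xi + lr * gi)) for xi, gi in zip(x, g)]
        score = controversiality(model_a, model_b, x, y_a, y_b)
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score


if __name__ == "__main__":
    model_a = LinearSoftmax(n_classes=10, n_pixels=16, seed=1)
    model_b = LinearSoftmax(n_classes=10, n_pixels=16, seed=2)
    x, score = synthesize(model_a, model_b, y_a=3, y_b=7, n_pixels=16)
    pred_a = max(range(10), key=lambda k: model_a.log_probs(x)[k])
    pred_b = max(range(10), key=lambda k: model_b.log_probs(x)[k])
    print(f"score={score:.3f}  model A predicts {pred_a}, model B predicts {pred_b}")
```

Using `min` rather than, say, a sum means the optimizer cannot satisfy one model at the expense of the other. Both models must be pushed toward their incompatible targets. With real networks, the hand-written gradient would be replaced by automatic differentiation, and the images would be full-size digit images rather than 16-pixel vectors.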

