September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Leveling the Field: Comparing the Visual Perception of Stability across Humans and Machines
  • Colin Conwell
    Harvard University, Department of Psychology
  • George A Alvarez
    Harvard University, Department of Psychology
Journal of Vision September 2019, Vol.19, 26a. doi:

      Colin Conwell, George A Alvarez; Leveling the Field: Comparing the Visual Perception of Stability across Humans and Machines. Journal of Vision 2019;19(10):26a.

      © ARVO (1962-2015); The Authors (2016-present)

Is there a physicist in your visual cortex? Popular models of intuitive physics (our implicit understanding of physical contingencies in complex environments) posit that physical inference is a richly structured simulation: a predominantly cognitive process. In this study, we probe the possibility that at least certain aspects of our intuitive physics may be handled directly by computations in perceptual systems. Assuming that deep neural networks are a reasonable model of inferotemporal visual cortex, we employ a method of comparative psychophysics designed to gauge the similarity of human and machine judgments in a standard intuitive physics task: predicting the stability of randomly arranged block towers. We show that a convolutional neural network whose performance is comparable to that of human observers nonetheless differs from them in the variables that predict its specific choices, variables we compute directly from the stimuli. Using these ‘features’ as the basis for an ideal observer analysis, we show that human behavior is best predicted by a feature that corresponds directly to the ground-truth stability of the tower, while the network’s behavior is predicted by a suboptimal feature. By training smaller feedforward networks, we then confirm that this divergence from human behavior reflects not the failure of any specific computation (e.g., an operation the network simply cannot perform) but different feature biases. We also show that humans under time pressure tend to behave more like the neural network, with their responses predicted by features that correlate less strongly overall with the ground-truth stability of the tower.
Taken together, these results suggest that at least some of the information processing involved in intuitive physics may be handled by the more feedforward elements of the visual system, but that further algorithmic, architectural, or training modifications may be needed to model the perceptual processing of physical information more generally.
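The abstract does not say which stimulus features were computed, so the following is only a minimal, hypothetical sketch of the general idea behind a feature-based comparison of choices. It assumes 2D single-column towers of rigid rectangular blocks listed bottom to top; the `Block` class and every function name are invented for illustration and are not the authors' analysis. One feature checks every block interface (a tower of this kind stays up only if, at each interface, the combined center of mass of everything above lies over the contact region with the block below); a cruder feature checks only the lowest interface. Each feature's predictions are then scored against a set of toy binary choices.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Block:
    """One block in a hypothetical 2D single-column tower."""
    x: float        # horizontal position of the block's center
    width: float    # horizontal extent of the block
    mass: float = 1.0


def _overlap(lower: Block, upper: Block) -> Tuple[float, float]:
    """Horizontal contact interval where `upper` rests on `lower`."""
    lo = max(lower.x - lower.width / 2, upper.x - upper.width / 2)
    hi = min(lower.x + lower.width / 2, upper.x + upper.width / 2)
    return lo, hi  # lo > hi means the blocks do not touch


def _com(blocks: Sequence[Block]) -> float:
    """Horizontal center of mass of a group of blocks."""
    total = sum(b.mass for b in blocks)
    return sum(b.mass * b.x for b in blocks) / total


def stable_all_interfaces(tower: Sequence[Block]) -> bool:
    """Stability feature: every interface must support the mass above it."""
    for i in range(len(tower) - 1):
        lo, hi = _overlap(tower[i], tower[i + 1])
        if not lo <= _com(tower[i + 1:]) <= hi:
            return False
    return True


def stable_base_only(tower: Sequence[Block]) -> bool:
    """Hypothetical cruder feature: checks only the lowest interface."""
    if len(tower) < 2:
        return True
    lo, hi = _overlap(tower[0], tower[1])
    return lo <= _com(tower[1:]) <= hi


def agreement(predictions: Sequence[bool], choices: Sequence[bool]) -> float:
    """Fraction of trials on which a feature's prediction matches a choice."""
    if len(predictions) != len(choices) or not choices:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == c for p, c in zip(predictions, choices)) / len(choices)


towers = [
    [Block(0.0, 1.0), Block(0.0, 1.0), Block(0.0, 1.0)],  # aligned stack
    [Block(0.0, 1.0), Block(0.0, 1.0), Block(0.6, 1.0)],  # top block overhangs
]
truth = [stable_all_interfaces(t) for t in towers]  # [True, False]
crude = [stable_base_only(t) for t in towers]       # [True, True]
toy_choices = [True, False]                         # made-up responses, not data
print(agreement(truth, toy_choices))  # 1.0
print(agreement(crude, toy_choices))  # 0.5
```

Raw agreement is only the simplest way to relate a feature to choices; a fuller analysis might instead fit, for example, a logistic model of choices on continuous feature values.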

