September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Visual perception of liquids: insights from deep neural networks
Author Affiliations & Notes
  • Jan Jaap R Van Assen
    NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation
  • Shin’ya Nishida
    NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation
  • Roland W Fleming
    Department of Psychology, Justus-Liebig-University Giessen
Journal of Vision September 2019, Vol.19, 242b. doi:
      © ARVO (1962-2015); The Authors (2016-present)

Perceiving viscosity is challenging because liquids exhibit incredibly diverse behaviors across scenes. To gain insight into how the visual system estimates viscosity, we developed an artificial neural network consisting of three convolutional layers and one fully-connected layer. We trained it to estimate viscosity in 100,000 computer-generated animations of liquids with 16 different viscosities across 10 scenes featuring diverse liquid interactions, e.g., stirring or gushing. As expected, with sufficient training, the network predicted physical viscosity nearly perfectly for the training set. Importantly, in the initial training phase it showed a pattern of errors more similar to that of human observers, for both trained and untrained sets. We analyzed the representations learned by the network using representational similarity analysis. Surprisingly, viscosity (either physical or perceived) had little explanatory power over the activation pattern of the 4096 units in the final fully-connected (FC) layer. We suspected that the final layer has such rich representational capacity that it can represent seemingly task-irrelevant stimulus dimensions in addition to the task-relevant one. Indeed, we found it easy to read out scene category from the FC activation patterns. When we reduced the number of units in the FC layer to 40 or fewer (which had little effect on viscosity prediction performance), viscosity explained the FC activation patterns very well. Inspired by classical psychophysical and neurophysiological experiments, we then probed artificial neural responses with low-level stimuli isolating specific stimulus characteristics. This allowed us to identify nodes sensitive to specific orientations, spatial frequencies, motion types, etc., and to test the extent to which high-level responses can be understood as weighted combinations of these.
Together, these observations suggest that the nature of the representation depends on the capacity of the network, and this is likely the case for biological networks as well.
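To make the representational similarity analysis concrete, here is a minimal sketch in plain Python. The activation patterns and viscosity values are toy stand-ins (the actual study used up to 4096 FC units driven by rendered liquid animations), and the helper names are ours, not the authors':

```python
# Minimal sketch of representational similarity analysis (RSA):
# build a dissimilarity matrix over stimuli from a layer's activation
# patterns, and correlate it with the matrix predicted by viscosity alone.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rdm(patterns):
    """Representational dissimilarity matrix: pairwise distances
    between the activation patterns evoked by each stimulus."""
    n = len(patterns)
    return [[euclidean(patterns[i], patterns[j]) for j in range(n)]
            for i in range(n)]

def upper_triangle(m):
    """Off-diagonal upper-triangle entries, the values RSA compares."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: 4 stimuli with viscosities 1..4, and 2-unit activation
# patterns that happen to track viscosity (hypothetical values).
viscosities = [1.0, 2.0, 3.0, 4.0]
activations = [[0.1, 0.2], [0.3, 0.5], [0.6, 0.9], [0.8, 1.2]]

model_rdm = rdm([[v] for v in viscosities])  # predicted by viscosity alone
neural_rdm = rdm(activations)                # observed in the toy layer

r = pearson(upper_triangle(model_rdm), upper_triangle(neural_rdm))
print(round(r, 3))  # close to 1.0: viscosity explains this toy layer well
```

In the study, a low correlation of this kind for the 4096-unit FC layer is what signalled that viscosity alone explained little of the activation pattern, while the 40-unit version correlated well.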
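The final step, probing units with low-level stimuli, can also be illustrated with a toy example. The "unit" below is a hypothetical linear filter we construct for the demonstration, not one of the trained network's actual nodes; the probing logic (present gratings at varying orientations, record responses, take the argmax) is the part that mirrors the abstract:

```python
# Probe a model unit with sinusoidal gratings at different orientations
# to recover its preferred orientation, in the spirit of classical
# psychophysics/neurophysiology experiments.
import math

def grating(theta, size=16, freq=0.25):
    """A sinusoidal grating image at orientation theta (radians)."""
    return [[math.cos(2 * math.pi * freq *
                      (x * math.cos(theta) + y * math.sin(theta)))
             for x in range(size)] for y in range(size)]

def unit_response(weights, image):
    """Linear unit response: dot product of weights and image, rectified."""
    s = sum(w * p for wrow, prow in zip(weights, image)
            for w, p in zip(wrow, prow))
    return max(0.0, s)

# Hypothetical unit whose weights are themselves a 45-degree grating,
# so probing should recover 45 degrees as its preferred orientation.
weights = grating(math.radians(45))

orientations = list(range(0, 180, 15))  # probe orientations in degrees
responses = [unit_response(weights, grating(math.radians(a)))
             for a in orientations]
best = orientations[responses.index(max(responses))]
print(best)  # 45
```

The same loop, run over spatial frequencies or motion directions instead of orientations, gives the other tuning measurements mentioned in the abstract.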

Acknowledgement: JSPS KAKENHI (JP15H05915) and an ERC Consolidator Award (ERC-2015-CoG-682859).
