Jan Jaap R Van Assen, Shin’ya Nishida, Roland W Fleming; Visual perception of liquids: insights from deep neural networks. Journal of Vision 2019;19(10):242b. doi: https://doi.org/10.1167/19.10.242b.
© ARVO (1962-2015); The Authors (2016-present)
Perceiving viscosity is challenging because liquids exhibit remarkably diverse behaviors across scenes. To gain insight into how the visual system estimates viscosity, we developed an artificial neural network consisting of three convolutional layers and one fully connected layer. We trained it to estimate viscosity from 100,000 computer-generated animations of liquids with 16 different viscosities across 10 scenes with diverse liquid interactions (e.g., stirring or gushing). As expected, with sufficient training, the network predicts physical viscosity nearly perfectly for the training set. Importantly, in the initial training phase, its pattern of errors was more similar to that of human observers, for both trained and untrained sets. We analyzed the representations learned by the network using representational similarity analysis. Surprisingly, viscosity (either physical or perceptual) had little explanatory power for the activation patterns of the 4096 units in the final fully connected (FC) layer. We suspected that the final layer has a rich representational capacity, allowing it to represent seemingly task-irrelevant stimulus dimensions in addition to the task-relevant one. Indeed, we found it easy to read out scene category from the FC activation patterns. When we reduced the number of units in the FC layer to 40 or fewer (which had little effect on viscosity prediction performance), viscosity explained the FC activation patterns very well. Inspired by classical psychophysical and neurophysiological experiments, we then probed the network's artificial neural responses with low-level stimuli isolating specific stimulus characteristics. This allowed us to identify nodes sensitive to specific orientations, spatial frequencies, motion types, etc., and to test the extent to which high-level responses can be understood as weighted combinations of these low-level responses.
Together, these observations suggest that the nature of the representation depends on the capacity of the network, and this is likely the case for biological networks as well.
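Representational similarity analysis (RSA), as used above, asks whether a stimulus property such as viscosity explains the geometry of a layer's activation patterns. The sketch below illustrates one common RSA variant, assuming correlation distance (1 − Pearson r) for the network's dissimilarity matrix, absolute viscosity difference for the model dissimilarity matrix, and Spearman correlation between their upper triangles. The abstract does not state which of these choices the authors made. All function names are hypothetical, and the toy patterns are synthetic, not data from the study. Each pattern must be non-constant, because a zero-variance vector makes the correlation undefined.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def rdm_upper(patterns):
    """Upper triangle of a representational dissimilarity matrix (RDM),
    using correlation distance 1 - r between activation patterns."""
    return [1.0 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def model_rdm_upper(values):
    """Model RDM for a scalar stimulus property (e.g. viscosity level):
    dissimilarity is the absolute difference between values."""
    return [abs(values[i] - values[j])
            for i, j in combinations(range(len(values)), 2)]


def ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rsa_score(patterns, values):
    """How well a scalar property explains a layer's representational geometry."""
    return spearman(rdm_upper(patterns), model_rdm_upper(values))


if __name__ == "__main__":
    # Hypothetical toy "layer": each stimulus's pattern rotates with viscosity,
    # so pattern dissimilarity grows monotonically with viscosity difference.
    viscosities = [1, 2, 4, 8]
    patterns = []
    for v in viscosities:
        t = 0.1 * v
        patterns.append([math.cos(t), math.sin(t), -math.cos(t), -math.sin(t)])
    print(round(rsa_score(patterns, viscosities), 3))  # prints 1.0
```

In a real analysis, each pattern would be a layer's activation vector (for instance, the FC-layer activations, possibly averaged over animations) for one stimulus condition. A low score for the 4096-unit layer and a high score after shrinking the layer would correspond to the result described in the abstract.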