Mark Lescroart, David Fouhey, Jitendra Malik; Convolutional neural networks represent shape dimensions—but not as accurately as humans. Journal of Vision 2018;18(10):421. doi: 10.1167/18.10.421.
© ARVO (1962-2015); The Authors (2016-present)
Modern convolutional neural networks (CNNs) trained to categorize images are known to provide good models for many aspects of human vision, including shape perception. However, the relative ability of humans and CNNs on basic 3D shape dimension discrimination is unknown, as is how learning such tasks changes CNN shape representations. To address these issues, we generated a novel, large dataset of rendered generalized cylinders (geons). The dataset consisted of 48 geon classes defined by four shape dimensions. Each geon was rendered in 6075 pose and metric variations. This yielded 291,600 images, enough to fine-tune CNN filter weights. This fine-tuning resulted in a 68.7% increase in geon classification accuracy compared to CNNs trained on image categorization alone. We used a two-alternative match-to-sample test to compare human performance to CNN performance. On this task, humans achieved higher accuracy (86%) than the best CNN (75%). Both humans and CNNs were more accurate at distinguishing geons that differed in multiple dimensions, and both humans and intermediate layers of CNNs were less accurate when pairs of geons were rotated in different directions. However, humans showed increased ability to distinguish some specific dimensions relative to the CNNs. Finally, analysis of feature activations in intermediate layers of the CNNs revealed that training makes nodes in intermediate layers of the CNNs more selective for specific shape dimensions (e.g., curved axes), while analyses testing generalization across metric variants suggested that CNNs may represent shape dimensions by interpolating between discrete values along each dimension. In conclusion, off-the-shelf CNNs contain some information about 3D shape dimensions, but training CNNs on a shape task can drastically alter this information. Individual nodes in trained networks reflect individual shape dimensions, and trained networks achieve near (but not quite) human-level accuracy at distinguishing complex shapes based on shape information alone.
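As an illustration only, a two-alternative match-to-sample test can be scored on CNN feature vectors roughly as follows. The abstract does not say how the CNN's choice was read out, so the cosine-similarity rule and every name here (`cosine_similarity`, `match_to_sample_correct`, `accuracy`) are assumptions made for this sketch, not the authors' method. The constants simply restate the dataset size given above.

```python
import numpy as np

# Dataset size as stated in the abstract: 48 classes x 6075 variations.
N_CLASSES = 48
N_VARIANTS = 6075
N_IMAGES = N_CLASSES * N_VARIANTS  # 291,600 images


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_to_sample_correct(sample, match, foil):
    """Hypothetical decision rule: the trial counts as correct when the
    sample's features are closer to the match than to the foil."""
    return cosine_similarity(sample, match) > cosine_similarity(sample, foil)


def accuracy(trials):
    """Fraction of (sample, match, foil) feature triples scored correct."""
    results = [match_to_sample_correct(s, m, f) for s, m, f in trials]
    return sum(results) / len(results)
```

In such a setup, each feature vector would come from one layer of the network for one rendered image, and accuracy could then be compared across layers or against human responses on the same trials.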
Meeting abstract presented at VSS 2018