September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract
The stuff of natural scenes: probing human property judgments of textures, materials, and other amorphous scene components with convolutional neural networks
Author Affiliations
  • Neha Nandiwada
    Johns Hopkins University
  • Caterina Magri
    Johns Hopkins University
  • Michael Bonner
    Johns Hopkins University
Journal of Vision September 2021, Vol.21, 2483.

Scenes are composed not only of discrete objects with defined shapes but also of complex visual “stuff” in the form of amorphous textures and patterns (e.g., grass, bricks, smoke). Many behaviorally relevant properties of scenes can be quickly recognized based solely on the stuff they contain, such as the hotness of fire, the hardness of concrete, or the fragility of glass. Though much work has explored how the human visual system represents individual objects, less is known about how we process the amorphous stuff that makes up most of the visual environment. Furthermore, it remains an open question what classes of computational models can account for the human ability to rapidly detect a rich set of high-level properties from a brief glance at a patch of visual stuff. To address these questions, we developed a dataset of 500 high-quality images from 50 categories of textures encountered in real-world environments, and we collected annotations of readily identifiable qualitative properties of these images (e.g., material properties, haptic properties, semantic attributes). In preliminary investigations, we sought to determine whether computational models trained for object recognition also yield representations that are useful for predicting the qualitative properties of visual stuff. Our findings show that while many perceptual and haptic properties (e.g., bumpiness) are predicted equally well by both supervised and untrained convolutional neural networks, high-level semantic attributes (e.g., naturalness) are predicted much better by supervised models. Nonetheless, all models failed to reach the noise ceiling for the majority of the property annotations, demonstrating that further work is needed to account for the richness of human stuff perception. Our dataset will provide a critical benchmark for computational models and will be useful for follow-up studies that seek to understand how humans recognize the many qualitative properties of the stuff in their visual surroundings.
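The style of analysis described above, reading out property ratings from a model's feature space and comparing predictive accuracy against a noise ceiling set by inter-rater reliability, can be sketched in outline. Everything below is purely illustrative: the data are simulated, and the specific choices (a ridge-regression readout, a split-half Spearman-Brown noise ceiling, the numbers of images, features, and raters) are assumptions for the sketch, not the authors' reported pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins (assumptions, not the study's data): 500 texture
# images, a 64-dimensional feature vector per image (as if taken from a
# CNN layer), and ratings of one property from 10 raters per image.
n_images, n_feats, n_raters = 500, 64, 10
X = rng.standard_normal((n_images, n_feats))
w_true = rng.standard_normal(n_feats)
signal = X @ w_true
raters = signal[:, None] + 3.0 * rng.standard_normal((n_images, n_raters))
y = raters.mean(axis=1)  # per-image rating averaged over raters

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression weights: (X'X + aI)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Held-out predictive accuracy of the feature space for this property.
train, test = np.arange(0, 400), np.arange(400, 500)
w = ridge_fit(X[train], y[train])
r_model = np.corrcoef(X[test] @ w, y[test])[0, 1]

# Split-half noise ceiling: correlate mean ratings from two independent
# halves of the raters, then apply the Spearman-Brown correction.
half_a = raters[:, : n_raters // 2].mean(axis=1)
half_b = raters[:, n_raters // 2 :].mean(axis=1)
r_half = np.corrcoef(half_a, half_b)[0, 1]
ceiling = 2 * r_half / (1 + r_half)

print(f"model r = {r_model:.2f}, noise ceiling = {ceiling:.2f}")
```

In a setup like this, repeating the readout with features from a supervised versus an untrained network (and with different property annotations as `y`) gives the kind of comparison the abstract reports, with the noise ceiling marking how much of the reliable rating variance each feature space captures.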

