September 2017
Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2017
Expecting and detecting objects in real-world scenes: when do target, nontarget and coarse scene features contribute?
Author Affiliations
  • Harish Katti
    Centre for Neuroscience, Indian Institute of Science, Bangalore, India, 560012
  • Marius Peelen
    Center for Mind/Brain Sciences, University of Trento, 38068 Rovereto, Italy
  • S. P. Arun
    Centre for Neuroscience, Indian Institute of Science, Bangalore, India, 560012
Journal of Vision August 2017, Vol.17, 299. doi:10.1167/17.10.299
Harish Katti, Marius Peelen, S. P. Arun; Expecting and detecting objects in real-world scenes: when do target, nontarget and coarse scene features contribute? Journal of Vision 2017;17(10):299. doi: 10.1167/17.10.299.

© ARVO (1962-2015); The Authors (2016-present)

Abstract

Humans excel at finding objects in complex natural scenes, but understanding this behaviour has been difficult because natural scenes contain targets, nontargets and coarse scene features. Here we performed two studies to elucidate object detection in natural scenes. In Study 1, participants detected cars or people in a large set of natural scenes. For each scene, we extracted target-associated features, annotated nontarget objects and extracted coarse scene structure, and we used these three feature sets to model detection performance. Our main finding is that target detection in both the person and car tasks was predicted by target and coarse scene features, with no discernible contribution from nontarget objects. By contrast, nontarget objects predicted target rejection times in both tasks, with an additional contribution from target features for person rejection. In Study 2, we sought to understand the computational advantage of context. Context is commonly thought of as reducing computation by constraining the locations to be searched. But can it play a more fundamental role in making detection more accurate? For this to happen, scene context must be learned independently of target features. Unlike computers, humans can learn contextual expectations separately, when they see scenes without targets. To measure these expectations, we asked subjects to indicate the scale, location and likelihood at which targets might occur in scenes without targets. Humans showed highly systematic expectations that we could accurately predict using scene features. Importantly, augmenting state-of-the-art deep neural networks with these human-derived expectations improved their detection performance. This improvement came from accepting poor matches at highly likely locations and rejecting strong matches at unlikely locations. Taken together, our results show that humans behave systematically when detecting objects and forming expectations in natural scenes, and that this behaviour can be predicted and understood using computational modelling.
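The abstract does not specify how the human-derived expectations were combined with the network's outputs. As one illustration of how such a combination could yield the reported pattern (accepting poor matches at likely locations, rejecting strong matches at unlikely ones), here is a minimal sketch assuming a simple log-odds fusion. The functions `logit`, `rescore` and `detect`, the weight `w_context`, and the idea of a per-location context score in (0, 1), where 0.5 means no expectation either way, are all hypothetical and are not the authors' method.

```python
import math


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def rescore(detector_prob: float, context_prob: float, w_context: float = 1.0) -> float:
    """Hypothetical fusion: add the (weighted) context log-odds to the detector's log-odds.

    A context score of 0.5 contributes zero log-odds and leaves the detector's
    confidence unchanged; higher scores boost it, lower scores suppress it.
    """
    return sigmoid(logit(detector_prob) + w_context * logit(context_prob))


def detect(detector_prob: float, context_prob: float, threshold: float = 0.5) -> bool:
    """Accept a candidate when its context-adjusted confidence exceeds the threshold."""
    return rescore(detector_prob, context_prob) > threshold


if __name__ == "__main__":
    # Weak match at a location where humans strongly expect the target.
    print(detect(0.4, 0.9))
    # Strong match at a location where humans do not expect the target.
    print(detect(0.7, 0.1))
```

With the default threshold of 0.5, a weak match (0.4) at a likely location (context 0.9) rises to 6/7, about 0.86, and is accepted, while a strong match (0.7) at an unlikely location (context 0.1) falls to about 0.21 and is rejected. Working in log-odds keeps the adjustment symmetric around the neutral context score of 0.5.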

Meeting abstract presented at VSS 2017
