Abstract
Previous work has shown that multiple cues may signal object boundaries (Rivest & Cavanagh, 1996), but little is known about the diagnosticity of these cues for natural scenes, either individually or in combination. For this study, we constructed a database of color binocular video sequences with a consumer-grade stereo camera. To minimize bias, the dataset includes a variety of places (from streets to parks), settings (from summer to winter) and objects. Image annotations were collected from participants who were instructed to draw clearly visible object boundaries. We started from a standard model of the primary visual cortex (a battery of center-surround and oriented receptive fields at multiple spatial frequencies, with parameters constrained by neurophysiology) and extended it to include biologically realistic mechanisms for color, binocular disparity and motion processing. In this work, we explore the statistical correlations between these cues, evaluate their diagnosticity and learn their optimal combination using the framework of Martin et al. (2004). We found color and contrast to be the most diagnostic cues for the presence of boundaries in natural scenes. Stereo and motion were the least diagnostic cues. Surprisingly, we found visual cues to be relatively uncorrelated at both boundary and non-boundary locations. Divisive normalization (Heeger & Carandini, 1992) over a carefully selected pool of units was critical for achieving good accuracy. For all cues, a higher-order texture-based visual representation, as proposed by Martin et al. (2004), was found to be more diagnostic than a representation based on the direct output of orientation-tuned units. Overall, we found that combining multiple cues improved contour detection beyond any of the cues presented in isolation, suggesting that our visual system may benefit from the combination of multiple visual cues for boundary detection.
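As a rough illustration of two steps named in the abstract, the sketch below shows one common form of divisive normalization (each unit's squared response divided by the pooled energy plus a constant) and a logistic combination of per-cue boundary signals, the kind of classifier used in the Martin et al. (2004) framework. It is a minimal sketch under stated assumptions, not the model used in the study: the function names, the pool (here, all units passed in), the constant `sigma`, and all cue values, weights and the bias are hypothetical.

```python
import math


def divisive_normalization(responses, sigma=0.1):
    """Divide each unit's squared response by the pooled energy.

    Assumption: the normalization pool is simply every unit in
    `responses`; the abstract does not specify how its pool was chosen.
    """
    energies = [r * r for r in responses]
    pool = sigma * sigma + sum(energies)
    return [e / pool for e in energies]


def boundary_probability(cues, weights, bias):
    """Combine per-cue signals with a logistic function.

    Assumption: weights and bias would be learned from annotated
    boundaries; the values used below are made up for illustration.
    """
    z = bias + sum(w * c for w, c in zip(weights, cues))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical oriented-filter responses at one image location.
normalized = divisive_normalization([0.2, 0.8, 0.1, 0.05])

# Hypothetical cue strengths (contrast, color, stereo, motion) and weights.
p_boundary = boundary_probability(
    cues=[0.7, 0.6, 0.2, 0.1],
    weights=[2.0, 1.8, 0.5, 0.4],
    bias=-2.0,
)
```

With this form, the normalized responses always sum to less than one when `sigma` is positive, so a strong response in one unit suppresses the others in the pool.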
Meeting abstract presented at VSS 2014