Krista Ehinger, Kevin Joseph, Wendy Adams, Erich Graf, James Elder; Learning to identify depth edges in real-world images with 3D ground truth. Journal of Vision 2017;17(10):330. doi: https://doi.org/10.1167/17.10.330.
Luminance edges in an image arise from diverse causes: changes in depth, surface orientation, reflectance, or illumination. Discriminating between these causes is a key step in visual processing, supporting segmentation and object recognition. Previous work has shown that humans can discriminate depth from non-depth edges based on a relatively small visual region around the edge (Vilankar et al., 2014); however, little is known about the visual cues involved in this discrimination. Attempts to address this question have relied on small and potentially biased hand-labelled datasets. Here we employ the new Southampton-York Natural Scenes (SYNS) 3D dataset (Adams et al., 2016) to construct a larger and more objective ground-truth dataset for edge classification, and we train a deep network to discriminate depth edges from other kinds of edges. We used a standard computer vision edge detector to identify visible luminance edges in the HDR images and fit planar surfaces to the 3D points on either side of each edge. Based on these planar fits, we classified each edge as depth or non-depth. We trained convolutional neural networks on a subset of SYNS scenes to discriminate between these two classes using only the information in contrast-normalized image patches centred on each edge. The networks discriminated depth edges in a reserved test subset of the SYNS scenes with 81% accuracy. Interestingly, this performance was relatively invariant to patch size. Although performance decreased when information about edge orientation or colour was removed, it remained in the 72-75% range, suggesting a larger role for spatial (e.g., blur, textural, configural) cues. These results demonstrate that 1) the SYNS dataset can be used to provide 3D ground truth for visual tasks, and 2) colour, orientation, and spatial cues are all important for the local discrimination of depth edges.
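The ground-truth labelling step described above (fitting planes to the 3D points on either side of an edge and comparing them) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the least-squares plane model, the depth-gap criterion, and the `gap_threshold` parameter are all assumptions introduced here for clarity.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through an (N, 3) array of points."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def classify_edge(points_left, points_right, edge_xy, gap_threshold=0.1):
    """Label an edge 'depth' if the two fitted planes disagree in depth at the
    edge location by more than gap_threshold (a hypothetical scene-unit value)."""
    a1, b1, c1 = fit_plane(points_left)
    a2, b2, c2 = fit_plane(points_right)
    x, y = edge_xy
    z1 = a1 * x + b1 * y + c1
    z2 = a2 * x + b2 * y + c2
    return 'depth' if abs(z1 - z2) > gap_threshold else 'non-depth'

# Toy example: a flat surface on the left, a surface 1 unit deeper on the right.
left = np.array([[-1.0, 0.0, 0.0], [-2.0, 1.0, 0.0], [-1.0, 1.0, 0.0]])
right = np.array([[1.0, 0.0, 1.0], [2.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
print(classify_edge(left, right, edge_xy=(0.0, 0.0)))  # depth
```

A non-depth edge (e.g., a reflectance boundary) would have coplanar points on both sides, so the extrapolated depths agree at the edge and the function returns 'non-depth'.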
Meeting abstract presented at VSS 2017