Hadar Gorodissky, Daniel Harari, Shimon Ullman; Large field and high resolution: detecting needle in a haystack. Journal of Vision 2018;18(10):517. doi: 10.1167/18.10.517.
© ARVO (1962-2015); The Authors (2016-present)
A general-purpose visual system needs to combine the ability to acquire highly detailed information with the ability to cover a large field of view (FOV). The human FOV spans over 120 degrees, with peak resolution approaching 0.5 arcminutes. Covering such a large field at this resolution would require acquiring images of roughly 200 million samples. Anatomically, such a requirement is infeasible, and computationally, current machine vision schemes would require scaling processing power by three orders of magnitude. Here we studied the combination of a large FOV with high resolution in target detection tasks, given a limited 'budget' of sampling points to form an image. We compared different designs for distributing sampling points across the visual field, including non-uniform configurations. In particular, we compared constant-resolution models with variable-resolution models inspired by human vision, in which resolution peaks at the center and decreases with eccentricity. For the constant models, we compared trade-offs between resolution and FOV size. For the variable models, we compared a single channel with varying resolution against multiple channels, each with a different constant resolution. We focused on the challenging task of localizing small targets of interest in natural images, and compared performance using state-of-the-art deep neural nets to train models (of equal resources) that operate in successive steps, each fixating at the target location predicted by the preceding step. The results first indicate that the variable-resolution models significantly outperform constant-resolution models and converge to the optimal, full-resolution model while using only 5% of its samples. Surprisingly, within the variable models, multiple parallel channels outperform a single, varying-resolution channel. Finally, unlike constant-resolution models, which used a single step, variable models used multiple steps; however, convergence was rapid, taking 1.5 steps on average.
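The figure of roughly 200 million samples can be checked with a short calculation. The sketch below is illustrative only and is not taken from the abstract: it assumes a square field of 120 degrees on each side, sampled uniformly at one sample per 0.5 arcminutes in each dimension. The function name `samples_for_uniform_coverage` is hypothetical. It also shows what a 5% sampling budget would amount to under the same assumptions.

```python
ARCMIN_PER_DEGREE = 60


def samples_for_uniform_coverage(fov_deg: float, resolution_arcmin: float) -> int:
    """Samples needed to tile a square field at uniform resolution.

    Assumption (not stated in the abstract): the field is fov_deg x fov_deg
    and is sampled once per resolution_arcmin along each dimension.
    """
    samples_per_side = fov_deg * ARCMIN_PER_DEGREE / resolution_arcmin
    return round(samples_per_side ** 2)


# Values quoted in the abstract: FOV of 120 degrees, peak resolution of 0.5 arcminutes.
full_resolution_samples = samples_for_uniform_coverage(120, 0.5)

# The abstract reports that variable-resolution models need only 5% of the samples.
variable_budget_samples = round(0.05 * full_resolution_samples)

print(f"Full resolution: {full_resolution_samples:,} samples")
print(f"5% budget:       {variable_budget_samples:,} samples")
```

Under these assumptions the full-resolution count comes to about 207 million samples, in line with the quoted estimate, and the 5% budget to about 10 million.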
Meeting abstract presented at VSS 2018