Abstract
How to establish standards for comparing human and cortically inspired computer model performance on visual tasks remains largely an open question. Existing standard image classification datasets have several critical shortcomings: 1) Limitations in image resolution and in the number of images available at set creation; 2) Reliance on semantic knowledge, such as the definition of “animal”; and 3) Complexity or difficulty that cannot be varied parametrically. To address these shortcomings, we developed a new synthetic dataset consisting of line segments that can form closed contours in 2D (“amoebas”).
An “amoeba” is a deformed, segmented circle in which the radius varies with polar angle. Small gaps between segments are preserved so that the contour is not strictly closed. To create a distractor “no-amoeba” image, an amoeba image is divided into boxes of random size, each of which is rotated through a random angle so that the segments no longer form a smooth closed object. Randomly superimposed no-amoeba images serve as background clutter. This dataset is unlimited in size, relies on no explicit outside knowledge, has tunable parameters so that its difficulty can be varied, and lends itself naturally to a binary object classification task (“amoeba/no-amoeba”) designed to elicit pop-out for humans.
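To make the generation procedure concrete, the following is a minimal sketch of one way it could be implemented. The harmonic radial profile, the segment/gap fractions, and the grid-based box partition (the names amoeba_contour, segment, no_amoeba, and parameters such as n_harmonics and gap_frac) are our illustrative assumptions, not the exact parameterization used for the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def amoeba_contour(n_points=360, n_harmonics=5, amp=0.3):
    """Deformed circle: radius varies smoothly with polar angle
    via a few random Fourier harmonics (assumed parameterization)."""
    theta = np.linspace(0, 2 * np.pi, n_points, endpoint=False)
    r = np.ones_like(theta)
    for k in range(1, n_harmonics + 1):
        a, phi = rng.uniform(-amp, amp) / k, rng.uniform(0, 2 * np.pi)
        r += a * np.cos(k * theta + phi)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

def segment(points, n_segments=12, gap_frac=0.15):
    """Split the contour into segments separated by small gaps,
    so the contour is not strictly closed."""
    seg_len = len(points) // n_segments
    keep = int(seg_len * (1 - gap_frac))
    return [points[i:i + keep] for i in range(0, len(points), seg_len)]

def no_amoeba(segments, n_cells=4, max_angle=np.pi):
    """Distractor: partition the image into a random grid of boxes and
    rotate the contents of each box by a random angle, destroying the
    global continuity of the contour."""
    pts = np.concatenate(segments)
    lo, hi = pts.min(0) - 1e-6, pts.max(0) + 1e-6
    # random cell boundaries along each axis give boxes of random size
    xs = np.sort(np.r_[lo[0], rng.uniform(lo[0], hi[0], n_cells - 1), hi[0]])
    ys = np.sort(np.r_[lo[1], rng.uniform(lo[1], hi[1], n_cells - 1), hi[1]])
    out = []
    for seg_pts in segments:
        new = seg_pts.copy()
        for i in range(n_cells):
            for j in range(n_cells):
                in_box = ((seg_pts[:, 0] >= xs[i]) & (seg_pts[:, 0] < xs[i + 1]) &
                          (seg_pts[:, 1] >= ys[j]) & (seg_pts[:, 1] < ys[j + 1]))
                if not in_box.any():
                    continue
                c = np.array([(xs[i] + xs[i + 1]) / 2, (ys[j] + ys[j + 1]) / 2])
                a = rng.uniform(-max_angle, max_angle)
                R = np.array([[np.cos(a), -np.sin(a)],
                              [np.sin(a),  np.cos(a)]])
                new[in_box] = (seg_pts[in_box] - c) @ R.T + c  # rotate about box center
        out.append(new)
    return out

segments = segment(amoeba_contour())   # target stimulus ("amoeba")
distractor = no_amoeba(segments)       # distractor stimulus ("no-amoeba")
```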
We show in psychophysics experiments that humans perform this task with high accuracy (>90%), even at stimulus onset asynchronies as short as 50 ms. Existing feed-forward computer vision models such as HMAX perform close to chance (50–60%). We present a biologically motivated model of V1 lateral interactions that significantly improves performance. The model uses relaxation labeling, in which support between edge receptors is based on statistics of pairwise correlations learned from coherent objects but not from incoherent segment noise. We compare the effectiveness of this approach to existing computer vision models as well as to human psychophysics performance, and explore its applicability to contour completion in natural images.
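The abstract does not give the update rule; as a sketch, the classical relaxation-labeling scheme (in the style of Rosenfeld, Hummel, and Zucker) that it references could look like the following, where the compatibility tensor r stands in for the learned pairwise correlation statistics. The function name, tensor layout, and toy sizes are illustrative assumptions.

```python
import numpy as np

def relaxation_labeling(p, r, n_iter=50):
    """Classical relaxation-labeling iteration.

    p : (n_nodes, n_labels)  initial label probabilities per edge receptor,
        e.g. labels = {contour, noise}
    r : (n_nodes, n_labels, n_nodes, n_labels)  compatibility coefficients;
        these would encode pairwise co-occurrence statistics learned from
        coherent objects (high support for collinear/cocircular neighbors,
        low support otherwise).
    """
    for _ in range(n_iter):
        # support for each (node, label) pair, summed over all neighbors
        q = np.einsum('iajb,jb->ia', r, p)
        p = p * (1.0 + q)                      # reinforce supported labels
        p /= p.sum(axis=1, keepdims=True)      # renormalize per node
    return p

# toy usage: 3 receptors, 2 labels (contour vs. noise)
rng = np.random.default_rng(1)
p0 = rng.dirichlet(np.ones(2), size=3)
r = rng.uniform(-1, 1, size=(3, 2, 3, 2)) * 0.1  # small compatibilities keep 1+q > 0
print(relaxation_labeling(p0, r))
```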