Abstract
A core task of the primate visual system is to organize its retinal input into coherent figural objects. While psychological theories dating back to Ullman (1984) suggest that such object segmentation relies at least partially on feedback, little is known about how these computations are implemented in neural circuits. Here we investigate this question using the neural circuit model of Serre et al. (VSS 2020), which learns to solve visual tasks by implementing recurrent contextual interactions through horizontal feedback connections. When optimized for contour detection in natural images, the model rivals human performance and exhibits sensitivity to contextual illusions typically associated with primate vision, despite being given no explicit constraints to do so. Our goal is to understand whether the visual routine this feedback model discovers for object segmentation can explain the one used by human observers, as measured in a behavioral experiment in which participants judged whether a cue dot fell on the same object silhouette as a fixation dot or on a different one (Jeurissen et al. 2016). To train the model, we built a large natural image dataset of object outlines (N ≈ 250K), in which each sample included a “fixation” dot on one object. The model learned to segment the target object by adopting an incremental grouping strategy resembling the growth-cone family of psychological models of figure-ground segmentation, achieving near-perfect segmentation accuracy on a validation dataset (F1 = .98) and on the novel stimulus set used by Jeurissen et al. (N = 22, F1 = .98). Critically, the model exhibited a pattern of reaction times similar to that of human observers, indicating that its circuit constraints reflect plausible neural substrates for the visual routines of object segmentation in humans. Overall, our work establishes task-optimized models of neural circuits as an interface for generating experimental predictions that link cognitive science theory with exact neural computations.
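As a conceptual illustration of the incremental grouping strategy described above, the sketch below frames grouping as label activity spreading outward from the fixation dot and halting at object boundaries. This is a minimal toy version of the growth-cone idea, not the model's actual recurrent circuit; the function name, the binary-mask representation, and the use of dilation steps as a reaction-time proxy are assumptions made for exposition.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def incremental_grouping(object_mask, fixation, max_steps=500):
    """Grow a segmentation label outward from a fixation point,
    one dilation step at a time, never crossing object boundaries
    (pixels where object_mask is False).

    object_mask : 2-D bool array, True inside object silhouettes.
    fixation    : (row, col) of the fixation dot.
    Returns the grown region and the number of steps to converge --
    a crude proxy for 'reaction time' that increases with the
    distance the grouping signal must travel from fixation.
    """
    region = np.zeros_like(object_mask, dtype=bool)
    region[fixation] = True
    for step in range(1, max_steps + 1):
        # Spread one step, but stay inside the object silhouette.
        grown = binary_dilation(region) & object_mask
        if (grown == region).all():  # converged: connected object covered
            return region, step
        region = grown
    return region, max_steps

# Toy example: two separate squares; fixation lands on the left one.
mask = np.zeros((20, 40), dtype=bool)
mask[5:15, 2:12] = True    # object A
mask[5:15, 25:35] = True   # object B
seg, steps = incremental_grouping(mask, fixation=(10, 3))
# 'Same object' judgment: does a cue dot land inside the grown region?
print(seg[10, 10])   # True  -> cue on the same object as fixation
print(seg[10, 30])   # False -> cue on a different object
print(steps)         # steps to fill object A from the fixation dot
```

In the growth-cone account, cue dots farther from fixation (along the object) require more spreading steps before they are reached, which is the qualitative reaction-time pattern the abstract refers to.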