Abstract
Humans effortlessly group elements into objects and segment them from the background and from other objects without supervision. For example, the black and white stripes of a zebra are grouped together despite their vastly different colors. A thorough theoretical and empirical account of perceptual grouping is still missing: Deep Neural Networks (DNNs), which are considered leading models of the visual system, still regularly fail at simple perceptual grouping tasks. Here, we propose a counterintuitive unsupervised computational approach to perceptual grouping and segmentation: that they arise because of neural noise, rather than in spite of it. We show that adding noise to a DNN enables the network to separate objects and to segment images even though it was never trained on any segmentation labels. To test whether the models exhibit perceptual grouping, we introduce the Good Gestalt (GG) datasets: six datasets, grounded in a century of research on Gestalt principles, that are specifically designed to test perceptual grouping. These include illusory contours, closure, continuity, proximity, and occlusion. Our DNN with neural noise finds the correct perceptual groups, whereas control models, including state-of-the-art segmentation models, fail these critical tests. We further show that our model performs well even with remarkably low levels of noise and requires only a few successive time steps to compute. Using simplifying but realistic assumptions from optics, we also mathematically link our model's perceptual grouping performance to image statistics. Together, our results suggest a novel unsupervised segmentation method requiring few assumptions, a new explanation for the formation of perceptual grouping, and a novel benefit of neural noise.