Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract | August 2023
How object segmentation and perceptual grouping emerge in noisy variational autoencoders
Author Affiliations & Notes
  • Ben Lonnqvist
    EPFL (École Polytechnique Fédérale de Lausanne), Switzerland
  • Zhengqing Wu
    EPFL (École Polytechnique Fédérale de Lausanne), Switzerland
  • Michael H. Herzog
    EPFL (École Polytechnique Fédérale de Lausanne), Switzerland
  • Footnotes
    Acknowledgements  BL was supported by the Swiss National Science Foundation grant no. 176153, "Basics of visual processing: from elements to figures".
Journal of Vision, August 2023, Vol. 23, 4794. https://doi.org/10.1167/jov.23.9.4794
Citation
Ben Lonnqvist, Zhengqing Wu, Michael H. Herzog; How object segmentation and perceptual grouping emerge in noisy variational autoencoders. Journal of Vision 2023;23(9):4794. https://doi.org/10.1167/jov.23.9.4794.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

Humans and many newborn animals effortlessly perceive objects and segment them from other objects and the background, and a long-standing debate concerns whether object segmentation is necessary for object recognition. While deep neural networks (DNNs) are state-of-the-art models of object recognition and representation, their performance in segmentation tasks is generally worse than in recognition tasks. For this reason, it is often believed that object segmentation and recognition are separate mechanisms of visual processing. Here, however, we present evidence that in variational autoencoders (VAEs), segmentation and faithful representation of the data can be interlinked. VAEs are encoder-decoder models that learn to represent independent generative factors of the data as a distribution in a very small bottleneck layer; for example, when coding for a face, VAEs may empirically code for mouths and eyes independently. Specifically, we show that VAEs can be made to segment objects without any additional fine-tuning or downstream training. This segmentation is achieved with a procedure that we call the latent space noise trick: by perturbing the activity of the bottleneck units with activity-independent noise, and repeatedly recording and clustering decoder outputs in response to these small changes, the model is able to segment and bind separate features together. We demonstrate that VAEs can group elements in a human-like fashion, are robust to occlusions, and produce illusory contours in simple stimuli. Furthermore, the model generalizes to the naturalistic setting of faces, producing meaningful subpart and figure-ground segmentation without ever having been trained on segmentation. For the first time, we show that learning to faithfully represent stimuli can be extended to segmentation using the same model backbone architecture without any additional training.
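The latent space noise trick described above lends itself to a compact illustration. The sketch below, in Python/PyTorch, is only an assumed reading of the procedure from the abstract, not the authors' implementation: the toy fully connected architecture, the 28x28 image size, the noise scale (NOISE_STD), the number of noisy decoder passes (N_PROBES), and the use of k-means to cluster per-pixel response profiles are all hypothetical choices introduced here for illustration. Only the overall logic, perturbing the bottleneck units with activity-independent noise, recording the decoder outputs, and clustering pixels by how they co-vary, follows the abstract.

```python
# Illustrative sketch of the "latent space noise trick" (assumptions noted above).
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

LATENT_DIM = 16    # assumed bottleneck size
NOISE_STD = 0.5    # assumed scale of the activity-independent noise
N_PROBES = 256     # assumed number of noisy decoder passes
N_SEGMENTS = 3     # assumed number of clusters (object parts + background)

class VAE(nn.Module):
    """Toy fully connected VAE for 28x28 images (stand-in for the paper's model)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT_DIM)
        self.logvar = nn.Linear(256, LATENT_DIM)
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z):
        return self.dec(z).view(-1, 28, 28)

@torch.no_grad()
def latent_space_noise_trick(model, image):
    """Segment `image` by clustering pixels that co-vary under latent noise."""
    mu, _ = model.encode(image.unsqueeze(0))                # (1, LATENT_DIM)
    noise = NOISE_STD * torch.randn(N_PROBES, LATENT_DIM)   # activity-independent
    outputs = model.decode(mu + noise)                      # (N_PROBES, 28, 28)
    # Each pixel gets an N_PROBES-long "response profile" to the perturbations;
    # pixels belonging to the same feature should vary together across probes.
    profiles = outputs.reshape(N_PROBES, -1).T              # (784, N_PROBES)
    profiles = profiles - profiles.mean(dim=1, keepdim=True)
    labels = KMeans(n_clusters=N_SEGMENTS, n_init=10).fit_predict(profiles.numpy())
    return labels.reshape(28, 28)                           # integer segment map

# Usage (untrained weights here, so the output is only structurally meaningful):
model = VAE().eval()
segmentation = latent_space_noise_trick(model, torch.rand(28, 28))
print(segmentation.shape)  # (28, 28)
```

In a trained VAE the intuition would be that pixels rendered by the same latent factors respond coherently to the same perturbations, so clustering their response profiles groups them into objects or subparts without any segmentation training, consistent with the claim in the abstract.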
