September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract | September 2024
Deep feature matching vs spatio-temporal energy filtering for robust moving object segmentation
Author Affiliations & Notes
  • Matthias Tangemann
    University of Tübingen, Tübingen AI Center
  • Matthias Kümmerer
    University of Tübingen, Tübingen AI Center
  • Matthias Bethge
    University of Tübingen, Tübingen AI Center
  • Footnotes
    Acknowledgements  Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): Germany’s Excellence Strategy – EXC 2064/1 – 390727645 and SFB 1233, TP4, project number: 276693517.
Journal of Vision September 2024, Vol. 24, 872. https://doi.org/10.1167/jov.24.10.872
© ARVO (1962-2015); The Authors (2016-present)
Abstract

Recent methods for optical flow estimation achieve remarkable precision and are successfully applied in downstream tasks such as segmenting moving objects. These methods are based on matching deep neural network features across successive video frames. For humans, in contrast, the dominant motion estimation mechanism is believed to rely on spatio-temporal energy filtering. Here, we compare both motion estimation approaches for segregating a moving object from a moving background. We render synthetic videos based on scanned 3D objects and backgrounds to obtain ground-truth motion for realistic scenes. We then transform the videos by replacing the textures with random dots that follow the motion of the original video, so that no individual frame contains any information about the object apart from the motion signal. Humans have been shown to recognize objects from random dot motion in such stimuli (Robert et al. 2023). We compare segmentation methods based on the recent RAFT optical flow estimator (Teed and Deng 2020) and the spatio-temporal energy model of Simoncelli & Heeger (1998). Our results show that, when combined with an established segmentation architecture, the spatio-temporal energy approach works almost as well on the original videos as RAFT. Furthermore, we quantify the amount of segmentation information that can be decoded from both models by finding the optimal non-negative superposition of feature maps for each video. This analysis confirms that both optical flow representations can be used for motion segmentation, with RAFT performing slightly better on the original videos. For the random dot stimuli, however, hardly any information about the object can be decoded from RAFT, while the brain-inspired spatio-temporal energy filtering approach is only mildly affected. Based on these results, we explore the use of spatio-temporal filtering for building a more robust model for moving object segmentation.
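The decoding analysis described in the abstract, finding the optimal non-negative superposition of feature maps that best reproduces the object mask, can be sketched with non-negative least squares. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; `decode_mask`, the array shapes, and the use of `scipy.optimize.nnls` are assumptions made here for concreteness.

```python
import numpy as np
from scipy.optimize import nnls

def decode_mask(feature_maps, mask):
    """Find non-negative channel weights whose superposition of
    feature maps best reproduces a ground-truth segmentation mask.

    Illustrative sketch (not the authors' code):
    feature_maps: (C, H, W) activations, e.g. from RAFT or an energy model.
    mask:         (H, W) binary object segmentation.
    Returns the reconstructed mask and the per-channel weights.
    """
    C, H, W = feature_maps.shape
    A = feature_maps.reshape(C, -1).T        # (H*W, C) design matrix
    b = mask.reshape(-1).astype(float)       # flattened target mask
    weights, _ = nnls(A, b)                  # non-negative least squares fit
    reconstruction = (A @ weights).reshape(H, W)
    return reconstruction, weights
```

How closely the reconstruction matches the mask (e.g. by IoU or correlation) then quantifies how much segmentation information a given motion representation carries.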
