August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Natural-image-computable Bayesian model for 3D motion estimation
Author Affiliations & Notes
  • Daniel Herrera
    Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
  • Johannes Burge
    Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
    Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
    Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
  • Footnotes
    Acknowledgements: This work was supported by NIH grant R01-EY028571 from the National Eye Institute and the Office of Social and Behavioral Science.
Journal of Vision August 2023, Vol.23, 5612. doi:
      Daniel Herrera, Johannes Burge; Natural-image-computable Bayesian model for 3D motion estimation. Journal of Vision 2023;23(9):5612.

      © ARVO (1962-2015); The Authors (2016-present)

Estimating the motion of objects in depth is important for behavior and is strongly supported by binocular visual cues. However, it is not fully understood how the brain does, and should, estimate motion-in-depth from binocular signals. Here, using an image-computable ideal observer, we show how to optimally estimate the 3D speed of surface patches in the environment from 250 ms naturalistic binocular video clips. First, the model applies a small set of local spatio-temporal linear filters to a binocular video (analogous to simple cells). Then, local 3D speed is non-linearly decoded from the filter population response. The filters and the Bayes-optimal decoder are learned to optimize performance in the task. Interestingly, the joint distributions of filter responses, conditioned on 3D speed, are well approximated by Gaussian distributions. Because the log-likelihood of a Gaussian is quadratic in its argument, optimal decoding of 3D speed requires quadratic combination of the filter responses. Thus, the natural statistics of the filter responses dictate that the normative computations for this task are a biologically plausible generalization of the widely studied energy model: linear filtering followed by quadratic combination of the responses. Also, similar to human psychophysical behavior, the model learned to use both the time-derivatives of matching binocular features (changing disparity over time; CDOT) and binocular comparisons of the time-derivatives of monocular features (interocular velocity differences; IOVD). Like humans, the model weights CDOT cues more heavily at slow speeds and IOVD cues more heavily at high speeds. Finally, using the observer model and natural disparity statistics, we propose the novel hypothesis that IOVD cues are more strongly weighted in human peripheral vision in part because, during natural viewing, disparities in the retinal periphery are more variable than those near the fovea. Our results suggest that many characteristics of 3D motion processing are accounted for by near-optimal information processing in the early visual system.
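
The link between Gaussian class-conditional filter responses and quadratic decoding can be illustrated with a minimal sketch. This is not the authors' implementation: the filters are random rather than learned, the stimuli are synthetic noise vectors rather than binocular video, the three discrete "speed" classes are a stand-in for 3D speed, and all function names (filter_responses, fit_class_gaussians, log_posterior, toy_accuracy) are hypothetical. The sketch only shows the generic computation: linear filtering, a Gaussian fit per class, and a Bayesian readout whose log-likelihood is quadratic in the response vector.

```python
import numpy as np


def filter_responses(stimuli, filters):
    """Linear stage: project each stimulus (row) onto each filter (row)."""
    return stimuli @ filters.T


def fit_class_gaussians(responses, labels):
    """Fit one mean vector and one covariance matrix per class."""
    classes = np.unique(labels)
    means = np.stack([responses[labels == c].mean(axis=0) for c in classes])
    covs = np.stack([np.cov(responses[labels == c], rowvar=False) for c in classes])
    return classes, means, covs


def log_posterior(r, means, covs, log_prior):
    """Log posterior over classes for one response vector r.

    Each Gaussian log-likelihood contains the term (r - mu)^T C^{-1} (r - mu),
    which is a quadratic combination of the filter responses.
    """
    n = r.shape[0]
    log_like = np.empty(len(means))
    for k, (mu, cov) in enumerate(zip(means, covs)):
        d = r - mu
        _, logdet = np.linalg.slogdet(cov)
        quad = d @ np.linalg.solve(cov, d)
        log_like[k] = -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))
    unnorm = log_like + log_prior
    return unnorm - np.logaddexp.reduce(unnorm)  # normalize in log space


def toy_accuracy(seed=0, n_per_class=2000, n_pixels=16, n_filters=6):
    """Train and test the decoder on synthetic data (assumed toy setup)."""
    rng = np.random.default_rng(seed)
    # Assumption: random filters stand in for the learned filters.
    filters = rng.standard_normal((n_filters, n_pixels))
    # Assumption: classes differ only in stimulus variance, so every class has
    # zero-mean responses and only the covariance carries information.
    scales = np.array([0.5, 1.0, 2.0])

    def sample():
        labels = np.repeat(np.arange(len(scales)), n_per_class)
        noise = rng.standard_normal((labels.size, n_pixels))
        stimuli = noise * scales[labels, None]
        return filter_responses(stimuli, filters), labels

    train_r, train_y = sample()
    test_r, test_y = sample()
    classes, means, covs = fit_class_gaussians(train_r, train_y)
    log_prior = np.full(len(classes), -np.log(len(classes)))  # flat prior
    pred = np.array(
        [classes[np.argmax(log_posterior(r, means, covs, log_prior))] for r in test_r]
    )
    return np.mean(pred == test_y)
```

In this toy setup the class-conditional means are all zero, so a purely linear readout of the filter responses cannot separate the classes; any above-chance accuracy returned by toy_accuracy() comes from the quadratic term, which is the sense in which Gaussian response statistics call for an energy-model-like computation.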

