September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Do CNNs Trained on Self-Motion Videos Develop Sensitivity to 1st- and 3rd-order Motion?
Author Affiliations & Notes
  • Zhenyu Zhu, Brown University
  • Thomas Serre, Brown University
  • William Warren, Brown University

Footnotes
  Acknowledgements: Funding NIH R01EY029745, NIH 1S10OD025181, NIH T32MH115895
Journal of Vision September 2024, Vol.24, 1101. doi:https://doi.org/10.1167/jov.24.10.1101
Zhenyu Zhu, Thomas Serre, William Warren; Do CNNs Trained on Self-Motion Videos Develop Sensitivity to 1st- and 3rd-order Motion? Journal of Vision 2024;24(10):1101. https://doi.org/10.1167/jov.24.10.1101.

© ARVO (1962-2015); The Authors (2016-present)

Abstract

At least two classes of motion information play a role in locomotor control: 1st-order motion energy, such as moving high-contrast texture, and 3rd-order feature-tracking, such as moving object boundaries (Lu and Sperling 1995). Previous work showed that human heading responses when following a virtual crowd are dominated by 3rd-order motion and only weakly influenced by 1st-order motion, as revealed by displays in which surface texture moves in the Same or Opposite direction as the object boundaries (the phi illusion) (Zhu and Warren, VSS 2023). In this project, we test whether units selective for both 1st- and 3rd-order motion emerge in a state-of-the-art convolutional neural network (CNN) model of motion responses in the primate dorsal stream. DorsalNet (Mineault et al. 2021) is a 5-layer CNN trained to estimate self-motion parameters from simulated drone videos. We tested the model's heading estimates on the three virtual crowd displays used in Zhu and Warren's (VSS 2023) human experiments. In the CONTROL display, DorsalNet layers, like humans, show no difference between the Same and Opposite conditions, while responses significantly increase with the number of moving objects in both conditions (model and human: p<0.01). In the TEXTURE DISPLACEMENT display, DorsalNet, like humans, shows a significant difference when texture motion is coherent (Same > Opposite; model and human: p<0.01), but not when it is rendered incoherent by small or large displacements. Critically, in the BLURRED BOUNDARIES display, blurring the object boundaries reduces the response to 3rd-order motion, increasing the difference between the Same and Opposite conditions in humans (p<0.01), but not in the model. These results demonstrate that DorsalNet has developed a 1st-order motion-energy mechanism, which captures some human heading responses, but not those due to 3rd-order feature-tracking.
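As a concrete illustration of the 1st-order mechanism the model appears to rely on, the sketch below implements a minimal opponent motion-energy computation in the style of Adelson and Bergen (1985): quadrature pairs of space-time filters whose squared and summed outputs signal motion direction independent of texture phase. This is a hypothetical toy example, not DorsalNet's architecture or the study's analysis code, and all filter parameters are illustrative.

```python
import numpy as np

# Minimal 1D-space x time opponent motion-energy model (Adelson & Bergen,
# 1985 style). Toy illustration only; not DorsalNet's learned filters.

def spacetime_gabor(x, t, fx, ft, sigma_x, sigma_t, phase):
    """Gabor oriented in space-time: responds when a pattern's spatial
    frequency fx and temporal frequency ft match the filter's carrier."""
    X, T = np.meshgrid(x, t, indexing="ij")
    carrier = np.cos(2 * np.pi * (fx * X + ft * T) + phase)
    envelope = np.exp(-X**2 / (2 * sigma_x**2) - T**2 / (2 * sigma_t**2))
    return carrier * envelope

x = np.linspace(-1, 1, 64)  # space (arbitrary units)
t = np.linspace(-1, 1, 64)  # time

# A texture drifting rightward at speed v is cos(2*pi*f*(x - v*t)),
# i.e. spatial frequency +f paired with temporal frequency -f*v.
f, v = 4.0, 1.0
right_pair = [spacetime_gabor(x, t, f, -f * v, 0.3, 0.3, p) for p in (0.0, np.pi / 2)]
left_pair  = [spacetime_gabor(x, t, f, +f * v, 0.3, 0.3, p) for p in (0.0, np.pi / 2)]

def motion_energy(stimulus, quad_pair):
    """Phase-invariant energy: sum of squared quadrature filter responses."""
    return sum(np.sum(stimulus * filt) ** 2 for filt in quad_pair)

X, T = np.meshgrid(x, t, indexing="ij")
texture_right = np.cos(2 * np.pi * f * (X - v * T))  # drifts rightward
texture_left  = np.cos(2 * np.pi * f * (X + v * T))  # drifts leftward

for name, stim in (("rightward", texture_right), ("leftward", texture_left)):
    opponent = motion_energy(stim, right_pair) - motion_energy(stim, left_pair)
    print(f"{name} texture -> opponent energy {opponent:+.2e} (positive = rightward)")
```

Because such a mechanism is driven entirely by drifting texture, reversing the texture direction flips the sign of the opponent signal even when the object boundaries move the other way, consistent with a purely 1st-order mechanism following the texture rather than the boundaries in the Opposite condition.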
