Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Modeling of Human Motion Perception Mechanism: A Simulation based on Deep Neural Network and Attention Transformer
Author Affiliations & Notes
  • Zitang Sun
    Cognitive Informatics Lab, Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Japan
  • Yen-Ju Chen
    Cognitive Informatics Lab, Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Japan
  • Yung-Hao Yang
    Cognitive Informatics Lab, Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Japan
  • Shin’ya Nishida
    Cognitive Informatics Lab, Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Japan
    Human Information Science Laboratory, NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Japan
  • Footnotes
    Acknowledgements  This work is supported by JST JPMJFS2123, MEXT/JSPS JP20H00603 and JP20H05605.
Journal of Vision August 2023, Vol.23, 4894. doi:https://doi.org/10.1167/jov.23.9.4894
Citation: Zitang Sun, Yen-Ju Chen, Yung-Hao Yang, Shin’ya Nishida; Modeling of Human Motion Perception Mechanism: A Simulation based on Deep Neural Network and Attention Transformer. Journal of Vision 2023;23(9):4894. https://doi.org/10.1167/jov.23.9.4894.



      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Motion perception has been widely studied as an essential function that allows organisms to perceive and interact with the world. Nevertheless, current motion models based on the V1-MT structure struggle to derive informative motion flow from local motion energy in complex natural scenes. This study exploits the flexibility of deep neural networks (DNNs) to simulate the motion perception process. To tackle this challenging problem while preserving neurophysiological plausibility, we developed a two-stage model that merges the classical motion-energy constraint with a modern transformer architecture. The first stage consists of a population of units with trainable spatiotemporal frequency tuning that capture different preferences for local motion energy. The second stage implements the attention mechanism from deep learning, which recurrently integrates the local motion signals via pairwise correlation across spatial locations to resolve the aperture problem. The model was trained with supervision to fit ground-truth motion in digital videos, using a sizeable multi-frame training set that includes a self-designed non-texture dataset, the MPI-Sintel dataset, and natural image sequences with pseudo-labels. Using drifting gratings and plaids, we applied virtual neurophysiology to measure the model's activations and found representations in its two processing stages similar to those of human V1 and MT. The model generalizes well from the non-texture stimuli used in psychophysics to complex natural scenes, producing interpretations similar to those of human observers. In more complex scenarios, the model exhibits the higher-level function of integrating local energy to infer global motion flow. In summary, the proposed model closely replicates human performance from simple drifting Gabors to complex natural scenes and exhibits characteristics similar to known properties of the human visual system.
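
To make the two-stage design concrete, below is a minimal PyTorch sketch of the architecture the abstract describes. It is an illustrative assumption on our part, not the authors' implementation: the class names (MotionEnergyStage, AttentionIntegrationStage, TwoStageMotionModel), filter counts, kernel sizes, quadrature-pair energy step, temporal pooling, and linear flow readout are all hypothetical choices.

import torch
import torch.nn as nn


class MotionEnergyStage(nn.Module):
    # Stage 1: a bank of units with trainable spatiotemporal frequency tuning.
    # A 3D convolution over (time, height, width) learns the filters; squaring
    # and summing paired responses approximates classical motion energy.
    def __init__(self, n_filters=64, t=5, k=11):
        super().__init__()
        # 2 * n_filters channels so each energy unit has a quadrature-like pair.
        self.filters = nn.Conv3d(1, 2 * n_filters, kernel_size=(t, k, k),
                                 padding=(t // 2, k // 2, k // 2), bias=False)

    def forward(self, video):                  # video: (B, 1, T, H, W)
        odd, even = self.filters(video).chunk(2, dim=1)
        return odd ** 2 + even ** 2            # energy: (B, n_filters, T, H, W)


class AttentionIntegrationStage(nn.Module):
    # Stage 2: self-attention computes pairwise correlations between all
    # spatial locations, so locally ambiguous measurements can borrow evidence
    # from their context (the aperture problem); iterating the layer stands in
    # for the recurrent integration described in the abstract.
    def __init__(self, dim=64, heads=4, iterations=3):
        super().__init__()
        self.iterations = iterations
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.to_flow = nn.Linear(dim, 2)       # decode (vx, vy) per location

    def forward(self, energy):                 # energy: (B, dim, H, W)
        b, c, h, w = energy.shape
        x = energy.flatten(2).transpose(1, 2)  # tokens: (B, H*W, dim)
        for _ in range(self.iterations):
            x = self.norm(x + self.attn(x, x, x, need_weights=False)[0])
        return self.to_flow(x).transpose(1, 2).reshape(b, 2, h, w)


class TwoStageMotionModel(nn.Module):
    def __init__(self, n_filters=64):
        super().__init__()
        self.energy = MotionEnergyStage(n_filters)
        self.integrate = AttentionIntegrationStage(dim=n_filters)

    def forward(self, video):
        e = self.energy(video)                 # local motion energy per frame
        return self.integrate(e.mean(dim=2))   # pool over time, then integrate


if __name__ == "__main__":
    # "Virtual neurophysiology" in miniature: probe the untrained model with a
    # rightward-drifting sinusoidal grating and read out both stages.
    model = TwoStageMotionModel()
    t, h, w = 5, 32, 32
    ts = torch.arange(t).view(t, 1, 1)
    _, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grating = torch.sin(2 * torch.pi * (0.1 * xs + 0.2 * ts))
    clip = grating.view(1, 1, t, h, w).float()
    print(model.energy(clip).shape)            # stage-1: (1, 64, 5, 32, 32)
    print(model(clip).shape)                   # flow:    (1, 2, 32, 32)

In an actual training setup, the readout would be fit with supervision against ground-truth flow (e.g., average endpoint error on MPI-Sintel), and the grating probe at the bottom mimics, in miniature, the drifting-grating measurements reported in the abstract.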
