Abstract
Many animals, including flies, macaques, and humans, can visually perceive image motion not only from shifts of spatial patterns defined by luminance modulations (first-order motion) but also from shifts of patterns defined by higher-level image features such as temporal modulations and contrast modulations (second-order motion). Second-order motion perception has been studied extensively using carefully designed artificial stimuli (e.g., drift-balanced motion) that control first-order motion components, but why and how the visual system acquired this ability in natural environments remains poorly understood. We hypothesized that biological visual systems might naturally learn second-order motion perception in order to estimate the correct physical motion of objects amid internal optical fluctuations produced, for example, by highlights on glossy materials and refractions through transparent materials. As a proof of concept, we developed a DNN-based model that processes both first- and second-order motion in natural scenes. The model builds on our two-stage model (Sun et al., NeurIPS 2023), which consists of a trainable motion-energy sensing stage and a recurrent self-attention network, inspired by biological computations in V1 and MT, respectively. To preprocess complex second-order features, we added a second input pathway consisting of a vanilla multi-layer convolutional network. The model was trained on two distinct optical flow datasets generated by rendering random object motion: one with purely diffuse (PD) reflection and the other with non-diffuse (ND) material properties, the latter including ample optical turbulence caused by specular reflections and transparent refractions. The ND-trained model recognized various types of second-order motion significantly better than the PD-trained model, aligning closely with human performance measured in our psychophysical experiments. This performance was not achievable without the second input pathway.
The results suggest that second-order motion perception might have evolved, at least in part, to support robust estimation of object motion in the presence of optical fluctuations in natural environments.