September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Can two-stream convolutional neural networks emulate human perception of biological movements?
Author Affiliations & Notes
  • Hannah Lee
    Department of Psychology, University of California, Los Angeles
  • Yujia Peng
    Department of Psychology, University of California, Los Angeles
  • Tianmin Shu
    Department of Statistics, University of California, Los Angeles
  • Hongjing Lu
    Department of Psychology, University of California, Los Angeles
    Department of Statistics, University of California, Los Angeles
Journal of Vision September 2019, Vol. 19, 192a. https://doi.org/10.1167/19.10.192a
Hannah Lee, Yujia Peng, Tianmin Shu, Hongjing Lu; Can two-stream convolutional neural networks emulate human perception of biological movements? Journal of Vision 2019;19(10):192a. https://doi.org/10.1167/19.10.192a.

© ARVO (1962-2015); The Authors (2016-present)

Abstract

Visual recognition of biological motion recruits form and motion processing supported by both the dorsal and ventral pathways. This neural architecture resembles that of two-stream convolutional neural networks (CNNs) (Simonyan & Zisserman, 2014), which include a spatial CNN to process static image frames, a temporal CNN to process motion flow fields, and a fusion network with fully connected layers to integrate recognition decisions. Two-stream CNNs have shown human-like performance in recognizing actions from natural videos. We tested whether two-stream CNNs could account for several classic findings in the biological motion perception literature. In Simulation 1, we trained the model on actions in skeletal displays and tested its generalization to actions in point-light (PL) displays. We found that only the temporal CNN recognized PL actions with high accuracy. Simulation 2 examined whether the two-stream CNNs could discriminate the form-only PL walker, in which local motion is eliminated by randomly sampling points along the limbs in each frame (Beintema & Lappe, 2002). The image stream and the fusion network, but not the motion pathway, showed activity patterns similar to those for intact PL walkers, indicating that the model was able to use appearance information, such as body structure, to detect human motion. In Simulation 3, we tested whether the two-stream CNNs show inversion effects (better performance for upright actions than for upside-down actions) across a range of conditions (Troje & Westhoff, 2006). The model showed the inversion effect in most conditions, but not in the spatial-scramble condition. This failure suggests that the feed-forward connections in the two-stream CNN model limit its ability to serve as a "life detector." The model would require additional long-range connections in its architecture to pass distinctive local movements (e.g., foot movements in walking) to later layers for efficient detection and recognition.
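The two-stream arrangement described above can be sketched schematically. The following is a minimal NumPy illustration, not the authors' or Simonyan & Zisserman's implementation: a "spatial" stream receives a static frame, a "temporal" stream receives stacked optical-flow fields, and a fully connected fusion stage maps their concatenated features to action-class probabilities. All layer sizes, class counts, and names here are hypothetical, and single dense layers stand in for the convolutional stacks.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class Stream:
    """Stand-in for one CNN stream: flatten the input, then apply a
    single hidden layer that produces a fixed-size feature vector."""
    def __init__(self, in_dim, feat_dim=64):
        self.W = rng.normal(0.0, 0.01, (feat_dim, in_dim))
        self.b = np.zeros(feat_dim)

    def features(self, x):
        return relu(self.W @ x.ravel() + self.b)

class TwoStreamModel:
    """Spatial stream sees a static frame; temporal stream sees a
    stack of flow fields. A fully connected fusion layer maps the
    concatenated features to action-class probabilities."""
    def __init__(self, frame_shape, flow_shape, n_classes=10):
        self.spatial = Stream(int(np.prod(frame_shape)))
        self.temporal = Stream(int(np.prod(flow_shape)))
        self.Wf = rng.normal(0.0, 0.01, (n_classes, 128))  # 64 + 64 fused features
        self.bf = np.zeros(n_classes)

    def forward(self, frame, flow):
        fused = np.concatenate([self.spatial.features(frame),
                                self.temporal.features(flow)])
        return softmax(self.Wf @ fused + self.bf)

# One 3-channel 32x32 frame and a 2-channel flow field over 5 time steps.
frame = rng.normal(size=(3, 32, 32))
flow = rng.normal(size=(2 * 5, 32, 32))
model = TwoStreamModel(frame.shape, flow.shape)
probs = model.forward(frame, flow)  # (10,) vector of class probabilities
```

The simulations in the abstract map naturally onto this sketch: lesioning one stream (e.g., zeroing the spatial features before fusion) corresponds to testing the temporal CNN alone, as in Simulation 1.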

Acknowledgement: NSF grant BCS-1655300 