September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2021
A Performance-Optimized Limb Detection Model Selectively Predicts Behavioral Responses Based on Movement Similarity
Author Affiliations
  • Xilin Zhou
    Harvard University
  • Julian De Freitas
    Harvard University
  • Leyla Tarhan
    Harvard University
Journal of Vision September 2021, Vol.21, 2916. doi:https://doi.org/10.1167/jov.21.9.2916
© ARVO (1962-2015); The Authors (2016-present)
Abstract

Humans are expert body pose readers. What psychological representations support this feat, and how are they computed? We ask whether a computational model optimized to localize bodily limbs learns representations that provide a good account of human psychological representations of kinematics. The heart of the model, OpenPose (Cao et al., 2019), is a dual-stream architecture that simultaneously detects body parts (in one stream) and associates them into limbs (in another) using part affinity fields: 2D vector fields that encode the location and orientation of the limbs. To measure human perceptions of body kinematics, we asked 20 participants to arrange videos of 60 everyday actions by the similarity of their movements. We then asked whether features from the intermediate layers of OpenPose (55 layers, concatenated across the two streams) selectively recover the structure in this behavior. Representations from OpenPose correlated moderately with movement-guided similarity (average r=0.18, p<.001), and this correspondence improved across all layers of the network, especially early layers 1-15. In contrast, these features did not correlate well with human judgments of the actions' intuitive, unguided similarity (r=0.07), visual appearance (r=0.05), or the actors' goals (r=0.07). Since the early backbone of OpenPose is an object-recognition network (VGG-19) that has been fine-tuned for pose recognition, we also compared its performance to that of a generic VGG-19 without this fine-tuning. Features from the generic network showed performance patterns similar to the fine-tuned version on the non-movement tasks, but the opposite pattern on the movement task, with performance dropping across layers 1-15.
These results suggest that perceived movement similarity recruits computations and representations that are more specialized for limb detection than for generic object recognition and, more broadly, that performance-optimized computational pose models provide a useful tool for illuminating body perception.
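The layer-wise analysis described above can be sketched as a representational similarity analysis: for each layer, compute pairwise dissimilarities between the videos' feature vectors and correlate them with the behavioral dissimilarities from the arrangement task. The sketch below is illustrative only, assuming random stand-in data; all variable names, shapes, and distance metrics are assumptions, not the authors' code.

```python
# Illustrative sketch of a layer-wise model-behavior correlation (RSA-style).
# Features and behavioral dissimilarities are random stand-ins, not real data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_videos, n_layers = 60, 55  # 60 action videos, 55 concatenated layers

# Hypothetical per-layer feature vectors for each video.
layer_features = [rng.normal(size=(n_videos, 128)) for _ in range(n_layers)]

# Hypothetical behavioral dissimilarities from the movement-similarity
# arrangement task (condensed upper-triangle vector, as pdist returns).
behavioral_rdm = pdist(rng.normal(size=(n_videos, 2)))

def layer_behavior_correlation(features, behavior_rdm):
    """Correlate one layer's pairwise feature dissimilarities with behavior."""
    model_rdm = pdist(features, metric="correlation")
    rho, _ = spearmanr(model_rdm, behavior_rdm)
    return rho

correlations = [layer_behavior_correlation(f, behavioral_rdm)
                for f in layer_features]
# With real OpenPose features, a correlation profile rising over early
# layers would mirror the pattern the abstract reports.
```

Under this framing, the fine-tuned vs. generic VGG-19 comparison amounts to running the same loop over both models' layer features and comparing the two correlation profiles.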
