Abstract
Humans are expert readers of body pose. What psychological representations support this feat, and how are they computed? We ask whether a computational model that has been optimized to localize bodily limbs learns representations that provide a good account of human psychological representations of kinematics. The heart of the model, called OpenPose (Cao et al., 2019), is a dual-stream architecture that simultaneously detects body parts (in one stream) and associates them into limbs (in another) using part affinity fields: 2D vector fields that encode the location and orientation of the limbs. To measure human perceptions of body kinematics, we asked 20 participants to arrange videos of 60 everyday actions based on the similarity of their movements. Then, we asked whether features from the intermediate layers of OpenPose (55 layers, concatenated across the two streams) selectively capture the structure in this behavior. We found that representations from OpenPose correlated moderately with movement-guided similarity judgments (average r=0.18, p<.001). In addition, this correspondence improved across the layers of the network, especially across early layers 1-15. In contrast, these features did not correlate well with human judgments of the actions' intuitive, unguided similarity (r=0.07), visual appearance (r=0.05), or the actors' goals (r=0.07). Because the early spine of OpenPose is an object-recognition network (VGG-19) that has been fine-tuned for pose recognition, we also compared its performance to that of a generic VGG-19 that was not fine-tuned in this way. Features from the generic network showed performance patterns similar to those of the fine-tuned version for the non-movement tasks, but the opposite pattern for the movement task, with performance dropping across layers 1-15.
These results suggest that perceived movement similarity recruits computations and representations that are more specialized for limb detection than for generic object recognition and, more broadly, that performance-optimized computational pose models provide a useful tool for illuminating body perception.
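
The layer-to-behavior comparison described above can be illustrated with a minimal sketch. The abstract does not state which dissimilarity measure or correlation statistic was used, so the choices here (1 minus Pearson correlation between feature vectors, then Pearson r between the upper triangles of the two matrices) are assumptions, as are all function and variable names. The data are random stand-ins, not the study's features or judgments.

```python
import numpy as np


def dissimilarity_matrix(features):
    """Pairwise dissimilarity (1 - Pearson r) between the rows of an
    (n_items, n_features) array. Assumed metric, not confirmed by the source."""
    return 1.0 - np.corrcoef(features)


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def layer_behavior_correlation(layer_features, human_dissimilarity):
    """Pearson r between a layer's pairwise dissimilarities and a
    human-derived dissimilarity matrix over the same items."""
    model_vec = upper_triangle(dissimilarity_matrix(layer_features))
    human_vec = upper_triangle(human_dissimilarity)
    return float(np.corrcoef(model_vec, human_vec)[0, 1])


# Hypothetical stand-in data: 60 videos, 128 features from one layer.
rng = np.random.default_rng(0)
n_videos, n_features = 60, 128
layer_features = rng.normal(size=(n_videos, n_features))

# Simulated "human" dissimilarities: a noisy version of the model's own.
noisy_features = layer_features + rng.normal(scale=2.0, size=layer_features.shape)
human_dissimilarity = dissimilarity_matrix(noisy_features)

r = layer_behavior_correlation(layer_features, human_dissimilarity)
print(f"layer-to-behavior correlation: r = {r:.2f}")
```

Repeating this computation for each layer's features, and for each judgment task, would give one correlation per layer and task, which is the kind of profile the layer-wise comparisons above refer to.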