August 2012
Volume 12, Issue 9
Vision Sciences Society Annual Meeting Abstract
Recognizing activities and poses: lessons from computer vision
Author Affiliations
  • Lavanya Sharan
    Disney Research, Pittsburgh, USA
  • Leonid Sigal
    Disney Research, Pittsburgh, USA
  • Jessica Hodgins
    Disney Research, Pittsburgh, USA
    Carnegie Mellon University, USA
Journal of Vision August 2012, Vol.12, 645. doi:

There has been extensive research aimed at understanding how we perceive human bodies and human movements, using methodologies such as psychophysics, neuroimaging, and computational modeling. However, most studies, especially psychophysical and computational studies, have considered only point-light animations (Johansson, 1973). Meanwhile, in the field of computer vision, there has been a dedicated effort to identify human figures in real-world images and videos. Computer vision models that operate on real-world visual inputs can be informative for perceptual investigations. We were inspired by the observation that successful computer vision models of action and pose recognition perform qualitatively different computations: one is driven by holistic features, the other by deformable templates based on the human form. We wanted to understand whether this difference in computational strategy for recognizing actions vs. poses reflects a difference in human perceptual processing as well. We collected photographs of human figures performing one of six common activities (e.g., bending, kicking) and, for each activity, included a variety of poses, backgrounds, lighting conditions, body types, and clothing. We annotated the pose in each photograph (Bourdev & Malik, 2009) to create 3D stick-figure representations. Observers were then asked to recognize either the activity or the pose depicted in each photograph by selecting one of six options (a list of activities or a set of stick figures, respectively). We found that action recognition was faster and more accurate than pose discrimination. Common actions such as walking or sitting can be predicted even when the human figure is occluded, which suggests that real-world cues such as support surfaces and scene context are important for recognizing actions.
Taken together, our findings support the use of divergent strategies in computer vision for recognizing actions (fast, context-driven) and poses (slower, template-based) in real-world images.

Meeting abstract presented at VSS 2012
