September 2011
Volume 11, Issue 11
Vision Sciences Society Annual Meeting Abstract | September 2011
Poselets: A distributed representation for visual recognition
Author Affiliations
  • Lubomir Bourdev
    EECS, U.C. Berkeley, USA
    Adobe Systems, Inc., USA
  • Subhransu Maji
    EECS, U.C. Berkeley, USA
  • Jitendra Malik
    EECS, U.C. Berkeley, USA
Journal of Vision September 2011, Vol.11, 891. doi:
      Lubomir Bourdev, Subhransu Maji, Jitendra Malik; Poselets: A distributed representation for visual recognition. Journal of Vision 2011;11(11):891. doi:


Detection and pose estimation of people in images are challenging tasks due to variations in articulation, viewpoint and appearance. Part detectors are a natural way to attack this problem, but identifying good parts remains an open question. Anatomical parts, such as arms and legs, are difficult to detect reliably because parallel lines are common in natural images. In contrast, a visual conjunction such as “half of a frontal face and a left shoulder” may be a perfectly good discriminative visual pattern. Bourdev and Malik [ICCV 2009] introduced new parts, called poselets, which correspond to such discriminative visual patterns. There is a wide variety of poselets – a frontal face, a profile face, a head-and-shoulder configuration, etc. We discover them by choosing a random seed patch from the image of a random person in the training set and finding the “corresponding” patches in images of other people. A corresponding patch is defined as one that has the same spatial configuration of semantic keypoints (such as joints, eyes, nose) as the seed patch. We discriminatively train detectors for these patches. To find people in a test image, we evaluate the poselet detectors at multiple locations and scales and cluster the activations into person hypotheses. The activations within each cluster form a distributed representation of the pose of a person and provide the basis for numerous high-level vision tasks. Our system is the current best performer on the task of people detection and segmentation. We are able to infer attributes (the gender, style of hair, clothes, presence of glasses, hat, etc.) and actions (phoning, running, walking, reading a book, etc.) of people under arbitrary viewpoints and articulations. These ideas extend naturally to other visual categories. Interestingly, receptive fields of neurons in inferotemporal cortex have a variety consistent with that predicted by our model.
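The patch-correspondence step described in the abstract can be illustrated with a small sketch. The abstract says only that corresponding patches share the seed's spatial configuration of keypoints; it does not give a distance measure. The version below is therefore an assumption: it scores another person by the residual left after the best least-squares similarity transform (scale, rotation, translation) of the seed keypoints. The names `configuration_distance` and `rank_correspondences`, and the keypoint labels, are hypothetical and are not taken from the authors' system.

```python
from typing import Dict, List, Tuple

# Keypoint name -> (x, y) position in image pixels (hypothetical annotation format).
Keypoints = Dict[str, Tuple[float, float]]


def configuration_distance(seed: Keypoints, other: Keypoints) -> float:
    """Normalized residual after aligning seed keypoints to another person's.

    Assumed measure, not the one from the paper: fit w ~ a * z + b with
    complex a (scale and rotation) and b (translation) by least squares.
    Returns a value in [0, 1]; 0 means the configurations match exactly.
    """
    shared = sorted(set(seed) & set(other))
    if len(shared) < 2:
        return float("inf")  # too few common keypoints to compare

    # Represent 2D points as complex numbers so rotation+scale is one multiply.
    z = [complex(*seed[name]) for name in shared]
    w = [complex(*other[name]) for name in shared]

    # Centering removes the translation component.
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    zc = [p - z_mean for p in z]
    wc = [p - w_mean for p in w]

    z_energy = sum(abs(p) ** 2 for p in zc)
    w_energy = sum(abs(p) ** 2 for p in wc)
    if z_energy == 0 or w_energy == 0:
        return float("inf")  # degenerate: all keypoints coincide

    # Closed-form least-squares estimate of the scale-and-rotation factor.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / z_energy
    residual = sum(abs(q - a * p) ** 2 for p, q in zip(zc, wc))
    return residual / w_energy


def rank_correspondences(seed: Keypoints, people: List[Keypoints]) -> List[int]:
    """Indices of people ordered from best to worst match to the seed."""
    scored = [(configuration_distance(seed, kp), i) for i, kp in enumerate(people)]
    return [i for _, i in sorted(scored)]


# Toy data: `scaled` is the seed doubled in size and shifted; `different`
# has the nose far to the side, so its configuration does not match.
seed = {"left_eye": (10.0, 10.0), "right_eye": (20.0, 10.0), "nose": (15.0, 18.0)}
scaled = {"left_eye": (120.0, 70.0), "right_eye": (140.0, 70.0), "nose": (130.0, 86.0)}
different = {"left_eye": (0.0, 0.0), "right_eye": (10.0, 0.0), "nose": (30.0, 2.0)}

order = rank_correspondences(seed, [different, scaled])
```

In this toy setup the scaled-and-shifted copy ranks ahead of the mismatched configuration. A practical system would likely also restrict the allowed rotation and account for keypoint visibility, neither of which is modeled here.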

Adobe Systems, MURI N00014-06-1-0734, Google. 
