Vision Sciences Society Annual Meeting Abstract | September 2015
Invariant representations for action recognition in the visual system
Author Affiliations
  • Andrea Tacchetti, Center for Brains, Minds and Machines, Massachusetts Institute of Technology*
  • Leyla Isik, Center for Brains, Minds and Machines, Massachusetts Institute of Technology*
  • Tomaso Poggio, Center for Brains, Minds and Machines, Massachusetts Institute of Technology

*These authors contributed equally to this work.
Journal of Vision September 2015, Vol. 15, Issue 12, 558. doi: https://doi.org/10.1167/15.12.558
Abstract

The human brain can rapidly parse a constant stream of visual input. The majority of visual neuroscience studies, however, focus on responses to static, still-frame images. Here we use magnetoencephalography (MEG) decoding and a computational model to study invariant action recognition in videos. We created a well-controlled, naturalistic dataset to study action recognition across different views and actors. We find that, like objects, actions can be read out from MEG data in under 200 ms (after the subject has viewed only 5 frames of video). Actions can also be decoded across actor and viewpoint, showing that this early representation is invariant. Finally, to recognize actions we developed an extension of the HMAX model, which is traditionally used to perform size- and position-invariant object recognition in images; the extension is inspired by Hubel and Wiesel’s findings of simple and complex cells in primary visual cortex, as well as a recent computational theory of invariance in feedforward visual systems. We show that instantiations of this model class can perform recognition of natural videos that is robust to non-affine transformations. Specifically, view-invariant action recognition and action-invariant actor identification can be achieved in the model by pooling across views or actions, in the same manner, and at the same model layer, as pooling across affine transformations (size and position) in traditional HMAX. Together these results provide a temporal map of the first few hundred milliseconds of human action recognition, as well as a mechanistic explanation of the computations underlying invariant visual recognition.
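To make the decoding analysis concrete, the following is a minimal sketch of time-resolved MEG decoding in Python. The data shapes, the number of action classes, and the use of a linear SVM with 5-fold cross-validation are illustrative assumptions, not the authors' exact pipeline; the idea is simply that a classifier trained independently at each time point yields an accuracy time course, and the time at which accuracy rises above chance dates the neural representation.

    # Minimal sketch: hypothetical shapes and labels, not the authors' pipeline.
    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_trials, n_sensors, n_times = 200, 306, 120      # assumed: 306-sensor MEG system
    X = rng.standard_normal((n_trials, n_sensors, n_times))  # placeholder sensor data
    y = rng.integers(0, 5, n_trials)                  # placeholder labels for 5 actions

    accuracy = np.empty(n_times)
    for t in range(n_times):
        # Train and cross-validate a classifier on the sensor pattern at time t alone.
        accuracy[t] = cross_val_score(LinearSVC(dual=False), X[:, :, t], y, cv=5).mean()

    # To test invariance, one would instead train on trials from one view or actor
    # and test on trials from another, rather than cross-validating within condition.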
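Similarly, the pooling mechanism at the heart of the model extension can be sketched in a few lines. The template shapes and the normalized-dot-product tuning below are assumptions for illustration, not the exact model; the point is that a "simple" stage measures similarity to stored templates of an action under several views, and a "complex" stage max-pools over those views, producing a view-invariant signature in the same way HMAX pools over positions and scales.

    # Minimal sketch: template shapes and tuning function are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    n_templates, n_views, dim = 8, 4, 512             # assumed sizes
    # templates[k, v]: feature vector of stored action template k seen from view v.
    templates = rng.standard_normal((n_templates, n_views, dim))
    templates /= np.linalg.norm(templates, axis=-1, keepdims=True)

    def signature(x):
        """Return a view-invariant signature for input feature vector x."""
        x = x / (np.linalg.norm(x) + 1e-12)
        # "Simple cell" stage: normalized dot product of x with every stored
        # (template, view) pair.
        s = np.einsum('kvd,d->kv', templates, x)
        # "Complex cell" stage: pool over views, just as HMAX pools over
        # positions and scales to obtain affine invariance.
        return s.max(axis=1)

    # Swapping the pooled dimension (storing each actor under several actions and
    # pooling over actions) would yield action-invariant actor identification by
    # the same mechanism, as described in the abstract.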

Meeting abstract presented at VSS 2015
