Abstract
Previous work has shown that event representations are constructed in a coarse-to-fine fashion (Larson, Hendry, & Loschky, 2012). Specifically, an event's superordinate scene category was recognized first (e.g., Indoor vs. Outdoor), followed by its basic-level scene category (e.g., Kitchen vs. Office), and then its basic-level action category (e.g., Cooking vs. Washing). This suggests that event representations begin with recognizing the scene category. If this is correct, then recognizing the scene's category may facilitate action recognition. Conversely, the surrounding scene may interfere with processing the action, as suggested by Davenport and Potter (2004), who found that object recognition was better against a blank background than within a scene. Lastly, the scene background may have no effect on action recognition, which would suggest that the two processes are functionally isolated. Participants categorized actions presented in one of three image conditions: 1) within their original scene background, 2) on a gray background, or 3) on a texture background generated from the original image (Portilla & Simoncelli, 2000). Scene image processing time was manipulated by varying the target-mask SOA from 24 to 660 ms (the upper value corresponding to roughly two fixations). Participants then made a Yes/No response to a post-cue naming either the valid or an invalid action category. The results showed no difference between the original scene and gray background conditions at SOAs < 50 ms, but both were significantly better than the texture background condition. However, at SOAs > 50 ms, the original scene background condition was significantly better than both the gray and texture background conditions. These results indicate that semantic scene information does not facilitate action recognition at the very earliest stages of processing, but does so once scene gist recognition reaches a threshold (the basic-level scene gist inflection point, generally around 50 ms SOA).
In contrast, unrecognizable visual structure interferes with action categorization for up to roughly 330 ms SOA.
Meeting abstract presented at VSS 2013