Abstract
Object recognition in natural scenes occurs extremely rapidly, suggesting that it requires only feed-forward processing (Serre et al., 2007). However, recurrent processing is involved in many operations that are relevant for object recognition, such as perceptual grouping and figure-ground segmentation (Lamme, 1995; Roelfsema et al., 2000), and disruption of feedback impairs object recognition performance (Koivisto et al., 2011). Interestingly, the effects of feedback disruption are strongest when the object is ‘less easily segregated’ from the background (as assessed by post-hoc ratings; Koivisto et al., 2013). This suggests that the degree of feedback may depend on overall scene complexity. Here, we tested this hypothesis by manipulating scene complexity based on two global image statistics: contrast energy (CE) and spatial coherence (SC). These statistics summarize local edge intensity and higher-order correlations between edges, respectively. Together, they predict the degree of inherent figure-ground segregation in a scene. In particular, low-CE/SC scenes are sparse, containing single objects against simple backgrounds, whereas high-CE/SC scenes are cluttered or textured. Subjects performed an animal vs. non-animal categorization task in the fMRI scanner for scenes with low, medium, or high CE/SC values. Slower reaction times and higher error rates indicated that categorization was especially difficult for high-CE/SC (high-clutter) scenes. Accordingly, in early visual cortex (V1), only high-clutter animal scenes gave rise to increased fMRI activity relative to non-animal scenes. Separate ERP recordings showed that the increased V1 activity was not driven by feed-forward differences, but instead reflected feedback activity from ~200 ms onwards. These results suggest that during object recognition in natural scenes, recurrent activity depends on scene complexity, as determined from an initial, global representation described by CE and SC. For sparse scenes with clear figure-ground segregation, this feed-forward representation may be sufficient, whereas for cluttered scenes, object recognition involves feedback.
Meeting abstract presented at VSS 2015