Our results demonstrate a significant and reliable pattern of fMRI activity associated with perceptual grouping: when moving line segments were perceived as a single, translating object, activity increased in the LOC and decreased in V1 compared to when the same line segments were perceived as ungrouped. The LOC activity pattern is expected because this cortical region is known to be shape-selective (e.g., Kourtzi & Kanwisher, 2001). The V1 activity pattern is consistent with our earlier finding (Murray et al., 2002). Taken together, these results suggest that feedback from higher visual areas serves to reduce activity in earlier visual areas during perceptual grouping.
Although our earlier study (Murray et al., 2002) included a condition using a similar bistable “translating diamond,” the current study represents a significant advance in methodology and analysis. Here we used an independently defined, retinotopically specific localizer for V1. Thus, we are confident that the modulations in the fMRI signal that we observed occurred in the retinotopic representation of the stimulus and not in immediately adjacent retinotopic regions (e.g., artifacts due to “blood-flow steal”). In addition, because of limited slice coverage, our previous report using the translating diamond measured activity only in V1. Here, using an independent localizer for the LOC, we show significant changes in the LOC that are the inverse of the pattern of activity observed in V1. Finally, the current study employs event-related averaging, which characterizes the temporal dynamics and magnitudes of the V1 and LOC changes in more detail than our previous report.
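Event-related averaging of this kind amounts to cutting a region-of-interest time series into windows locked to each reported perceptual transition, expressing each window as percent signal change from a pre-transition baseline, and averaging across transitions. The following is a minimal Python sketch under assumed conventions: one sample per volume, onsets given as volume indices, and a baseline taken from the volumes just before each onset. The function name `event_related_average` and its `pre`/`post` parameters are hypothetical; this is not the analysis code used in this study.

```python
from typing import List, Sequence


def event_related_average(
    signal: Sequence[float],
    onsets: Sequence[int],
    pre: int = 2,
    post: int = 8,
) -> List[float]:
    """Hypothetical helper: mean percent-signal-change time course around events.

    signal -- ROI time series, one value per volume (assumed sampling)
    onsets -- volume indices at which a perceptual transition was reported
    pre    -- volumes before onset, used as baseline and included in the window
    post   -- volumes from onset onward included in the window
    """
    if pre < 1:
        raise ValueError("pre must be at least 1 to define a baseline")
    epochs = []
    for t in onsets:
        # Skip events whose window would run past either end of the run.
        if t - pre < 0 or t + post > len(signal):
            continue
        window = signal[t - pre : t + post]
        # Baseline: mean of the pre-onset volumes for this event.
        baseline = sum(window[:pre]) / pre
        # Express every volume as percent change from that baseline.
        epochs.append([100.0 * (v - baseline) / baseline for v in window])
    if not epochs:
        raise ValueError("no events with a complete window")
    # Average across events, volume by volume.
    return [sum(col) / len(epochs) for col in zip(*epochs)]


roi = [100.0] * 30
roi[12] = roi[13] = 102.0  # synthetic 2% response two volumes after onset 10
print(event_related_average(roi, [10, 20]))
# -> [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Averaging the response to the first event with the flat window after the second halves the peak, which is the sense in which the averaged time course estimates both the timing and the magnitude of transition-locked changes.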
In addition to the LOC and V1, we analyzed the fMRI signal in V2, V3, and TOA, none of which showed the kinds of signal changes observed in the LOC and V1. The small modulation observed in TOA likely has a straightforward explanation: this region is considered to be at a relatively high level in the visual hierarchy, and simple geometric shapes (e.g., the diamond) are unlikely to evoke much activity there. Although V2 showed a pattern of activity similar to that of V1, its amplitude was significantly reduced. V3 showed essentially no change in signal in response to perceptual transitions. These observations are important because they point to a potentially unique computational role for V1 in perceptual grouping.
Given the convincing empirical demonstration of inverse activity patterns in V1 and the LOC, the current findings raise important theoretical questions about how to interpret the decreases in the fMRI signal in V1 when the line segments were perceptually grouped. First, what implications does the measurement technique have for the interpretation? As is well known, changes in the fMRI signal reflect multiple hemodynamic processes with multiple underlying physiological causes (Logothetis & Wandell, 2004). Although strong correlations between the fMRI response and neural spike rate have been reported (e.g., Logothetis, Pauls, Augath, Trinath, & Oeltermann, 2001; Rees, Friston, & Koch, 2000), our observation of a reduced fMRI signal in V1 may reflect subthreshold and/or inhibitory processes in addition to a reduction in spiking activity. Other, more direct techniques are required to resolve this question.
Second, do the anti-correlations between the LOC and V1 necessarily mean that the two areas interact directly? More specifically, are the reductions observed in V1 necessarily caused by feedback from the LOC? Although correlation does not imply causation, if the LOC maintains a representation of the grouped elements and if the changes in V1 are due to changes in perception, then there is at least an indirect relationship between the two regions. However, the only conclusive way to answer this question is to selectively remove feedback connections to V1.
Third, and perhaps most difficult: are the reductions in V1 necessary for the perception of the diamond? Although a strong argument could be made that the modulations observed in the LOC, a region well known for shape perception, underlie the change in perception, a similar argument for V1 is harder to make. V1 has traditionally been thought to maintain a veridical representation of retinal information. Consequently, a stimulus with physically constant features, such as the translating diamond, is not generally expected to change V1 activity. We consider several alternative accounts of the potential functional significance of the V1 signal changes.
At one end of the spectrum of possibilities, the changes in V1 might not be functionally significant. For example, fMRI measurements of V1 have shown reliable signal changes associated with spatial attention. Could the changes we observed simply reflect incidental shifts in spatial attention that occur during perceptual transitions? This explanation would require that subjects directed their spatial attention away from the line segments when they perceived the diamond, relative to the non-diamond condition. We have no reason to believe that such shifts occurred. In fact, our subjects reported that they needed to focus their attention on the line segments in order to perceive the diamond. Nevertheless, future studies that explicitly manipulate spatial attention and measure its effect on perceptual grouping and the fMRI signal are warranted.
Along similar lines, one could argue that the differences in V1 and LOC activity simply reflect attention to the features (“diamond” vs. “ungrouped line segments”) that result from the different perceptual states. For example, when subjects perceived ungrouped line segments, they might have attended to this feature of the stimulus, leading to more activity in V1 because V1 is presumably specialized for processing this feature. In contrast, when subjects perceived the diamond, they might have attended to its overall shape, leading to more activity in the LOC because of its specialization for shape processing. On the one hand, attention to features is part of the process. During the perception of the diamond, subjects are certainly “attending to the diamond-ness,” and separating the role of attention, which is directly tied to perceptual awareness, would be very difficult in our experimental setup. On the other hand, empirical evidence makes a simple feature-based attention explanation unlikely. First, we observed notably diminished (V2) and abolished (V3) modulation of the fMRI signal in other early visual areas. There is no a priori reason to believe that these areas are any less specialized than V1 for the features of the “non-diamond.” Second, Buracas, Fine, and Boynton (2005) compared fMRI responses in early visual cortex as subjects switched attention between different features (contrast vs. speed) of a moving grating. They found no modulation of the fMRI signal in any early visual area (V1, V2, V3, and MT) as a function of feature-based attention, even where, in theory, modulation might be expected. For example, early visual cortex is highly sensitive to contrast, but attending to that feature did not modulate the fMRI signal. However, given the differences between the features in the Buracas et al. study (contrast and speed) and those in our study (grouping of line segments), fully addressing the potential contribution of feature-based attention will require direct empirical tests. Such an experiment might alternate attention between local and global elements of simple shapes (such as the diamond) and measure activity in both lower and higher visual areas.
An alternative interpretation is that the decrease in V1 activity has no direct functional significance but instead reveals a general metabolic efficiency constraint on neural processing. Spiking activity is metabolically expensive (Lennie, 2003), and there may be a general strategy to minimize neural activity whenever possible. For example, if one cortical area can represent the visual stimulus, another area need not. In our case, when the line segments form a representation that can be maintained in the LOC, V1 may participate less in the representation simply to minimize overall activity. Although sparseness constraints have been shown to have important theoretical implications for the emergence of receptive field properties within a cortical area (Olshausen & Field, 1996), the implications of extending this principle to interactions between areas are less clear.
Finally, the reductions in V1 activity observed during perceptual grouping may reveal important functional mechanisms of visual information processing. One such mechanism, mentioned in the Introduction, is predictive coding (Mumford, 1992; Rao & Ballard, 1999). Predictive coding models posit that higher areas actively attempt to “explain” activity patterns in lower areas via feedback projections. Because most predictive coding models include a subtractive comparison between the hypotheses formed in higher areas and the incoming sensory input represented in lower areas, the overall effect of feedback may be to reduce activity in lower areas. Specifically, activity in lower visual areas would be reduced whenever the predictions of higher-level areas match the incoming sensory information. In the case of the translating diamond, when the LOC maintains a representation of a grouped shape, this “expectation” or “understanding” of the image features is sent back to V1 and subtracted, resulting in less activity. When the LOC is unable to form such an understanding (i.e., when the line segments are perceived as ungrouped), this feedback does not occur, and there is consequently more activity in V1.
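The subtractive comparison at the heart of these models can be illustrated with a toy, linear version of the Rao and Ballard (1999) scheme. In the Python sketch below, a V1-like input vector `x` is held constant (as the diamond stimulus is), an LOC-like unit feeds back a prediction `U r` through hypothetical weights `basis`, the cause `r` is inferred by gradient descent on the squared prediction error, and the residual `x - U r` stands in for lower-level activity. The vectors, step size, and function names are illustrative assumptions; the sketch omits the priors, nonlinearities, and learning of the full model and is not fitted to our data.

```python
from typing import List, Sequence


def _predict(r: Sequence[float], basis: Sequence[Sequence[float]], n: int) -> List[float]:
    # Top-down prediction U r: each higher-level unit k contributes
    # r[k] times its (hypothetical) feedback weight vector basis[k].
    return [sum(r[k] * basis[k][i] for k in range(len(basis))) for i in range(n)]


def predictive_coding_residual(
    x: Sequence[float],
    basis: Sequence[Sequence[float]],
    steps: int = 200,
    lr: float = 0.1,
) -> List[float]:
    """Infer higher-level causes r, then return the residual x - U r.

    x     -- lower-level (V1-like) input, e.g. responses to four line segments
    basis -- one feedback weight vector per higher-level (LOC-like) unit;
             an empty list means no hypothesis is fed back
    steps -- number of inference steps (assumed value)
    lr    -- step size; must be below 2 / (largest eigenvalue of U^T U)
             for the iteration to converge
    """
    n = len(x)
    r = [0.0] * len(basis)  # higher-level causes start at zero
    for _ in range(steps):
        err = [xi - pi for xi, pi in zip(x, _predict(r, basis, n))]
        # Gradient descent on 0.5 * ||x - U r||^2: r <- r + lr * U^T (x - U r)
        for k in range(len(basis)):
            r[k] += lr * sum(basis[k][i] * err[i] for i in range(n))
    return [xi - pi for xi, pi in zip(x, _predict(r, basis, n))]


def residual_energy(err: Sequence[float]) -> float:
    # Summed squared error, used here as a stand-in for lower-level activity.
    return sum(e * e for e in err)


x = [1.0, 1.0, 1.0, 1.0]          # physically constant stimulus
diamond = [[1.0, 1.0, 1.0, 1.0]]  # hypothetical LOC "grouped shape" hypothesis
print(residual_energy(predictive_coding_residual(x, diamond)))  # near zero
print(residual_energy(predictive_coding_residual(x, [])))       # 4.0
```

With a matching hypothesis the residual energy falls toward zero; with no hypothesis, or one orthogonal to the input, it stays at the full input energy even though `x` never changes. This mirrors only the sign of the proposed effect, namely less V1 activity in the grouped state.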
In summary, although our results are consistent with a number of theoretical interpretations, they demonstrate that perceptual grouping involves activity modulations at multiple stages of the visual hierarchy. The two areas considered in detail here, the LOC and V1, are known to represent global shape and local visual features, respectively. Importantly, the activity patterns in these areas are inversely related, suggesting that perceptual grouping involves both increases and decreases in activity in the human visual system.