Article  |   July 2012
Perceptual alternations between unbound moving contours and bound shape motion engage a ventral/dorsal interplay
Author Affiliations
  • Anne Caclin
    Université Pierre et Marie Curie, Centre de Recherche de l'Institut du Cerveau et de la Moelle épinière, UMRS975, Paris
    Inserm, U975, Paris
    CNRS, UMR7225, Paris
    Inserm, U1028, Lyon Neuroscience Research Center, Brain Dynamics and Cognition team, Lyon, France
    CNRS, UMR5292, Lyon Neuroscience Research Center, Brain Dynamics and Cognition team, Lyon, France
    University Claude Bernard Lyon 1, Lyon, France
    http://u821.lyon.inserm.fr/
  • Anne-Lise Paradis
    Université Pierre et Marie Curie, Centre de Recherche de l'Institut du Cerveau et de la Moelle épinière, UMRS975, Paris
    Inserm, U975, Paris
    CNRS, UMR7225, Paris
    http://cogimage.dsi.cnrs.fr/
  • Cédric Lamirel
    Université Pierre et Marie Curie, Centre de Recherche de l'Institut du Cerveau et de la Moelle épinière, UMRS975, Paris
    Inserm, U975, Paris
    CNRS, UMR7225, Paris
    Ophthalmology Department, Fondation Ophtalmologique Adolphe de Rothschild and Hôpital Bichat–Claude Bernard, Paris
    http://www.fo-rothschild.fr/
  • Bertrand Thirion
    NeuroSpin, I2BM, CEA, F-91191, Gif-sur-Yvette Cedex, France
    Parietal team, Institut National de Recherche en Informatique et en Automatique Saclay-Ile-de-France, F-91191, Gif-sur-Yvette, France
    http://parietal.saclay.inria.fr/
  • Eric Artiges
    DRM-CEA-DSV, Service Hospitalier Frédéric Joliot, I2BM, F-91401, Orsay Cedex, France
    Inserm, U797, Neuroimaging and Psychiatry Research Unit, IFR49, Orsay, France
    University Paris-Sud and University Paris Descartes, UMRU797, Paris
    http://www.u1000.idf.inserm.fr/
  • Jean-Baptiste Poline
    NeuroSpin, I2BM, CEA, F-91191, Gif-sur-Yvette Cedex, France
    Parietal team, Institut National de Recherche en Informatique et en Automatique Saclay-Ile-de-France, F-91191, Gif-sur-Yvette, France
    http://parietal.saclay.inria.fr/
  • Jean Lorenceau
    Université Pierre et Marie Curie, Centre de Recherche de l'Institut du Cerveau et de la Moelle épinière, UMRS975, Paris
    Inserm, U975, Paris
    CNRS, UMR7225, Paris
    http://cogimage.dsi.cnrs.fr/
Journal of Vision July 2012, Vol.12, 11. doi:https://doi.org/10.1167/12.7.11

      Anne Caclin, Anne-Lise Paradis, Cédric Lamirel, Bertrand Thirion, Eric Artiges, Jean-Baptiste Poline, Jean Lorenceau; Perceptual alternations between unbound moving contours and bound shape motion engage a ventral/dorsal interplay. Journal of Vision 2012;12(7):11. https://doi.org/10.1167/12.7.11.



      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Visual shape and motion information, processed in distinct brain regions, should be combined to elicit a unitary coherent percept of an object in motion. In an fMRI study, we identified brain regions underlying the perceptual binding of motion and shape independently of the features (contrast, motion, and shape) used to design the moving displays. These displays alternately elicited a bound (moving diamond) or an unbound (disconnected moving segments) percept, and were either physically unchanging yet perceptually bistable or physically changing over time. The joint analysis of the blood-oxygen-level-dependent (BOLD) signals recorded during bound or unbound perception with these different stimuli revealed a network comprising the occipital lobe and ventral and dorsal visual regions. Bound percepts correlated with in-phase BOLD increases within the occipital lobe and a ventral area and decreased activity in a dorsal area, while unbound percepts elicited moderate BOLD modulations in these regions. This network was similarly activated by bistable unchanging displays and by displays periodically changing over time. The uncovered interplay between the two regions is proposed to reflect a generic binding process that dynamically weights the perceptual evidence supporting the different shape and motion interpretations according to the reliability of the neural activity in these regions.

Introduction
In order to cope with their visual environment, living organisms must recover the shape and motion of objects from the retinal image and correctly parse these objects into independent perceptual entities. Shape recovery is thought to begin in the primary visual cortex and to continue along the ventral pathway. 
The lateral occipital complex (LOC), a wide cortical zone including areas more activated by pictures of objects than by their scrambled counterparts (Malach et al., 1995; Grill-Spector et al., 1999; Kourtzi & Kanwisher, 2000a; Murray, Olshausen, & Woods, 2003), is assumed to play a pivotal role in integrating elementary contours into a whole shape. More specifically, a subregion of the LOC located in the posterior fusiform (pFs) has been found to respond reliably to occluded fragmented objects and collinear discontinuous contours (Lerner, Hendler, Ben-Bashat, Harel, & Malach, 2001; Lerner, Hendler, & Malach, 2002; Kourtzi, Tolias, Altmann, Augath, & Logothetis, 2003; Altmann, Bülthoff, & Kourtzi, 2003; but see Hayworth & Biederman, 2006). This suggests that it contributes to contour completion and the assessment of border ownership, and possibly to the correct parsing of the retinal image into distinct shapes in conditions of interposition, where one object partly occludes another (Rosenblatt, 1961). 
Conversely, motion perception is known to involve visual areas distributed along the dorsal pathway. Among these, the hMT+ complex is a good candidate for motion integration (Culham, He, Dukelow, & Verstraten, 2001; Murray et al., 2003; Orban, 2008; Rees, Friston, & Koch, 2000; Huk, Dougherty, & Heeger, 2002), and several studies have found that coherent motion evokes greater activity than incoherent motion in this region (Movshon, Adelson, Gizzi, & Newsome, 1985; Castelo-Branco et al., 2002; Born & Bradley, 2005; Tsui, Hunter, Born, & Pack, 2010; but see McKeefry, Watson, Frackowiak, Fong, & Zeki, 1997; Majaj, Carandini, & Movshon, 2007). 
It was proposed a few decades ago that the ventral and dorsal pathways processed form and motion independently (Goodale & Milner, 1992; Ungerleider & Mishkin, 1982). This sharp dichotomy has been revisited since then, and numerous fMRI studies now suggest it is an oversimplified view (Braddick, O'Brien, Wattam-Bell, Atkinson, & Turner, 2000; Könen & Kastner, 2008; Milner & Goodale, 2008; Denys et al., 2004; Zhuo et al., 2003). In addition, a number of visual phenomena call for neural interactions between both pathways. 
Motion can reduce ambiguities related to form, as with “structure-from-motion” stimuli (Wallach & O'Connell, 1953; Johansson, 1973; Paradis et al., 2000) or with kinetic contours defined by motion contrast (Marcar, Xiao, Raiguel, Maes, & Orban, 1995). Conversely, shape can control motion integration (Lorenceau, 1996; Lorenceau & Alais, 2001), and prior knowledge of body posture can constrain the perception of apparent motion (Shiffrar & Freyd, 1990). At the extreme, static images can provide “implied” motion information, as with the action lines found in comic books (Cutting, 2002) or with single static snapshots of a moving scene (Krekelberg, Vatakis, & Kourtzi, 2005; Kourtzi & Kanwisher, 2000b). The finding that static photographs with such implied motion elicit BOLD activity in hMT+ not only indicates communication between the ventral and dorsal pathways, but also suggests that both pathways can interact via inference processes (Kourtzi & Kanwisher, 2000b). 
Such inference processes have been formalized as Bayesian inference, predictive coding, or inverse hierarchical dynamical models (Mumford, 1992; Rao & Ballard, 1999; Kersten & Yuille, 2003; Friston, 2005; Hochstein & Ahissar, 2002; Jehee & Ballard, 2009; Friston & Kiebel, 2009). The idea is that high-level areas within the cortical hierarchy generate a perceptual prediction from partial sensory evidence; this prediction is sent through feedback connections to lower-order areas, which in turn signal the error between this prediction and their incoming inputs back to higher areas. Accordingly, if a high-level area generates an accurate prediction, the error signal decreases, inducing reduced activity in the corresponding lower-level area and enhanced activity in the predicting area (e.g., Friston & Kiebel, 2009). 
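The prediction/error loop described above can be illustrated with a toy linear model in the spirit of Rao and Ballard (1999). This is a minimal sketch, not the model used by any of the cited studies: the weights, learning rate, and function name are hypothetical, and the error norm stands in only conceptually for lower-level activity. As the higher level refines its estimate, the error signal it receives shrinks.

```python
import math


def predictive_coding(x, U, steps=200, lr=0.1):
    """Toy two-level predictive coding loop (hypothetical illustration).

    x: input to the lower level (list of floats).
    U: top-down generative weights; U[i][j] maps higher-level cause j to input i.
    Returns the inferred causes r and the prediction-error norm at each step.
    """
    n_in, n_causes = len(U), len(U[0])
    r = [0.0] * n_causes  # higher-level estimate of the causes
    history = []
    for _ in range(steps):
        # Top-down prediction sent to the lower level through feedback.
        pred = [sum(U[i][j] * r[j] for j in range(n_causes)) for i in range(n_in)]
        # The lower level signals the residual error upward.
        err = [x[i] - pred[i] for i in range(n_in)]
        history.append(math.sqrt(sum(e * e for e in err)))
        # The higher level adjusts its estimate to reduce the error
        # (gradient descent on the squared prediction error).
        for j in range(n_causes):
            r[j] += lr * sum(U[i][j] * err[i] for i in range(n_in))
    return r, history


U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, 2.0, 3.0]  # exactly explained by the causes r = [1, 2]
r, history = predictive_coding(x, U)
print(r, history[0], history[-1])
```

With a learning rate small enough for gradient descent to be stable, the error norm never increases, and the estimate converges to the causes that fully explain the input.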
Recently, Murray, Kersten, Olshausen, Schrater, and Woods (2002) and Fang, Kersten, and Murray (2008) observed opposite modulations of BOLD activity in the LOC and V1 during the perceptual binding of moving elements and proposed a predictive coding model of vision to interpret this pattern: Shape inferences from the LOC reduce the activity in primary visual cortex (V1). 
Although appealing, this interpretation raises several questions. First, the same predictive coding strategy should apply simultaneously to all hierarchically organized areas, such that one area, V1 for instance, should receive predictive signals from higher-order areas of both the ventral and the dorsal pathways, but also send predictive signals to lower-order structures such as the thalamus (Wang, Jones, Andolina, Salt, & Sillito, 2006; Sillito, Cudeiro, & Jones, 2006; Jehee & Ballard, 2009). Although the error and predictive signals may be computed in different cortical layers (Mumford, 1992; Jehee & Ballard, 2009; Friston & Kiebel, 2009), whether the BOLD activity measured with fMRI can reliably disentangle these signals remains an open question. Second, in the studies discussed above, the predictive coding model does not take into account possible activity in the dorsal motion pathway, although such activity would be expected given the changes in perceived motion coherence. Third, would the results and their interpretation be the same with other types of stimuli? The intriguing data of Murray et al. (2002) and Fang et al. (2008) may be specific to bistable displays rather than to motion binding per se. Indeed, predictive coding is particularly useful whenever sensory inputs are ambiguous, as is the case with a physically unchanging bistable stimulus. However, bistable stimuli may also engage attention and decision processes in a specific way, or involve neural adaptation unrelated to the visual binding processes under investigation. Furthermore, BOLD activity may depend on the physical characteristics of the bistable display, even when different displays induce a similar motion percept (Sterzer, Eger, & Kleinschmidt, 2003). 
In the present whole-brain fMRI study, we sought to uncover the interplay of visual areas during the binding of contour motions into a single moving shape, while addressing the issues raised above. To this end, we used displays perceived either as a single diamond shape moving as a whole along a circular trajectory or as four bars translating up and down independently (see Movies 1 to 3). To compare the respective contributions of internally and externally driven binding processes to the BOLD response, we used both bistable unchanging displays, in which perceptual alternations were endogenously triggered, and displays in which perceptual alternations were exogenously induced by manipulating one visual dimension. To disentangle the BOLD activation related to specific stimulus features from that related to a more general binding process, we manipulated three different stimulus dimensions: contrast, shape, or motion. We therefore used six different displays in total, whose responses could be compared to identify the brain regions selectively recruited by particular features, or jointly analyzed to identify the cortical network involved in a generic binding process. 
 
Movie 1
 
Movies 1, 2, 3 are examples of the displays used in the study. Caution: perceptual alternations and stimulus speed may depend on screen settings. Movie 1 shows the “Contrast” display as an example of spontaneous transitions where unchanging stimuli are alternately seen as bound or unbound. Note that the first episode lasts longer than subsequent perceptual episodes.
 
Movie 2
 
Movies 2 and 3 are examples of evoked alternations. Movie 2 shows the “Motion” display with smoothly varying motion jitter driving perceptual alternations between bound and unbound percepts.
 
Movie 3
 
Movie 3 shows the “Shape” display where smoothly changing the orientation of the segments from a diamond to a chevron entails perceptual alternations between bound and unbound percepts.
Methods
Participants
Thirteen volunteers (aged 20 to 35, one left-handed, nine women) participated in this study after giving their informed written consent and receiving financial compensation for their participation. The study was approved by the local ethics committee (CCPPRB Le Kremlin-Bicêtre, France). 
Stimuli
All displays comprised a small central fixation disk and four disconnected contours arranged in a diamond-like shape (Figure 1a) subtending a visual angle of about 5° at a viewing distance of 140 cm (screen resolution 1024 × 768 pixels, screen size 41 × 32 cm). The segments oscillated at 1 Hz (sampling rate 60 Hz) along a vertical axis, and the motions of two adjacent segments were out of phase, so that the overall motion was compatible with a diamond rotating along a circular trajectory (Movies 1, 2, and 3). Such stimuli can hence be seen either as line segments translating independently along a vertical axis (unbound percept, Figure 1a, left) or as a single rigid diamond moving as a whole along a circular trajectory (bound percept, Figure 1a, right). 
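Why out-of-phase vertical oscillations are compatible with a circular trajectory can be checked geometrically. The following is a minimal sketch, assuming the diamond sides are oriented at ±45° and that each visible segment is constrained to slide vertically; the radius, helper names, and this geometric model are our assumptions, not the authors' stimulus code (Jeda). Under these assumptions, adjacent sides oscillate a quarter cycle (90°) apart, each with an amplitude of √2 times the radius. The second helper converts the stated 5° extent into roughly 305 pixels for the stated screen geometry.

```python
import math


def segment_offsets(t, radius=1.0, freq_hz=1.0):
    """Vertical offsets of two adjacent diamond sides (oriented at +45 and -45 deg)
    when a rigid diamond translates along a circle of the given radius.

    A line of orientation theta translated by (dx, dy) coincides with the original
    line shifted vertically by dy - dx * tan(theta).
    """
    phi = 2 * math.pi * freq_hz * t
    dx, dy = radius * math.cos(phi), radius * math.sin(phi)
    plus45 = dy - dx * math.tan(math.radians(45))
    minus45 = dy - dx * math.tan(math.radians(-45))
    return plus45, minus45


def deg_to_pixels(deg, distance_cm=140.0, screen_cm=41.0, screen_px=1024):
    """Size on screen (pixels) of an object subtending `deg` degrees of visual angle."""
    size_cm = 2 * distance_cm * math.tan(math.radians(deg) / 2)
    return size_cm * screen_px / screen_cm


# One 1-Hz cycle sampled at 60 Hz, as in the displays.
trajectory = [segment_offsets(k / 60) for k in range(60)]
print(round(deg_to_pixels(5.0)))
```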
Figure 1
 
Experimental conditions. (a) Bistable perception of motion displays (see Movie 1). Displays were alternately seen as unbound line segments translating up and down independently (left) or as bound line segments making up a rigid diamond translating along a circular path (right). Bistability was either generated endogenously by the brain (see e) or induced by physically modulating one of three visual parameters (as depicted in b, c, and d). (b) “Contrast” display. Global shape motion is seen when segments have low-contrast line ends and a high-contrast center (right), whereas reversing the contrast distribution entails the perception of independently moving segments (left). (c) “Motion” display (see Movie 2). The diamond shape is defined by alignments of high-luminance dots. Adding a large motion jitter along the segment orientation to each dot (right) yields the perception of a rotating diamond shape, which breaks into independent contour translations with a small jitter or fixed intervals between neighboring dots (left).
 
(d) “Shape” display (see Movie 3). Contours defining a closed convex shape, such as a diamond, are easily bound into a single rotating object, while contours defining an open concave shape, such as a chevron, are not. To ease perceptual alternations with this stimulus, static masks covering the vertices at all times during the movement were added. (e) Two induction modes. In half of the displays, perceptual transitions between the two percepts were exogenously evoked by slowly varying one of the parameters described in b, c, and d between two extreme values, as depicted by the red/green ellipse. In the other three displays, the value of the critical parameter (contrast, motion, or shape) was fixed and chosen on the basis of behavioral experiments to elicit spontaneous perceptual alternations between bound and unbound percepts, as depicted by the red/green bottom square.
Overall, six different displays were generated using custom-made software (Jeda). In half of these displays, perceptual switches were evoked by slow periodic modifications of one of three independent stimulus parameters: the distribution of contrast along each line segment (“Contrast” display, Figure 1b); the motion jitter of the dots defining the sides of the diamond (“Motion” display, Figure 1c); or the shape defined by the line segments (“Shape” display, Figure 1d). In the Contrast display, when the segments have low-contrast line ends and a high-contrast center, spatiotemporal integration is favored and the bound percept dominates (Lorenceau & Shiffrar, 1992). In the Motion display, each dot inside the segments followed a sinusoidal high-frequency motion (8.6 Hz) along the segment orientation. The amplitude of this local motion jitter was identical for all dots (ranging from zero to four pixels), but adjacent dots moved out of phase. For small jitter amplitudes, each dot follows an unambiguous vertical motion path that favors the unbound percept. The bound percept dominates with large jitter amplitudes, which tend to blur the local motion direction (Lorenceau, 1996). In the Shape display, the arrangement of the segments alternated between a diamond-like and a chevron-like shape through slow changes in the orientation of the lower segments. While the diamond is seen as a single rotating shape, the chevron, which does not form a closed figure, is mostly perceived as unbound (Lorenceau & Alais, 2001). Note that line ends were occluded by static masks of a lighter grey than the background in the Shape display but not in the Contrast and Motion displays (Figure 1b, 1c, and 1d). In all cases, the attribute of interest varied smoothly and continuously with a 20-s cycle, leading to a theoretical average duration of 10 s for each perceptual episode. 
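The timing of the evoked Motion display can be sketched as follows. Only the 8.6-Hz dot frequency, the 0 to 4 pixel amplitude range, the out-of-phase relation between neighboring dots, and the 20-s cycle come from the text; the raised-cosine modulation profile, the antiphase (π) offset, and the function names are assumptions for illustration. Over a 200-s presentation this gives 10 cycles, hence 20 nominal perceptual episodes of 10 s.

```python
import math

JITTER_HZ = 8.6     # local dot oscillation frequency (from the text)
CYCLE_S = 20.0      # period of the slow modulation in the evoked display (from the text)
MAX_AMP_PX = 4.0    # largest jitter amplitude (from the text)


def jitter_amplitude(t):
    """Jitter amplitude in pixels at time t (s): 0 -> 4 -> 0 px over one 20-s cycle.
    The raised-cosine profile is an assumption."""
    return MAX_AMP_PX * 0.5 * (1 - math.cos(2 * math.pi * t / CYCLE_S))


def dot_offsets(t, n_dots):
    """Displacement of each dot along the segment orientation at time t.
    Adjacent dots are assumed to move in antiphase (phase shift of pi)."""
    amp = jitter_amplitude(t)
    return [amp * math.sin(2 * math.pi * JITTER_HZ * t + math.pi * (k % 2))
            for k in range(n_dots)]


# Frames at the 60-Hz display rate over one 200-s presentation (10 cycles).
frames = [dot_offsets(f / 60, n_dots=12) for f in range(200 * 60)]
```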
These three displays were designed on the basis of previous psychophysical studies (Lorenceau & Shiffrar, 1992; Lorenceau, 1996; Lorenceau & Alais, 2001), and extensive preliminary testing was conducted to ensure that the average durations of the bound and unbound percepts were similar for all displays. 
In the other three displays, the two perceptual states could alternate spontaneously in the absence of any physical stimulus change (bistable displays), reflecting endogenous changes in brain state (Figure 1e). To yield well-balanced perceptual alternations, the settings for the bistable displays were chosen as the average of the transition points between the two percepts (contrast distribution, motion jitter amplitude, and segment orientation for the Contrast, Motion, and Shape displays, respectively) measured in preliminary experiments. 
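A minimal sketch of how a bistable setting can be derived from evoked-alternation data, assuming the parameter value is read at the sample where the reported percept changes and that transitions in both directions are pooled. The data layout and function names are hypothetical; the text does not specify whether averaging was done per subject or across the group. Pooling both directions places the setting between the up-going and down-going switch points when perception shows hysteresis.

```python
def transition_points(param, percept):
    """Parameter values at which the reported percept changed.

    param:   stimulus parameter at each sample (e.g. jitter amplitude in pixels)
    percept: reported state at the same samples, 'B' (bound) or 'UB' (unbound)
    """
    return [param[i] for i in range(1, len(percept)) if percept[i] != percept[i - 1]]


def bistable_setting(param, percept):
    """Average transition point, used as the fixed value of a bistable display."""
    points = transition_points(param, percept)
    if not points:
        raise ValueError("no perceptual transitions in this recording")
    return sum(points) / len(points)


# Toy recording: the parameter ramps up and back down; switches occur
# at 3 on the way up and at 2 on the way down (hysteresis).
param = [0, 1, 2, 3, 4, 3, 2, 1, 0]
percept = ["UB", "UB", "UB", "B", "B", "B", "UB", "UB", "UB"]
print(bistable_setting(param, percept))  # 2.5
```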
Experimental design and procedure
Participants lay supine in the scanner and viewed the stimuli, rear-projected on a translucent screen at the back of the scanner, through a mirror attached to the head coil. Each of the six experimental runs consisted of an initial 23.5-s fixation-only period, followed by one of the six displays presented for 200 s, and a final 16-s fixation-only period. Subjects were instructed to fixate the central disk throughout the run and to report their percept, bound or unbound, by continuously pressing one of two response buttons, one held in each hand. Button presses were sampled at the video refresh rate (60 Hz). The assignment of buttons to bound and unbound perceptual states was counterbalanced across subjects. Half of the subjects were presented with clockwise rotating shapes and the other half with counter-clockwise rotating shapes. The order of the six displays was counterbalanced across subjects using a Latin square. 
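The text states only that display order was balanced with a Latin square. One common choice for an even number of conditions is the Williams design, which additionally balances first-order carry-over between consecutive runs; the sketch below assumes that design, and the display labels and helper names are illustrative. With six displays and more than six subjects, rows are reused cyclically.

```python
def williams_square(n):
    """Balanced (Williams) Latin square for an even number n of conditions.

    Each condition appears once per row and once per column, and each ordered
    pair of consecutive conditions appears exactly once across rows.
    """
    if n % 2:
        raise ValueError("this construction needs an even n")
    first, lo, hi, take_lo = [0], 1, n - 1, True
    while len(first) < n:
        if take_lo:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
        take_lo = not take_lo
    return [[(c + r) % n for c in first] for r in range(n)]


DISPLAYS = [f"{stim}/{mode}"
            for stim in ("Contrast", "Motion", "Shape")
            for mode in ("spontaneous", "evoked")]


def display_order(subject_index):
    """Run order for a given subject; rows of the square are reused cyclically."""
    row = williams_square(len(DISPLAYS))[subject_index % len(DISPLAYS)]
    return [DISPLAYS[i] for i in row]


print(display_order(0))
```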
After the six experimental runs, subjects were presented with two localizer sequences, intended to delineate object-related and motion-sensitive visual areas (see below), and three control blocks, aimed at identifying landmarks for the main analysis, with stimuli similar to the Contrast, Motion, and Shape displays, respectively. Each control block comprised six 4-s periods of fixation, six 10-s periods of a fully visible static diamond, and six 10-s periods of its rotating version, in which global shape rotation was always and unambiguously perceived, presented in a pseudorandom order. Subjects were instructed to press one response button whenever they saw the moving diamond and the other button for the static diamond, so that observers behaved as in the main experimental blocks. 
To localize object-related areas, we used blocks of grey-level images or line drawings of familiar or novel objects, and blocks of scrambled versions of these pictures, as described in detail in Kourtzi and Kanwisher (2000a; stimuli courtesy of the authors). The localizer sequence for motion-sensitive areas consisted of 15-s periods of static random dot patterns (RDPs) alternating with 10-s periods of translating RDPs. The direction of translation changed randomly every 500 ms so as to cover eight directions separated by 45° steps during each motion period. A fixation disk was provided to minimize eye movements. Five periods each of static and translating RDPs were presented. 
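A sketch of the motion-localizer timing, assuming the run starts with a static period and that "cover eight directions" means every direction is drawn once within each successive block of eight 500-ms epochs. The helper names and this sampling scheme are hypothetical. Five 25-s cycles give 125 s of stimulation.

```python
import random

DIRECTIONS_DEG = [45 * k for k in range(8)]  # eight directions, 45 deg apart
EPOCH_S = 0.5                                # direction change every 500 ms
STATIC_S, MOTION_S, N_CYCLES = 15.0, 10.0, 5


def direction_sequence(rng):
    """Directions (deg) for one 10-s motion period, i.e. 20 epochs of 500 ms.

    Assumption: random permutations of the eight directions are concatenated,
    so every direction appears once in each successive block of eight epochs.
    """
    n_epochs = round(MOTION_S / EPOCH_S)
    seq = []
    while len(seq) < n_epochs:
        block = DIRECTIONS_DEG[:]
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n_epochs]


def localizer_schedule(seed=0):
    """List of (onset_s, condition, directions) for the motion localizer.
    Starting with a static period is an assumption."""
    rng = random.Random(seed)
    schedule, t = [], 0.0
    for _ in range(N_CYCLES):
        schedule.append((t, "static", None))
        t += STATIC_S
        schedule.append((t, "moving", direction_sequence(rng)))
        t += MOTION_S
    return schedule
```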
The entire experiment lasted about one and a half hours. Four subjects also underwent retinotopic mapping in a separate acquisition session (see Wotawa, Thirion, Castet, Anton, & Faugeras, 2005, for details about the procedure). 
Image acquisition
Echo-planar images were acquired with a 1.5-T whole-body Signa MRI scanner (General Electric, Milwaukee, WI) equipped with an eight-channel head coil, using a gradient-echo sequence sensitive to BOLD contrast (TR = 2.5 s, TE = 60 ms, flip angle = 30°). For each volume, 24 adjacent 4-mm-thick slices covering the whole brain were acquired parallel to the AC-PC line (field of view = 240 × 240 mm, 64 × 64 matrix, providing an in-plane spatial resolution of 3.75 × 3.75 mm). 
For each subject, a high-resolution T1-weighted anatomical image was acquired prior to the functional volumes using a three-dimensional inversion-recovery-prepared fast spoiled gradient echo (FSPGR) sequence (192 contiguous axial slices, 1.2-mm thickness, TR = 9.9 ms, TE = 2 ms, TI = 600 ms, flip angle = 10°, field of view = 240 × 240 mm, 256 × 256 matrix, voxel size 0.9375 × 0.9375 × 1.2 mm). 
fMRI analyses
Preprocessing
The first three images of each run, acquired while the magnetization reached a steady state, were discarded. Preprocessing consisted of the following steps: slice-timing correction, motion correction, normalization to the Montreal Neurological Institute (MNI) template provided with SPM5 (the EPI scans were normalized using the parameters computed for the subject's anatomical MRI), and spatial smoothing with a Gaussian kernel (FWHM = 6 mm). 
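The 6-mm FWHM relates to the standard deviation of the Gaussian kernel through σ = FWHM / (2√(2 ln 2)), giving σ ≈ 2.55 mm. The 2-mm voxel size used below for the conversion to voxels is an assumption inferred from the cluster sizes reported in the statistical analysis (358 voxels = 2864 mm³, i.e., 8 mm³ per voxel); SPM performs this conversion internally.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian kernel with the given full width at half maximum."""
    return fwhm / (2 * math.sqrt(2 * math.log(2)))


sigma_mm = fwhm_to_sigma(6.0)     # about 2.55 mm
sigma_vox = sigma_mm / 2.0        # in voxels, assuming 2-mm normalized voxels
print(f"{sigma_mm:.3f} mm, {sigma_vox:.3f} voxels")
```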
Statistical analysis
All contrasts were computed on an individual basis using a General Linear Model (GLM). For the localizer runs, the periods of stimulation were modeled using boxcar regressors. For the main experiment and the control runs, the hemodynamic response was decomposed into transient and sustained components over the period of each percept, modeled respectively by an event and a boxcar function, convolved with the canonical hemodynamic response function (HRF) provided with SPM5 (http://www.fil.ion.ucl.ac.uk/spm/). For the control runs, regressors were constructed from the onsets of the static and moving diamonds; in the main experiment, we used the times at which subjects reported a switch toward a bound (B) or unbound (UB) percept. To account for the delay between the actual perceptual switch and the motor response, 500 ms were subtracted from the times of button presses. High-pass filtering (cutoff = 128 s) removed low-frequency confounds. Global changes were removed by scaling with the grand mean of the volumes. 
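The transient and sustained regressors described above can be sketched in plain Python. This is an approximation for illustration, not SPM5's implementation: the double-gamma kernel uses SPM's default shape parameters but no microtime handling, the 0.1-s grid is an assumption, and the function names are hypothetical. Each percept is treated as lasting until the next reported switch (or the end of the stimulus), and the 500-ms motor delay is subtracted from every button press as in the text. High-pass filtering and global scaling are omitted.

```python
import math

DT = 0.1  # s, resolution of the upsampled time grid (assumption)


def canonical_hrf(dt=DT, length_s=32.0):
    """Double-gamma HRF (gamma shapes 6 and 16, undershoot ratio 1/6, 1-s scale),
    mimicking SPM's default canonical HRF; normalised to unit sum."""
    def gamma_pdf(t, shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape) if t > 0 else 0.0

    h = [gamma_pdf(k * dt, 6) - gamma_pdf(k * dt, 16) / 6
         for k in range(int(round(length_s / dt)))]
    total = sum(h)
    return [v / total for v in h]


def convolve(signal, kernel):
    """Causal convolution truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for i, x in enumerate(signal):
        if x:
            for k, h in enumerate(kernel[: len(signal) - i]):
                out[i + k] += x * h
    return out


def percept_regressors(press_times, labels, stim_end_s, run_s, tr, motor_delay_s=0.5):
    """Transient (event) and sustained (boxcar) regressors per percept type.

    press_times: times (s) of the button presses signalling a switch
    labels:      percept reached at each switch, e.g. 'B' or 'UB'
    Returns {label: (transient, sustained)}, both sampled once per TR.
    """
    n = int(round(run_s / DT))
    hrf = canonical_hrf()
    onsets = [t - motor_delay_s for t in press_times]  # undo the motor delay
    offsets = onsets[1:] + [stim_end_s]                # percept lasts until next switch
    step = int(round(tr / DT))
    regressors = {}
    for label in sorted(set(labels)):
        stick, box = [0.0] * n, [0.0] * n
        for on, off, lab in zip(onsets, offsets, labels):
            if lab == label:
                i0 = int(round(on / DT))
                stick[i0] = 1.0
                for i in range(i0, min(n, int(round(off / DT)))):
                    box[i] = 1.0
        regressors[label] = (convolve(stick, hrf)[::step], convolve(box, hrf)[::step])
    return regressors


regs = percept_regressors([10.5, 30.5], ["B", "UB"], stim_end_s=50.0, run_s=60.0, tr=2.5)
```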
The main analysis compared the bound and unbound percepts, using data from the six different displays, for both the transient and sustained activities. Individual contrast images were entered into a second-level random-effects analysis, comparing the values at each voxel to zero with a one-sample t-test. A whole-brain analysis was run with conservative statistical thresholds (voxel level, p < 0.01 uncorrected; cluster level, p < 0.05 FWE-corrected for the whole search volume, which corresponds to a cluster of 358 voxels, i.e., 2864 mm³). To recover activations in low-order visual areas, which were expected to be too small to survive the stringent thresholding of the whole-brain analysis, we ran an analysis of occipital and posterior temporal lobe activity with a more liberal threshold (voxel level, p < 0.01 uncorrected; cluster level, at least 40 voxels, i.e., 320 mm³). For this second analysis, a mask of the occipital lobe and posterior temporal regions (y < −35 mm) was created using the MARINA toolbox (Walter et al., 2003). 
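The second-level test and the cluster-size conversions can be sketched as follows. The voxel volume follows from the figures in the text (2864 mm³ / 358 voxels = 8 mm³); the t statistic is the standard one-sample form with n − 1 degrees of freedom. Converting t to a p-value, and the FWE cluster correction, require the t distribution and random field theory as implemented in SPM, which this stdlib-only sketch does not reproduce. The function names are hypothetical.

```python
import math
import statistics

VOXEL_MM3 = 2864 / 358  # 8 mm^3 per voxel, i.e. 2-mm isotropic voxels (from the text)


def one_sample_t(values):
    """t statistic and degrees of freedom for H0: mean == 0, i.e. a
    random-effects test on one contrast value per subject at a given voxel."""
    n = len(values)
    t = statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))
    return t, n - 1


def cluster_volume_mm3(n_voxels):
    """Volume of a cluster of n_voxels on the normalized grid."""
    return n_voxels * VOXEL_MM3


print(cluster_volume_mm3(358), cluster_volume_mm3(40))  # 2864.0 320.0
```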
ROI-based analysis
ROI definition
The MarsBaR toolbox was used to construct ROIs and extract their average time course in each run. A first set of ROIs (see Table 1) was defined from the localizer and control runs. For the object localizer, two spherical ROIs (diameter = 1 cm) were drawn around the group activation maxima in the pFs (posterior fusiform) and LO (lateral occipital) in each hemisphere. Similarly, spherical ROIs were drawn around the hMT+ maxima of the motion localizer in each hemisphere (the center coordinates of the spherical ROIs are detailed in Table 1). From the control runs, we also defined a region in the calcarine sulcus corresponding to the retinotopic areas activated by the static and moving stimuli relative to the fixation baseline. Retinotopic mapping in four of the participants confirmed that this activity lay in V1. 
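A 1-cm-diameter spherical ROI on the normalized grid can be sketched as below, assuming 2-mm voxels (inferred from the cluster sizes in the text) and a center that falls exactly on a voxel. MarsBaR's own rule for including boundary voxels may differ, and the helper is hypothetical. With this rule the sphere contains 81 voxels (648 mm³), somewhat more than the roughly 524 mm³ of an ideal 5-mm-radius sphere, which shows how coarse a 1-cm sphere is at this resolution.

```python
def sphere_voxels(center, radius_mm=5.0, voxel_mm=2.0):
    """Voxel indices whose centers lie within radius_mm of the center voxel."""
    r = int(radius_mm // voxel_mm)
    cx, cy, cz = center
    return [(cx + i, cy + j, cz + k)
            for i in range(-r, r + 1)
            for j in range(-r, r + 1)
            for k in range(-r, r + 1)
            if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2]


roi = sphere_voxels((0, 0, 0))
print(len(roi), len(roi) * 2.0 ** 3)  # 81 648.0
```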
Table 1
 
Coordinates of the voxels of maximal activation in the localizer runs. For the ventral and dorsal areas, the time courses of activation during the two percepts were studied in 1-cm-diameter ROIs centered on these voxels (Figure 3A).
Processing stream | Contrast | ROI label | MNI left hemisphere (x, y, z) | MNI right hemisphere (x, y, z)
V1 | Control run (Full diamond > fixation) | V1 | −6, −102, 0 | 4, −96, −6
Ventral areas | Shape areas localizer (Intact > Scrambled) | pFs | −44, −62, −18 | 40, −60, −18
Ventral areas | Shape areas localizer (Intact > Scrambled) | LO | −36, −82, −6 | 42, −76, −4
Dorsal areas | Motion areas localizer (Motion > Static) | hMT | −42, −70, 6 | 42, −62, 6
A second set of ROIs was defined from the results of the group analysis for the main contrasts of interest. When similar regions were highlighted by the transient and sustained components of the BOLD response, only the larger one was kept as an ROI (see Table 2, clusters in italic). This second set thus includes regions in the bilateral fusiform gyrus (Bound > Unbound), bilateral middle temporal area (Unbound > Bound), and bilateral calcarine sulcus (Bound > Unbound), as delineated by the transient-activity contrasts, and bilateral areas in the posterior occipital lobe (Bound > Unbound), as delineated by the sustained-activity contrast. 
Table 2
 
Clusters exhibiting significant differential responses between the bound and unbound percepts. Clusters are segregated depending on the part of the response (transient or sustained) that shows a significant difference (voxel level: p < 0.01 uncorrected; cluster level: p < 0.05, FWE-corrected for the whole brain analysis, or at least 40 voxels for activities in early visual areas, see Methods). The clusters that were selected for further ROI-based analysis are highlighted in italic. For each cluster, we report the MNI coordinates and statistics of the maximally activated voxel. When a cluster encompasses several anatomical regions, additional local maxima of activity are detailed. The extent of the clusters and the associated p-value are indicated for the clusters surviving the conservative thresholding of p < 0.05, FWE-corrected (whole brain analysis). For the other clusters, corresponding to an analysis restricted to occipital and posterior temporal regions performed to uncover activation in low-order visual areas, only the extent of the clusters is indicated. L/R = Left/Right; POS = Parieto-Occipital Sulcus.
Cluster label | MNI (x, y, z) | t(11) | p | Volume (voxels) | FWE-corr. p
Bound > Unbound, transient activity
L fusiform and calcarine | −30, −68, −22 | 8.92 | <0.001 | 1105 | <0.001
   local maximum (calcarine) | −18, −90, 0 | 5.30 | <0.001 | |
R fusiform | 48, −66, −16 | 4.24 | 0.001 | 291 |
R calcarine | 8, −90, −4 | 5.57 | <0.001 | 250 |
Bound > Unbound, sustained activity
L fusiform | −30, −60, −22 | 6.00 | <0.001 | 412 | 0.029
R fusiform and posterior occipital (lower) | 38, −72, −12 | 5.59 | <0.001 | 507 | 0.009
   local maximum (posterior occipital, lower) | 34, −88, −2 | 4.64 | <0.001 | |
R posterior occipital (upper) | 26, −96, 20 | 6.10 | <0.001 | 45 |
L posterior occipital (lower and upper) | −26, −100, −2 | 7.76 | <0.001 | 192 |
   local maximum (upper) | −26, −92, 20 | 5.30 | <0.001 | |
Unbound > Bound, transient activity
L Middle Temporal | −54, −70, 12 | 9.64 | <0.001 | 626 | 0.002
R Middle Temporal | 52, −74, 10 | 7.42 | <0.001 | 598 | 0.002
R Precuneus | 12, −52, 8 | 3.87 | 0.001 | 88 |
R Cuneus | 18, −82, 40 | 3.86 | 0.001 | 74 |
R POS | 18, −56, 18 | 3.64 | 0.002 | 73 |
Unbound > Bound, sustained activity
R Middle Temporal | 48, −68, 4 | 10.36 | <0.001 | 772 | <0.001
L Middle Temporal | −46, −68, 8 | 4.07 | 0.001 | 136 |
R POS | 4, −66, 14 | 3.70 | 0.002 | 105 |
For both sets of ROIs, homologous regions in the left and right hemispheres were grouped together into a single ROI. For the V1 ROI derived from the control blocks, we adapted the statistical threshold so that this ROI had the same size as the calcarine ROI defined from the main contrasts (approx. 3.5 cm³); the resulting voxel-level threshold was p < 0.03. In each ROI, we estimated the BOLD response to the perceptual transitions using the same models as those used at the voxel level.
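The size-matching step can be illustrated by choosing the voxel-level p threshold that retains a target number of voxels. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' procedure: the 3 × 3 × 3 mm voxel size is hypothetical (the text gives only the 3.5 cm³ target), the function name is illustrative, and the selection uses voxel p-values alone, ignoring spatial contiguity.

```python
import numpy as np


def threshold_for_target_volume(p_map, target_voxels):
    """Return the p threshold that keeps the target_voxels smallest p-values.

    Voxels with p <= threshold form the ROI; ties at the threshold, if any,
    would all be kept.
    """
    p = np.sort(np.asarray(p_map, dtype=float).ravel())
    if not 1 <= target_voxels <= p.size:
        raise ValueError("target_voxels must lie between 1 and the number of voxels")
    return p[target_voxels - 1]


# Hypothetical voxel size: 3 x 3 x 3 mm = 27 mm^3 per voxel.
VOXEL_MM3 = 3 * 3 * 3
# 3.5 cm^3 = 3500 mm^3, i.e. about 130 voxels of 27 mm^3.
TARGET_VOXELS = round(3.5 * 1000 / VOXEL_MM3)
```

Applied to the p-map of the control contrast, the returned value would play the role of the p < 0.03 threshold quoted above; the actual value depends on the data and on the real voxel size.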
Perception-related effect in the localizer-defined ROIs
The contrast between the bound and unbound percepts was tested (paired t-test) in the first set of ROIs, defined from the localizer and control runs. Because the second set of ROIs was defined from the result of this very contrast in the whole-brain analysis, testing the perception effect again in these ROIs would have been circular, so it was not repeated.
Analyses of stimulus and induction mode effects
In all ROIs showing a significant effect of the perceptual state, we tested the effect of the experimental conditions on the difference between the BOLD responses to the two percepts, separately for the transient and sustained components of the BOLD response, using two-way repeated-measures ANOVAs with Stimulus (three levels: Contrast, Motion, and Shape) and Induction (two levels: Spontaneous and Evoked) as within-subject factors. In all ROIs, we also compared the sustained responses, averaged over transitions to bound and unbound percepts, across displays to assess the effect of the physical modulations of contrast, motion, and shape in the evoked conditions.
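The 3 × 2 within-subject ANOVA on the bound − unbound difference can be written directly with NumPy. This is a minimal sketch, not the authors' code: it assumes one difference value per subject and condition, arranged as an array of shape (subjects, Stimulus levels, Induction levels), and it returns uncorrected F ratios with their degrees of freedom (a sphericity correction would be applied on top). The function name is hypothetical.

```python
import numpy as np


def rm_anova_2way(y):
    """Two-way repeated-measures ANOVA (both factors within subjects).

    y: array of shape (n_subjects, a_levels, b_levels), e.g. (12, 3, 2) for
    Stimulus (Contrast/Motion/Shape) x Induction (Spontaneous/Evoked).
    Returns {"A", "B", "AxB"} -> (F, df_effect, df_error), uncorrected.
    """
    y = np.asarray(y, dtype=float)
    n, a, b = y.shape
    g = y.mean()                       # grand mean
    m_s = y.mean(axis=(1, 2))          # subject means, (n,)
    m_a = y.mean(axis=(0, 2))          # factor A means, (a,)
    m_b = y.mean(axis=(0, 1))          # factor B means, (b,)
    m_sa = y.mean(axis=2)              # subject x A means, (n, a)
    m_sb = y.mean(axis=1)              # subject x B means, (n, b)
    m_ab = y.mean(axis=0)              # A x B cell means, (a, b)

    # Each effect is tested against its own interaction with subjects.
    ss_a = n * b * np.sum((m_a - g) ** 2)
    ss_as = b * np.sum((m_sa - m_s[:, None] - m_a[None, :] + g) ** 2)
    ss_b = n * a * np.sum((m_b - g) ** 2)
    ss_bs = a * np.sum((m_sb - m_s[:, None] - m_b[None, :] + g) ** 2)
    ss_ab = n * np.sum((m_ab - m_a[:, None] - m_b[None, :] + g) ** 2)
    resid = (y - m_sa[:, :, None] - m_sb[:, None, :] - m_ab[None, :, :]
             + m_s[:, None, None] + m_a[None, :, None] + m_b[None, None, :] - g)
    ss_abs = np.sum(resid ** 2)

    df = {"A": (a - 1, (a - 1) * (n - 1)),
          "B": (b - 1, (b - 1) * (n - 1)),
          "AxB": ((a - 1) * (b - 1), (a - 1) * (b - 1) * (n - 1))}
    ss = {"A": (ss_a, ss_as), "B": (ss_b, ss_bs), "AxB": (ss_ab, ss_abs)}
    return {k: ((ss[k][0] / df[k][0]) / (ss[k][1] / df[k][1]), *df[k]) for k in ss}
```

With 12 subjects this yields the F[2, 22], F[1, 11], and F[2, 22] tests reported in the Results; for a two-level factor such as Induction, the F ratio equals the square of a paired t statistic on the subject means.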
Modeling the time courses of the BOLD responses
In a second stage, we reconstructed the hemodynamic responses of each ROI to both types of transitions (toward the bound and toward the unbound percept) using the Finite Impulse Response (FIR) model provided with the MarsBaR toolbox. This was done separately for the spontaneous and evoked transitions. For this analysis, the initial and final fixation periods were modeled with a boxcar function convolved with the canonical HRF; the implicit baseline was therefore the activity during the presentation of the moving stimuli, excluding the activity time-locked to the perceptual switches. Each time course was reconstructed in a 15 s time window (i.e., seven scans, with the perceptual transition occurring in the first scan). This window was chosen because the average perceptual episode lasted about 10 s and the hemodynamic response is delayed by around 5 s, so 15 s covers a typical episode plus the hemodynamic lag.
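The FIR approach estimates one free amplitude per post-transition scan instead of assuming an HRF shape. Below is a minimal NumPy sketch, assuming transition onsets already converted to integer scan indices and a constant column standing in for the implicit baseline; the fixation-period regressors described above are omitted for brevity. It is not the MarsBaR implementation, and the function names are hypothetical.

```python
import numpy as np


def fir_design(onset_scans, n_scans, n_bins=7):
    """FIR design matrix for one event type.

    Column k contains a 1 at scan (onset + k) for every onset, so its fitted
    weight is the mean response k scans after the transition. n_bins=7
    mirrors the seven-scan (about 15 s) window described in the text.
    """
    X = np.zeros((n_scans, n_bins))
    for onset in onset_scans:
        for k in range(n_bins):
            if onset + k < n_scans:  # drop bins that fall past the last scan
                X[onset + k, k] += 1.0
    return X


def fit_fir(y, designs):
    """Ordinary least-squares fit of several FIR blocks plus a constant.

    designs: list of FIR matrices (e.g. [to_bound, to_unbound]) with the same
    number of bins. The constant column stands in for the implicit baseline.
    Returns one array of per-bin estimates for each design.
    """
    X = np.column_stack(list(designs) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n_bins = designs[0].shape[1]
    return [beta[i * n_bins:(i + 1) * n_bins] for i in range(len(designs))]
```

Because each bin has its own parameter, the estimated time courses (as plotted in Figure 3) are not constrained to any particular response shape.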
For the ROIs showing a significant difference between the two percepts (by definition all ROIs of the second set, plus V1, hMT, and pFs from the first set), we tested the bound–unbound contrast at each time point with Wilcoxon signed-rank tests for paired samples (because the FIR estimates might not be normally distributed) to determine the earliest time point at which the difference reached significance (see Figure 3).
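The per-time-point test can be sketched with SciPy's signed-rank test. This minimal example assumes FIR estimates stored as (subjects × time bins) arrays for each transition type; as in the text, it applies an uncorrected two-sided test at each bin and reports the first bin below alpha. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon


def first_significant_bin(bound, unbound, alpha=0.05):
    """Earliest FIR bin with a significant paired bound-unbound difference.

    bound, unbound: arrays of shape (n_subjects, n_bins) of FIR estimates.
    Returns (bin index or None, list of per-bin p-values); no correction
    for the number of bins is applied.
    """
    bound = np.asarray(bound, dtype=float)
    unbound = np.asarray(unbound, dtype=float)
    pvals = [wilcoxon(bound[:, k], unbound[:, k]).pvalue
             for k in range(bound.shape[1])]
    for k, p in enumerate(pvals):
        if p < alpha:
            return k, pvals
    return None, pvals
```

With 12 subjects the exact signed-rank test cannot go below p ≈ 0.0005, which is ample for an uncorrected 0.05 criterion.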
To better estimate the BOLD response latencies in the fusiform and hMT+ areas, we ran a supplementary analysis in which the transient response alone was modeled with both the HRF and its temporal derivative, which makes it possible to recover small variations in response latency (e.g., Henson, Price, Rugg, Turner, & Friston, 2002). Details are provided in Appendix A.
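The idea behind the derivative model can be shown with a first-order Taylor expansion: a response delayed by d is approximately h(t) − d·h′(t), so the ratio of the derivative weight to the HRF weight tracks the shift. This is a simplified sketch, not the exact mapping of Henson et al. (2002) nor the Appendix A analysis; the double-gamma shape parameters (gamma peak with shape 6, undershoot with shape 16 scaled by 1/6) are an assumed SPM-style canonical HRF, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import gamma


def double_gamma_hrf(t):
    """Assumed SPM-style canonical HRF shape: a gamma peak (shape 6) minus a
    smaller, later gamma undershoot (shape 16, scaled by 1/6). Unnormalized."""
    t = np.asarray(t, dtype=float)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0


def estimate_latency_shift(y, t):
    """First-order latency estimate from an HRF + temporal-derivative fit.

    A response delayed by d satisfies h(t - d) ~ h(t) - d * h'(t) for small d,
    so fitting y ~ b1 * h + b2 * h' gives d ~ -b2 / b1 (positive = later).
    """
    h = double_gamma_hrf(t)
    dh = np.gradient(h, t)  # numerical time derivative of the HRF
    X = np.column_stack([h, dh])
    (b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
    return -b2 / b1
```

The approximation only holds for shifts that are small relative to the width of the HRF (well under a second or so); larger shifts call for a nonlinear mapping such as the one discussed by Henson et al. (2002).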
Control experiment: eye movements
A separate off-scan experiment was performed in order to assess fixation accuracy and the influence of eye movements on perceptual bistability (see details in the Supplemental Material). 
Results
Behavioral data
Behavior in the scanner
From the button-press records, we extracted the times of transitions from bound to unbound percepts, and vice versa, as well as the duration of each perceptual episode. For each session and each subject, we then calculated the number of perceptual alternations and the ratio of the total duration of bound percepts to the summed total durations of bound and unbound percepts (Figure 2a and b). These values were submitted to two-way ANOVAs with Stimulus (Contrast, Motion, Shape) and Induction mode (Spontaneous and Evoked) as within-subject factors. As one subject did not experience any perceptual alternation during some sessions, the analysis was restricted to the other 12 subjects. There was no significant effect of Stimulus type (F[2, 22] = 1.79, ε = 0.650, p = 0.20, for number of transitions; F[2, 22] = 0.446, ε = 0.809, p = 0.61, for duration ratios) or Induction mode (F[1, 11] = 0.151, p = 0.70, for number of transitions; F[1, 11] = 0.653, p = 0.44, for duration ratios), and no significant interaction between the two (F[2, 22] = 1.133, ε = 0.918, p = 0.34, for number of transitions; F[2, 22] = 1.258, ε = 0.738, p = 0.30, for duration ratios). In addition, the average proportion of time spent in the bound percept (45% ± 11%) did not differ significantly from 50% (t(11) = 1.68, p = 0.12). Overall, these findings made us confident that the fMRI analysis would not be contaminated by different signal-to-noise ratios across the six conditions and the two percepts.
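The behavioral measures above can be derived from button-press records with a few lines of Python. This sketch assumes a hypothetical record format (the onset time of each reported percept in seconds with its label, plus the time at which the display stops); the last episode is closed by the end of the session. The function name is illustrative.

```python
def episode_stats(onset_times, percepts, session_end):
    """Episode durations, number of alternations and bound-time ratio.

    onset_times: ascending times (s) at which each reported percept begins;
    percepts: matching labels, "bound" or "unbound";
    session_end: time (s) at which the display stops (closes the last episode).
    """
    ends = list(onset_times[1:]) + [session_end]
    durations = [end - start for start, end in zip(onset_times, ends)]
    # An alternation is any change of label between consecutive episodes.
    n_alternations = sum(1 for a, b in zip(percepts, percepts[1:]) if a != b)
    bound = sum(d for d, p in zip(durations, percepts) if p == "bound")
    unbound = sum(d for d, p in zip(durations, percepts) if p == "unbound")
    return durations, n_alternations, bound / (bound + unbound)
```

Across subjects, the resulting bound-time ratios could then be compared with 0.5 using `scipy.stats.ttest_1samp`, the kind of one-sample test reported above.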
Figure 2
 
Behavioral data in the scanner for evoked (blue) and spontaneous (orange) transitions. Error bars: ± 1 standard error. (a) Number of perceptual transitions for the six motion displays. (b) Percent time spent in the bound percept for each stimulus condition. (c) Durations of perceptual episodes for the Evoked (•) and Spontaneous (▪) transitions as a function of their rank. Evoked alternations yield percept durations commensurate with the period of the physical modulations. With spontaneous alternations, episode duration varies with rank, with longer durations for the percept seen at stimulus onset. This first perceptual state (bound or unbound) varied across observers. (d) Normalized distributions of episode durations for both induction modes. The distribution was log-normal only in the spontaneous conditions.
Figure 3
 
Regions of interest (ROI) and their patterns of response to perceptual transitions toward bound (green) and unbound (red) percepts. ROIs are represented overlaid on the average anatomy of 12 subjects. Time courses are reconstructed with a FIR model and averaged over both hemispheres (Error bars: ± 1 standard error). For the ROIs showing a significant difference between the two percepts, the gray line indicates the earliest time point at which the bound–unbound difference reaches significance (uncorrected p < 0.05; note that the same time point was obtained using either a Wilcoxon signed-rank test or a paired t-test). (a) ROIs defined from localizer and control runs; hMT, human MT; pFs, posterior fusiform; LO, Lateral Occipital; V1, primary visual area. The light blue circles delineate the hMT, LO, and pFs ROIs as they are defined in the study, i.e., 1-cm diameter spheres around the maximal shape- and motion-related activity in the localizer runs (coordinates in Table 1). For visualization purposes, motion- and shape-related activities are represented with the same cluster-level threshold (p < 0.05, FWE-corrected for the whole brain volume) but different voxel-level thresholds (p < 0.0001 for hMT; p < 0.01 for LO and pFs). For V1, the thresholds used for visualization are the same as those used to define the ROI from the activity related to the full diamond in the control run (p < 0.03 uncorrected at the voxel level). (b) ROIs defined from the main contrasts. The thresholds used for visualization are the same as those used to define the ROIs (voxel level, p < 0.01; cluster level, p < 0.05, FWE-corrected or 40 voxels, see Methods). For comparison purposes, the “Middle temporal,” “Fusiform,” and “Calcarine” areas are represented overlaid on the same anatomical slices as “hMT,” “pFs,” and “V1,” respectively, with “hMT” and “pFs” outlined by dotted circles as in (a).
Beyond these global similarities between the six displays, we also considered the possibility that the fine dynamics of perceptual alternations could differ between conditions. To assess these local dynamics, we analyzed the durations of the first seven episodes (the minimum number of episodes in a single session being eight) in a three-way ANOVA with Stimulus (three levels: Contrast, Motion, Shape), Induction mode (two levels: Spontaneous and Evoked), and Episode (seven levels) as within-subject factors. These analyses (Figure 2c) revealed a significant effect of Episode (F[6, 66] = 3.911, ε = 0.474, p = 0.02), as well as an Episode × Induction interaction (F[6, 66] = 3.947, ε = 0.477, p = 0.02). None of the other main effects or interactions was significant (all p > 0.08). Post-hoc Tukey HSD tests for the spontaneous sessions showed that the first episode was significantly longer than the second, fourth, sixth, and seventh episodes (all p < 0.03), and that the fourth episode was shorter than the fifth (p = 0.03). Thus, the first percept lasted longest, and subsequent episodes of that same percept tended to last longer than episodes of the other percept. Importantly, this effect does not result from an initial systematic preference for a particular percept: The first percept was bound in 17 cases and unbound in 19 cases. Such a pattern was not observed in the evoked sessions, where periodic variations of each critical dimension reliably governed the perceptual alternations.
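The ε values accompanying these F tests are sphericity-correction factors. The text does not name the estimator; the Greenhouse-Geisser ε is the usual choice and is assumed here. A minimal sketch computing it from a (subjects × levels) array via the double-centred covariance matrix, plus the corrected p-value in which both degrees of freedom are multiplied by ε (function names hypothetical):

```python
import numpy as np
from scipy import stats


def greenhouse_geisser_epsilon(y):
    """Greenhouse-Geisser epsilon for one within-subject factor.

    y: array of shape (n_subjects, k_levels). With S the covariance of the
    k levels and H = I - J/k the centring matrix, S_dc = H S H and
    eps = tr(S_dc)^2 / ((k - 1) * tr(S_dc @ S_dc)),
    which lies between 1/(k - 1) and 1 (1 = sphericity holds exactly).
    """
    y = np.asarray(y, dtype=float)
    k = y.shape[1]
    S = np.cov(y, rowvar=False)          # k x k covariance across subjects
    H = np.eye(k) - np.ones((k, k)) / k  # centring matrix
    S_dc = H @ S @ H                     # double-centred covariance
    return np.trace(S_dc) ** 2 / ((k - 1) * np.trace(S_dc @ S_dc))


def gg_corrected_p(F, df_effect, df_error, eps):
    """p-value of F with both degrees of freedom scaled by eps."""
    return stats.f.sf(F, eps * df_effect, eps * df_error)
```

For the Episode effect above, ε = 0.474 would turn F[6, 66] into a test on about 2.8 and 31.3 degrees of freedom.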
Finally, since a common “signature” of bistable perception is a log-normal (or gamma) distribution of perceptual episode durations, we compared the distributions of episode durations for spontaneous and evoked transitions, after removing the first and last episodes and normalizing by the average episode duration of each session (Figure 2d). Whereas episode durations in the spontaneous sessions did not deviate significantly from a log-normal distribution (p = 0.15, Kolmogorov-Smirnov test), those in the evoked sessions did (p < 0.01). A direct comparison further indicated that the two distributions differed significantly in shape (p < 0.001, two-sample Kolmogorov-Smirnov test).
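The normalization and the two Kolmogorov-Smirnov comparisons can be sketched with SciPy. This is an illustration under assumptions, not the authors' code: per-session durations are given as lists, the first and last episodes are dropped before each session is divided by its own mean, and the log-normal is fitted with its location fixed at 0. A known caveat of this sketch is that estimating the parameters from the same data makes the nominal one-sample KS p-value conservative. Function names are hypothetical.

```python
import numpy as np
from scipy import stats


def normalized_durations(sessions):
    """Pool episode durations across sessions after per-session normalization.

    sessions: list of per-session duration sequences (s). The first and last
    episodes are dropped, then each session is divided by its own mean, so
    every session contributes values with mean 1.
    """
    pooled = []
    for durations in sessions:
        d = np.asarray(durations, dtype=float)[1:-1]
        pooled.append(d / d.mean())
    return np.concatenate(pooled)


def lognormal_ks(x):
    """One-sample KS test of x against a log-normal fitted to x (loc fixed at 0)."""
    shape, loc, scale = stats.lognorm.fit(x, floc=0)
    return stats.kstest(x, "lognorm", args=(shape, loc, scale))


def compare_distributions(x_spontaneous, x_evoked):
    """Two-sample KS test on the shapes of the two normalized distributions."""
    return stats.ks_2samp(x_spontaneous, x_evoked)
```

Normalizing each session to a mean of 1 removes differences in overall alternation rate, so the two-sample test compares distribution shapes rather than absolute durations.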
Thus, the fine-grained dynamics of alternations in evoked sessions closely followed the display variations and differed slightly from those observed in spontaneous sessions. These findings indicate that the perceptual switches in the two types of sessions arise from different mechanisms, triggered endogenously in spontaneous sessions and induced externally in evoked sessions.
Eye movements
As involuntary eye movements during fixation could modulate neural activity and/or perceptual dynamics, we analyzed eye traces recorded during an off-scan experiment replicating the one performed in the scanner (see Supplemental data). Overall, these analyses revealed no correlation between perceptual alternation dynamics and any oculomotor parameter (saccades, blinks, or fixation accuracy), showed no difference in oculomotor behavior between the bound and unbound states, and indicated that observers accurately maintained fixation as required. It thus seems unlikely that eye movements account for the perceptual effects or for the pattern of fMRI results described hereafter.
fMRI results
Landmark visual areas
Form areas were identified using scrambled and unscrambled versions of static objects (Kourtzi & Kanwisher, 2000a). As described by Grill-Spector, Kourtzi, and Kanwisher (2001), these areas included both lateral occipital (LO) and temporal foci along the posterior fusiform gyrus (pFs) (see Figure 3a and Table 1). Motion areas (including hMT+) were identified by contrasting static and coherently moving dots (Tootell et al., 1995; also see Figure 3a). 
We then compared the form and motion areas with those activated by both a static and a moving version of a fully visible diamond stimulus that did not entail bistable perception (control runs). The regions activated by these unambiguous diamonds were included in, or overlapped with, those identified with the form and motion localizers in the ventral and dorsal processing streams, along with early visual areas (Figures B.1 and B.2 in Appendix B), suggesting that our stimuli would indeed engage both dorsal and ventral visual areas, at least during the bound phases of bistable perception. For the four subjects who underwent retinotopic mapping, we could verify that the group activity elicited by the fully visible diamond in the calcarine sulcus was included in V1.
Perceptual transitions
As expected from previous studies (Lumer, Friston, & Rees, 1998; Sterzer & Kleinschmidt, 2007), the analysis of the event-related response to all perceptual transitions irrespective of their direction (toward a bound or unbound percept) revealed the involvement of a distributed network of frontal, motor, visual, and subcortical regions during perceptual switches. In the following, however, we focus our analyses on the differential activity related to the bound and unbound percepts (Table 2 and Figure 3b). 
BOLD activity during the bound percept
Visual regions exhibiting higher (transient) activity at the onset of a bound percept than at the onset of an unbound percept (Table 2) were mainly found in the posterior fusiform gyri, extending laterally to the posterior part of the inferotemporal gyrus (ITG) bilaterally and in the lingual gyrus in the left hemisphere. The activity in the posterior fusiform gyrus largely overlapped the pFs activity found in the landmark visual areas (Figure 3, slices z = −18 mm). Enhanced activity for the bound percept in bilateral fusiform gyrus persisted during the sustained part of the response (Table 2 and Appendix B). 
In the calcarine sulcus, a region overlapping the representation of the fully visible diamonds (control runs) in V1 and extending into the medial part of the lingual gyrus in the right hemisphere showed significant differential activity between transitions toward bound and unbound percepts for the early part of the response only. Late differential activity (i.e., significant only during the sustained part of the response) was observed bilaterally in the posterior occipital lobe (Figure 3b, slice x = −26 mm) around two maxima, a lower one at z = −2 mm and an upper one at z = 20 mm (Table 2). In the four subjects who underwent retinotopic mapping, we could verify that these foci were located beyond V3. Note that this area is also sensitive to motion, as it was more active during moving than during static RDPs in the motion area localizer run, and it likely corresponds to V3a (Tootell et al., 1997).
BOLD activity during the unbound percept
Visual areas exhibiting higher (transient and sustained) activity following transitions toward an unbound percept were found in both hemispheres at the temporo-occipital junction in the posterior portion of the middle temporal gyrus, initially extending toward the middle occipital gyrus (see Table 2 and Figure 3b, slice y = −70 mm). The activity at the temporo-occipital junction corresponds to the hMT+ region, but the foci of maximal activity are slightly above the motion areas previously described in the “landmark” section (compare slices y = −70 mm in Figure 3a and 3b), so they probably do not correspond to the core MT region (see supplementary material for a comparison of individual peak coordinates in the localizer and main contrasts). Small clusters showing differential activity between the two percepts were also found in midline structures at the parieto-occipital junction in the right hemisphere (see Table 2). 
ROI-based analysis
Perception-related activity
By definition, all the ROIs defined from the main contrast show a significant effect of the percept. For the ROIs defined from the localizer runs, we tested the contrast between the Bound and Unbound percepts. To correct for the number of tested ROIs (four), the p-values reported here should be compared with a Bonferroni-corrected threshold of 0.05/4 = 0.0125.
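The ROI-level test and its Bonferroni threshold can be sketched as follows. This minimal example assumes one response estimate per subject and percept in each ROI, stored in a dictionary keyed by ROI name (a hypothetical layout); it uses SciPy's paired t-test and simply divides alpha by the number of ROIs.

```python
import numpy as np
from scipy.stats import ttest_rel


def roi_percept_tests(roi_data, alpha=0.05):
    """Paired t-tests (bound vs unbound) per ROI with a Bonferroni threshold.

    roi_data: dict ROI name -> (bound, unbound), each an array of per-subject
    response estimates. Returns the corrected threshold and
    {roi: (t, p, significant_after_correction)}.
    """
    threshold = alpha / len(roi_data)  # 0.05 / 4 = 0.0125 for four ROIs
    results = {}
    for roi, (bound, unbound) in roi_data.items():
        t, p = ttest_rel(np.asarray(bound, float), np.asarray(unbound, float))
        results[roi] = (t, p, p < threshold)
    return threshold, results
```

The same function would be run separately on the transient and sustained components of the response.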
Surprisingly, the activity of area LO did not show any significant effect of the percept (p = 0.06 and p = 0.25 for the transient and sustained components of the response, respectively). In contrast, and as expected from the fusiform activity described above, area pFs showed strong perception-related activity (p = 0.004 and p = 0.002 for the transient and sustained activity, respectively).
In hMT+, the differential activity between transitions toward bound and unbound percepts was significant (p = 0.02 and p = 0.004 for transient and sustained activity). 
In V1, finally, we found a significant transient differential activity (p = 0.016), although the perception-related difference was not significant for the sustained part of the response (p = 0.95), in line with the findings of the main analyses (see the “calcarine” clusters, Table 2). 
Effects of stimulus and induction mode on perception-related activity
To test whether the regions showing a significant effect of the percept support generic mechanisms of visual binding or depend on the way the percept is induced, we analyzed the effect of the six presentation conditions on their differential BOLD activity between the two percepts, using a 3 (Stimulus type: Contrast/Motion/Shape) × 2 (Induction mode: Spontaneous/Evoked) repeated-measures ANOVA (see Methods). This was done both for the ROIs defined from the main contrasts (Figure 3b and Table 2) and for V1, hMT, and pFs as defined from the localizers and control blocks (Figure 3a and Table 1).
In none of these ROIs did we find any significant influence of the induction mode or the stimulus type on the differential activity between the bound and unbound percepts (Induction mode: p = 0.16 for transient activities in the “Middle temporal” cluster, and p > 0.18 elsewhere; Stimulus type: all p > 0.16). The only (marginally) significant effect was an interaction Stimulus × Induction in the posterior occipital ROI for sustained activities (F[2, 22] = 3.50; ε = 0.955; p = 0.051), corresponding to a greater difference of activity between percepts when transitions were evoked by Shape variations or spontaneously arising in Contrast and Motion displays compared to the other three displays. 
Feature-dependent activity
Although the six experimental conditions induce identical bound and unbound percepts and do not differentially modulate the perception-related activity in the ROIs tested above, the variations of contrast, motion, or shape occurring in the physically changing displays were expected to differentially activate regions preferentially sensitive to these features. For instance, our Motion display was likely to modulate activity in hMT+ (Qian & Andersen, 1994), periodic changes from a diamond shape to a chevron shape were likely to modulate activity in the LOC region, and changes in contrast were more likely to modulate activity in contrast-sensitive regions such as V1 and V2.
We tested this assumption by averaging the sustained activity during the bound and unbound percepts for each feature. As expected, comparing these averaged values with a Friedman test (three levels: Contrast, Motion, and Shape) revealed a small but significant advantage for the Motion display as compared to the Contrast and Shape displays in hMT+ and middle temporal, and of the Motion and Shape displays relative to the Contrast display in pFs and Fusiform (see Appendix C). 
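The feature test can be sketched with SciPy's Friedman test. This minimal example assumes, for one ROI, a (subjects × 2) array of bound and unbound sustained estimates per display, stored in a dictionary keyed by display name (a hypothetical layout); the two percepts are averaged per subject before the rank-based comparison. Post-hoc pairwise comparisons, needed to say which display differs, are not included.

```python
import numpy as np
from scipy.stats import friedmanchisquare


def feature_effect(sustained):
    """Friedman test of display type on sustained activity in one ROI.

    sustained: dict mapping "Contrast", "Motion" and "Shape" to arrays of shape
    (n_subjects, 2) holding each subject's bound and unbound sustained estimates.
    The two percepts are averaged per subject before the rank-based test.
    """
    per_display = [np.asarray(sustained[name], dtype=float).mean(axis=1)
                   for name in ("Contrast", "Motion", "Shape")]
    return friedmanchisquare(*per_display)
```

Because the test ranks the three displays within each subject, it is insensitive to between-subject differences in overall BOLD amplitude.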
Time courses
Figure 3 shows the time courses of the BOLD responses to both types of perceptual transitions, reconstructed in all ROIs separately for the evoked and spontaneous transitions. The time courses are strikingly similar across induction modes.
As described previously, the fusiform ROI defined from the main contrast includes the pFs ROI defined from the landmark localizer runs. The time courses in the smaller ROI (“pFs”) are almost identical to those in the larger one (“Fusiform”). By contrast, the time course of area hMT derived from the localizer differs from that of the “Middle temporal” area defined from the main contrast, with a small but marked enhancement of activity early after each type of perceptual transition. Although the precise location of the occipital ROIs differs slightly depending on whether they were defined as the retinotopic representation of the fully visible moving diamond in the control run (“V1”) or from the bound versus unbound contrast in the main analysis (“Calcarine”), their time courses are quite similar, with a tendency toward greater differences between percepts for the spontaneous transitions only; this tendency, however, was not significant in the ANOVA described in the previous section.
Note that the LO region defined from the localizers has no homologue among the ROIs defined from the main contrasts, and the Posterior occipital region defined from the main contrasts has no clear homologue in the localizer runs.
Overall, these time courses indicate that a transition toward the bound percept is coupled to a rapid BOLD increase in V1 and the fusiform gyri (pFs) peaking 5 s after the onset of the bound percept, a later BOLD increase in the posterior occipital lobe (putative V3a), and to a concomitant BOLD decrease in the upper part of hMT+. 
In contrast, a transition toward an unbound percept is coupled to an early and small BOLD increase in the middle temporal ROI and only modest BOLD changes in V1, V3a, and fusiform gyri. Note that, because of the experimental design, the baseline for the event-related responses was the activity during the presentation of the moving stimuli, excluding the activity time-locked to the perceptual switches. Therefore, the pattern of activity in the middle temporal ROI during the bound percept is not a deactivation relative to a low-level baseline but rather a transient signal decrease relative to a steady-state activity during motion perception. As shown in Appendix A, the peak of this decreased activity occurs slightly after the peak of activity in the fusiform area. 
Discussion
This study identified a network of occipital, ventral, and dorsal visual regions showing differential BOLD activity during the perception of a rigidly rotating bound shape or the perception of four unbound contours translating up and down independently. The middle temporal and fusiform regions showing percept-related modulation were embedded within or adjacent to those identified using functional localizers of hMT+ and LOC (notably the pFs). The activity found in the calcarine sulcus lay in V1, as identified by retinotopic mapping in four observers, and overlapped the retinotopic representation of the visual stimulation, as confirmed by analyses of the control conditions in the whole group. Another occipital focus lay more laterally, possibly in V3a. This perception-dependent network was involved whether the transitions were spontaneous or evoked.
Of particular interest is the balance of BOLD activity between the middle temporal and fusiform regions coupled with observers' perceptual alternations: Bound states were associated with higher activity in the ventral region and lower activity in the dorsal region, and the reverse held during unbound states. This balance was, however, asymmetrical, as the activity evoked by perceptual transitions with respect to the ongoing perceptual state was always positive in the fusiform region, while it was either positive or negative in the hMT+ region.
As the present displays all modulated the perception of form and motion, this cortical network is in keeping with previous findings that ventral regions are selective for shape, whereas dorsal regions are more selective for motion. However, the present results bring new insights into the coactivation of these regions during perceptual form-motion binding. In the following, we first discuss the genericity of our results and then propose a functional interpretation of our data.
Genericity of the perception-related network
Middle temporal and fusiform areas
In the present study, physically modulating contrast, motion, or shape in the evoked conditions was expected to differentially activate regions sensitive to these features. Indeed, the three types of displays differently modulated the global BOLD activity of the ventral and dorsal ROIs (see Appendix C). In particular, the activity of the middle temporal area was invariant to contrast and shape variations, but sensitive to the variations of the motion display. On the other hand, the fusiform area was more strongly activated by the shape and motion variations than by the contrast variations. In these areas, however, percept-related activity (the difference of BOLD response between bound and unbound percepts) did not depend upon the features used to evoke the perceptual transitions. The small number of perceptual alternations in each condition may have limited the statistical power of these analyses. Nevertheless, the observation that the joint analysis of the experimental conditions yielded significant percept-related BOLD responses suggests that these BOLD modulations were consistent across conditions. This validates the use of different displays to identify a generic network underlying the form/motion binding process and further suggests that feature-related activity can be dissociated from percept-related activity (Paradis, Droulez, Cornilleau-Peres, & Poline, 2008).
The genericity of the ventrodorsal network is also supported by the fact that its activity does not depend on the type of transitions (see the results of the ANOVA on the ROI data), although evoked and spontaneous transitions likely involve different distributions of attention over time. Indeed, the finding that pupil dilation, a marker of attentional load (Beatty & Wagoner, 1978), tends to follow different time courses with these two induction modes (see Hupé, Lamirel, & Lorenceau, 2009, whose stimuli were similar to those of the present study) suggests that, for the periodic and predictable evoked transitions, observers only need to monitor the time of switch, whereas they must continuously scrutinize chaotic internal fluctuations in the spontaneous condition. The finding that the time course of activity in the middle temporal and fusiform areas is highly reproducible across the two modes of induction further supports the genericity of the observed activity.
The genericity of the percept-related network outlined here is finally supported by previous fMRI studies of bistable binding with different stimuli. Regarding the LOC, previous studies reported enhanced activity in this area during bound states (Murray et al., 2002; Fang et al., 2008), which is in agreement with the present results, although the features delineating the shape were different. Regarding the dorsal stream, our results are similar to those of previous studies using bistable moving plaids made of superimposed drifting gratings (Castelo-Branco et al., 2002) and 3D structure-from-motion stimuli (Murray et al., 2003, 2002, experiment 2), which reported decreased activity in hMT+ during coherent motion as compared to incoherent motion. In these latter studies, the diminished activity in hMT+ during coherent plaid motion (or coherent 3D motion) was attributed to the different number of perceived motion directions during coherent and incoherent states. Whether the same logic applies in the present study is unclear, because the rotation seen during bound states covers all possible motion directions in succession during a full 1 s cycle. An alternative possibility is that hMT+ BOLD activity depends on the number of moving objects rather than on the number of motion directions, but we will discuss other possible explanations later.
The ventral and dorsal activities found herein are therefore in keeping with the studies of Murray et al. (2002) and Castelo-Branco et al. (2002). With the present study reporting for the first time their simultaneous involvement during perceptual binding, we now have converging evidence suggesting that these areas are involved in perceptual binding independently of the visual features used in the bistable displays.
Low-level visual areas
The conclusion may be rather different for the low-level visual areas highlighted in this study, notably V1. Indeed, while this area shows enhanced activity for bound as compared to unbound percept in the present study, its activity was decreased for the same contrast in the studies by Murray et al. (2002) and Fang et al. (2008). A first possibility is that one study highlighted a subregion with increased activity while the other highlighted another subregion with decreased activity only. This, however, seems very unlikely as we used two different ways to identify activity in the calcarine sulcus (global testing in the occipital pole and specific testing in the voxels in retinotopic register with the stimulus, this latter method being very similar to that employed by Fang et al., 2008). These discrepant findings rather suggest that, in low-level areas, the percept-related activity uncovered with bistable stimuli may also be influenced by the physical properties of the stimuli. Indeed, Fang et al. (2008) used a (black) diamond with vertices occluded by invisible static masks and undergoing a back-and-forth horizontal translation with abrupt motion reversals. In our (white) displays, the occluders were either invisible (Motion and Contrast displays) or visible (Shape display), and the diamond underwent a circular translation. In addition to the above-mentioned physical differences, the stimulus used by Murray et al. (2002) and Fang et al. (2008) entails a perceptual phenomenon that may have influenced the BOLD activity: Kinetic illusory contours appear along the line-end paths during bound states and disappear during unbound states (Anderson & Sinha, 1997). This feature, lacking in our displays, could have modulated the BOLD response in occipital regions (Van Oostende, Sunaert, Van Hecke, Marchal, & Orban, 1997; Larsson, Heeger, & Landy, 2010; Clifford, Mannion, & McDonald, 2009). 
Consistent with this possibility, optical imaging studies in nonhuman primates (Ramsden, Hung, & Roe, 2001) revealed a de-emphasized V1 response to illusory contours concomitant with V2 activation. However, whether motion-induced illusory contours account for some of the decreased V1 activity found by Fang et al. (2008) remains to be tested. 
Form/motion binding and the predictive coding model
In their studies, Murray et al. (2002) and Fang et al. (2008) relied on the predictive coding theory (see Introduction) to account for their results: A strong prediction from the shape-selective area LOC, during bound states, would reduce the error signal in the lower area V1, while a weak predictive signal from the LOC during unbound states would result in enhanced activity in V1. With the different pattern of occipital ventral and dorsal activity reported here, we wondered whether the predictive coding framework could still apply or whether alternative interpretations should be considered. 
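The predictive coding account invoked by Murray et al. (2002) and Fang et al. (2008) can be illustrated with a toy computation in the spirit of Rao and Ballard (1999). In the minimal sketch below, which is an illustrative assumption and not the model used in those studies, a single higher-level unit (standing in for LOC) holds a fixed, hypothetical "shape template" `U`, infers a cause `r` by gradient descent on the squared prediction error, and feeds back the prediction `U @ r`; the lower-level residual `x - U @ r` stands in for the V1 error signal. The template, inputs, learning rate, and number of steps are all hypothetical.

```python
import numpy as np

def residual_error(x, U, lr=0.1, n_steps=200):
    """Infer a higher-level cause r for input x and return the lower-level error.

    Toy predictive-coding loop (illustrative assumption): the higher level
    predicts x as U @ r and updates r by gradient descent on ||x - U r||^2.
    """
    r = np.zeros(U.shape[1])
    for _ in range(n_steps):
        error = x - U @ r          # bottom-up prediction error (the "V1" signal)
        r = r + lr * U.T @ error   # top-down cause moves to reduce the error
    return x - U @ r

# Hypothetical unit-norm shape template held by the higher-level ("LOC") unit.
U = np.array([[1.0], [1.0], [-1.0], [-1.0]]) / 2.0

x_bound = 2.0 * U[:, 0]                             # input fully explained by the template
x_unbound = np.array([1.0, -1.0, 1.0, -1.0]) / 2.0  # input orthogonal to the template

for label, x in [("bound", x_bound), ("unbound", x_unbound)]:
    print(label, np.linalg.norm(residual_error(x, U)))
```

An input that the template explains (a "bound" configuration) leaves an almost null residual, whereas an input orthogonal to the template (an "unbound" configuration) keeps its full energy in the error signal. Under this account, the bound percept should therefore lower the V1 signal, which is the direction reported by Murray et al. (2002) and Fang et al. (2008).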
Our observation that the calcarine and fusiform areas are modulated in phase (both showing enhanced activity for the bound percept) does not fit a balance of activity between a low-level area and a higher-level area of the ventral pathway. Also, the present finding that perceptual form/motion binding involves a dorsal area, which was not reported by Fang et al. (2008), introduces a new parameter that the model should take into account. As the interplay between this middle temporal area and V1 resembles that previously reported between LOC and V1, the present results could be interpreted in terms of predictive signals exchanged within the dorsal pathway between hMT+ and V1. However, it remains unclear how predictive coding alone can account for both sets of results. More generally, these results raise the question of how predictive coding deals with multiple feedback connections from dorsal and ventral areas onto V1 when both motion and form are involved. 
In addition, within a predictive coding approach, top-down inferences should depend on stimulus ambiguity and perceptual uncertainty. If we assume that spontaneously bistable stimuli maximize visual ambiguity, they should increase the need for perceptual inference, whereas stimuli in which sensory evidence strongly favors a particular interpretation (as in our evoked conditions) may involve predictive coding to a lesser degree, inducing stronger activity in the lower-level areas selective for the relevant dimensions. However, we did not observe any significant difference in activity between evoked and spontaneous transitions, or between displays, in the occipital, ventral, and dorsal areas of the perception-related network. 
Altogether, it appears difficult to explain the present results within a predictive coding framework. We therefore consider alternative accounts of the present findings in the following discussion. 
Motion and form-related activity
The interesting characteristic of the displays used herein is not only that they carry shape and motion information but that both types of information are tightly coupled: Seeing a closed rigid shape is inevitably correlated with perceiving a global rotation and, reciprocally, seeing disorganized motions inevitably entails the perception of independent contours. 
Attentional competition
In these conditions, to track and report their internal states, observers may have attended to shape rigidity during bound states and to the organization of the multiple motions during unbound states. Attending to shape rigidity could account for the enhanced activity in the fusiform region and is consistent with the decreased activity in the middle temporal region, as attended shape transitions have already been shown to decrease activity in the dorsal pathway (Paradis et al., 2008). Similarly, attending to multiple moving objects could account for the enhanced activity in the middle temporal region during unbound states. However, it is not clear why attending to multiple moving objects would still result in a (small) increase of activity in pFs, whereas one could expect an activity decrease mirroring what occurs in the middle temporal region. Furthermore, the reverse strategy (enhanced attention to the rotating motion during bound states and to a disorganized shape during unbound states) is equally plausible. As each observer may have adopted either strategy, or switched between them, it is difficult to assert that attentional effects fully account for the observed BOLD pattern at the group level. Note also that we have no evidence to decide whether such attentional effects would be the cause of the perceptual switches (as discussed by Lee, Blake, & Heeger, 2007; Pastukhov & Braun, 2007; Zhang, Jamison, Engel, He, & He, 2011) or a consequence of them. 
Weighting of perceptual evidence
A plausible explanation of the present results is that the pattern of dorsal and ventral activities reflects the relative weights, or the strength, of the actual evidence favoring one of several perceptual interpretations. Visual ambiguities would decrease the weight of the corresponding evidence and entail lower activity, while less ambiguous stimuli—possibly eliciting more reliable neural responses—would entail an increased weight of the corresponding perceptual evidence and thus higher BOLD activity. That unreliable motion information results in decreased activity is supported by electrophysiological recordings in monkeys showing stronger responses in MT to unambiguous as compared to ambiguous features (Huang, Albright, & Stoner, 2007, 2008). 
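One standard way to formalize the "weight" of a piece of evidence is reliability-weighted (inverse-variance) cue combination, in which each cue's weight is proportional to the inverse of its noise variance. The sketch below is offered as an illustration under that assumption and is not an analysis performed in this study; the cue values and noise levels are hypothetical. It simply shows that a more ambiguous (noisier) cue receives a smaller weight.

```python
def reliability_weights(sigmas):
    """Return normalized inverse-variance weights for cues with noise SDs `sigmas`."""
    precisions = [1.0 / s ** 2 for s in sigmas]
    total = sum(precisions)
    return [p / total for p in precisions]

def combine(estimates, sigmas):
    """Reliability-weighted average of cue estimates.

    This is the maximum-likelihood combination when the cues estimate the
    same quantity with independent Gaussian noise (an assumption here).
    """
    weights = reliability_weights(sigmas)
    return sum(w * e for w, e in zip(weights, estimates))

# Hypothetical example: a reliable cue (sigma = 1) and an ambiguous one (sigma = 2).
weights = reliability_weights([1.0, 2.0])   # precisions 1 and 0.25 -> weights 0.8 and 0.2
estimate = combine([10.0, 20.0], [1.0, 2.0])
print(weights, estimate)
```

Doubling a cue's noise SD divides its precision by four, so the combined estimate stays close to the reliable cue.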
In the stimulus, the perceptual evidence for the presence of a closed shape is strong overall, because the spatial configuration of contours (or of dots in the Motion display) is sufficient to elicit the sense of a diamond figure. Accordingly, activity in the fusiform region is high at each transition and even higher for the bound percept, when the interpretation of a closed shape is favored. It is interesting to note that activity is modulated in pFs rather than LO. Previous studies using static stimuli revealed that pFs responded preferentially to complete objects, whereas LO was also sensitive to object parts (Lerner et al., 2001, 2002; Kourtzi et al., 2003). Similarly, Haushofer, Livingstone, and Kanwisher (2008) showed that pFs responses closely correlated with human perceptual states, while LO activity was more related to the physical properties of the stimuli. Altogether, these studies and the present result suggest that pFs could provide evidence that a display contains a complete, recognizable shape. 
Bound states, however, should correspond not only to a high weight of evidence in favor of a closed shape, but also to a lower weight of evidence in favor of the up-and-down component directions. Indeed, the physical up-and-down contour motions are at odds with a rigid rotation and should favor an unbound interpretation whenever motion information is unambiguous and reliable. Accordingly, we observed high activity in the motion-sensitive (middle temporal) area during unbound states and low activity during bound states, which is consistent with the idea that increased weight is given to local motion information during the unbound percept, whereas this local motion information is discarded or altered during bound states. Consistently, for the evoked transitions, the unbound percept corresponds to conditions where high-contrast line ends provide salient 2D motion cues supporting contour and motion segmentation, or where a lack of closure (Kourtzi et al., 2003; Altmann et al., 2003) in the Shape display fails to provide reliable shape information. Conversely, the bound percept corresponds to conditions where high motion noise, or occluded or low-contrast line ends, result in more ambiguous up-and-down contour motions. 
For the spontaneous transitions, the weighting of evidence clearly cannot be induced by physical changes in the stimulus. However, most models of bistable perception (e.g., Noest, van Ee, Nijs, & van Wezel, 2007) include neural adaptation (and noise) as an important ingredient needed to account for bistable perception. One possibility, then, is that the weights given to critical features are modulated by successive neural adaptation of the middle temporal and fusiform regions, resulting in the spontaneous alternation of percepts. The finding that the same dorsal/ventral network is recruited in the evoked and spontaneous conditions supports the view that the observed pattern of activity depends on a balance between motion and shape information, independently of the characteristics of the displays (motion, shape, or contrast) used to define them. 
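The role attributed here to adaptation and noise can be illustrated with a generic two-population rate model combining mutual inhibition with slow adaptation, a common ingredient of models of bistable perception. The sketch below is not the model of Noest et al. (2007). Its parameters (input `I`, inhibition `beta`, adaptation gain `g`, time constants `tau` and `tau_a`, sigmoid slope `k`, noise level `noise_sd`) are hypothetical values chosen so that the deterministic system alternates, and the two populations stand only loosely for the two competing interpretations (bound and unbound).

```python
import numpy as np

def simulate_rivalry(T=20.0, dt=0.001, I=0.8, beta=1.0, g=1.0,
                     tau=0.01, tau_a=1.0, k=0.05, noise_sd=0.0, seed=0):
    """Euler simulation of two mutually inhibiting rate units with slow adaptation.

    Returns a boolean array that is True where unit 1 dominates (r1 > r2).
    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    def gain(x):
        return 1.0 / (1.0 + np.exp(-x / k))    # steep sigmoid gain function

    r = np.array([1.0, 0.0])                    # firing rates; unit 1 starts dominant
    a = np.zeros(2)                             # slow adaptation variables
    n_steps = int(round(T / dt))
    dominant = np.empty(n_steps, dtype=bool)
    for step in range(n_steps):
        # Input minus inhibition from the other unit minus self-adaptation.
        drive = I - beta * r[::-1] - g * a
        r = r + dt / tau * (-r + gain(drive)) + noise_sd * np.sqrt(dt) * rng.standard_normal(2)
        a = a + dt / tau_a * (-a + r)           # adaptation slowly tracks the unit's own rate
        dominant[step] = r[0] > r[1]
    return dominant

dom = simulate_rivalry()
switches = np.count_nonzero(np.diff(dom.astype(int)))
print("dominance switches in 20 s:", switches)
```

In this toy system the dominant unit's adaptation slowly erodes its drive until the suppressed unit escapes. Setting `g=0` removes adaptation, and unit 1 then remains dominant for the whole run; a small `noise_sd` makes dominance durations irregular without abolishing the alternation.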
In support of this weighting interpretation, several studies (Pack, Hunter, & Born, 2005; Hunter & Born, 2011) revealed adaptive center-surround modulation of the receptive fields of MT neurons as a function of the ambiguity and reliability of the input signals. This reliability could depend on intrinsic motion ambiguities (Huang et al., 2008), on contrast (Pack et al., 2005), on motion coherence (Hunter & Born, 2011) or, of relevance for the present study, on the presence of a contextual shape (Huang et al., 2007, 2008). Although those authors did not consider the possibility that the effect of a contextual shape on the adaptive modulation of MT neurons could originate from ventral/dorsal interactions, the present results suggest this hypothesis is worth considering. 
Ventral/dorsal interplay
An open issue is whether the weights are computed in isolation in the motion- and shape-sensitive regions or whether they are set through reciprocal interactions between these regions. The strong perceptual coupling between perceived shape and perceived motion suggests that shape and motion are not processed independently. Using the perceptual evidence from both dimensions jointly may substantially increase the probability of reaching a relevant interpretation of a visual scene (e.g., Weiss & Adelson, 1996; Bayerl & Neumann, 2007; Deneve, Latham, & Pouget, 2001), provided that the evidence from each dimension, which may be noisy and unreliable, is weighted accordingly. Several lines of evidence argue for interactions between the two processing streams. Anatomical studies found dense reciprocal connections between dorsal and ventral regions in monkeys (Maunsell & Van Essen, 1983; DeYoe & Van Essen, 1988), and electrophysiology further provides evidence for interactions between the ventral and dorsal pathways (Verhoef, Vogels, & Janssen, 2010, 2011). Connections between the ventral and dorsal streams have been described at various stages of the visual hierarchy in humans and monkeys (Sporns, Tononi, & Kötter, 2005; Ungerleider, Galkin, Desimone, & Gattass, 2008; Borra et al., 2008; Borra, Ichinohe, Sato, Tanifuji, & Rockland, 2010), and psychophysical studies of structure-from-motion perception also suggest interactions between the shape and motion pathways (e.g., Miskiewicz, Buffat, Paradis, & Lorenceau, 2008). In addition, fMRI studies found that 3D speed gradients elicited activity in ventral regions (Murray et al., 2003; Paradis et al., 2008), while static photographs with implied motion could elicit activity in dorsal regions (Kourtzi & Kanwisher, 2000b; Krekelberg et al., 2005), supporting the view that ventral and dorsal regions interact. 
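The idea that combining shape and motion evidence helps only if each dimension is weighted by its reliability can be made concrete with a two-hypothesis Bayesian calculation. The sketch below assumes, for illustration only, that each cue yields a scalar observation with Gaussian noise whose mean differs between the bound and unbound interpretations, and that the two cues are conditionally independent. These assumptions and all numbers are hypothetical and are not taken from the models cited above. Because a cue's noise SD scales down its log-likelihood ratio, an unreliable cue contributes little to the posterior.

```python
import math

def gaussian_llr(x, mu_bound, mu_unbound, sigma):
    """Log-likelihood ratio log p(x | bound) - log p(x | unbound)
    for Gaussian noise with a common standard deviation sigma."""
    return ((x - mu_unbound) ** 2 - (x - mu_bound) ** 2) / (2.0 * sigma ** 2)

def posterior_bound(cues, prior_bound=0.5):
    """Posterior probability of the bound interpretation given independent cues.

    Each cue is a tuple (x, mu_bound, mu_unbound, sigma).
    """
    log_odds = math.log(prior_bound / (1.0 - prior_bound))
    log_odds += sum(gaussian_llr(*cue) for cue in cues)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical cues: the shape observation favors "bound", the motion one "unbound".
shape_reliable = (0.8, 1.0, -1.0, 0.5)
motion_noisy = (-0.8, 1.0, -1.0, 2.0)
print(posterior_bound([shape_reliable, motion_noisy]))   # reliable shape cue dominates

shape_noisy = (0.8, 1.0, -1.0, 2.0)
motion_reliable = (-0.8, 1.0, -1.0, 0.5)
print(posterior_bound([shape_noisy, motion_reliable]))   # reliable motion cue dominates
```

With identical observations, swapping which cue is reliable flips the favored interpretation, which is the sense in which the relative weights, rather than the raw evidence, would determine the percept.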
In the present study, the opposite patterns of activity observed in the middle temporal and fusiform areas are suggestive of a direct interaction between these two areas. However, whether either area exerts a causal influence on the other remains unknown. 
One way of addressing this issue is to assume that the temporal precedence of activation in a region provides insights into causal relationships. In a recent study using 3D shapes in monkeys, Verhoef et al. (2011) observed a phase advance of IT relative to AIP, which may constitute evidence for a directional flow from IT to AIP. With the present data, analyzing the fine-grained dynamics of neural activity is limited by the sluggish temporal envelope of the BOLD response. We nevertheless looked for a temporal precedence of BOLD activation among the different ROIs. This analysis (see Appendix A) reveals that activity first increases in pFs after a transition toward a bound state, followed by a decrease of activity in the middle temporal region. As a number of factors related to vessel architecture or region-specific BOLD dynamics may account for the delay between the two ROIs, we conducted the same analysis on the signal elicited by the rotating full diamond (control condition, Appendix A). This analysis did not reveal any time lag between the two ROIs, suggesting that a temporal flow of activity from ventral to dorsal regions may arise in binding conditions only. In this view, the observed pattern of fMRI activity suggests that, whenever evidence favoring the existence of a closed rigid shape is strong, activity in the ventral stream influences dorsal areas so as to counteract the motion evidence for independent motions. Although speculative, this view is supported by the electrophysiological data of Verhoef et al. (2011) as well as by psychophysical evidence showing a strong influence of shape on motion integration (Lorenceau & Alais, 2001; Miskiewicz et al., 2008). Additional studies of task- and stimulus-dependent directional interactions between dorsal and ventral areas in primates, using methods with high temporal resolution, are nevertheless needed to further test the interpretation proposed here. 
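The latency analysis referred to above (Appendix A) fits a model made of an HRF and its time derivative, which allows the peak latency to be estimated for each subject. In approaches such as that of Henson et al. (2002), the ratio of the derivative coefficient to the canonical coefficient indexes a small temporal shift of the response. The sketch below illustrates only the first-order principle on a noiseless simulated response. Its assumptions are a double-gamma HRF with SPM-like shape parameters (peak near 5 s, undershoot near 15 s), the Taylor approximation h(t - δ) ≈ h(t) - δ h'(t), which gives δ ≈ -β_deriv/β_canon, and no correction for the bias of this approximation at larger shifts. It is not the exact pipeline used in this study.

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    t = np.maximum(t, 0.0)
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t):
    """Assumed double-gamma HRF: positive lobe peaking near 5 s, undershoot near 15 s."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def estimate_latency(y, t):
    """Fit y ~ b1*h + b2*h' by least squares and return -b2/b1 in seconds.

    Positive values mean the response in y peaks later than the canonical HRF.
    First-order approximation, valid only for small shifts.
    """
    h = double_gamma_hrf(t)
    dh = np.gradient(h, t)                  # numerical temporal derivative
    X = np.column_stack([h, dh])
    (b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
    return -b2 / b1

t = np.arange(0.0, 32.0, 0.1)               # seconds, fine sampling for the sketch
for true_shift in (0.0, 0.3, -0.3):
    y = double_gamma_hrf(t - true_shift)    # noiseless response delayed by true_shift
    print(true_shift, round(estimate_latency(y, t), 3))
```

Because the estimate is a ratio, it also applies to a negative-going response such as the middle temporal decrease: flipping the sign of `y` flips both coefficients and leaves -b2/b1 unchanged. Per-subject estimates from two ROIs can then be compared with a paired test.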
Conclusion
Studying the neural correlates of bistable form/motion binding with fMRI, we identified a generic cue-invariant network of cortical areas showing differential activity during bound and unbound perceptual states by jointly analyzing the BOLD response to a variety of displays. This network was similarly recruited by unchanging displays eliciting spontaneous alternations between bound and unbound perceptual states, and by physically changing displays periodically driving perceptual alternations. Overall, the pattern of activity uncovered here may reflect the dynamic weighting of neural evidence supporting the most consistent interpretation of motion and shape information. 
Supplementary Materials
Acknowledgments
This research was supported by a grant from the Ministère de la Recherche, ACI Neurosciences Computationnelles to Jean Lorenceau. Anne Caclin was funded by a post-doctoral fellowship from Centre National de la Recherche Scientifique. Cédric Lamirel was supported by Fondation pour la Recherche Médicale, Fondation Berthe Fouassier, and Fondation de France. Thanks to Lionel Allirol for his help in data acquisition and to Anthony Norcia for helpful comments on a previous version of the manuscript. 
Commercial relationships: none. 
Corresponding author: Jean Lorenceau. 
Address: Centre de Recherche de l'Institut du Cerveau et de la Moelle épinière, Equipe Cogimage, Paris, France. 
References
Altmann, C. F., Bülthoff, H. H., & Kourtzi, Z. (2003). Perceptual organization of local elements into global shapes in the human visual cortex. Current Biology, 13, 342–349.
Anderson, B. L., & Sinha, P. (1997). Reciprocal interactions between occlusion and motion computations. Proceedings of the National Academy of Sciences USA, 94(7), 3477–3480.
Bayerl, P., & Neumann, H. (2007). Disambiguating visual motion by form–motion interaction: A computational model. International Journal of Computer Vision, 72(1), 27–45.
Beatty, J., & Wagoner, B. L. (1978). Pupillometric signs of brain activation vary with level of cognitive processing. Science, 199, 1216–1218.
Born, R. T., & Bradley, D. C. (2005). Structure and function of visual area MT. Annual Review of Neuroscience, 28, 157–189.
Borra, E., Belmalih, A., Calzavara, R., Gerbella, M., Murata, A., & Rozzi, S. (2008). Cortical connections of the macaque anterior intraparietal (AIP) area. Cerebral Cortex, 18(5), 1094–1111.
Borra, E., Ichinohe, N., Sato, T., Tanifuji, M., & Rockland, K. S. (2010). Cortical connections to area TE in monkey: Hybrid modular and distributed organization. Cerebral Cortex, 20(2), 257–270.
Braddick, O. J., O'Brien, J. M., Wattam-Bell, J., Atkinson, J., & Turner, R. (2000). Form and motion coherence activate independent, but not dorsal/ventral segregated, networks in the human brain. Current Biology, 10, 731–734.
Castelo-Branco, M., Formisano, E., Backes, W., Zanella, F., Neuenschwander, S., & Singer, W. (2002). Activity patterns in human motion-sensitive areas depend on the interpretation of global motion. Proceedings of the National Academy of Sciences USA, 99, 13914–13919.
Clifford, C. W. G., Mannion, D. J., & McDonald, J. S. (2009). Radial biases in the processing of motion and motion-defined contours by human visual cortex. Journal of Neurophysiology, 102, 2974–2981.
Culham, J., He, S., Dukelow, S., & Verstraten, F. A. (2001). Visual motion and the human brain: What has neuroimaging told us? Acta Psychologica (Amsterdam), 107(1–3), 69–94.
Cutting, J. E. (2002). Representing motion in a static image: Constraints and parallels in art, science, and popular culture. Perception, 31(10), 1165–1193.
Deneve, S., Latham, P. E., & Pouget, A. (2001). Efficient computation and cue integration with noisy population codes. Nature Neuroscience, 4(8), 826–831.
Denys, K., Vanduffel, W., Fize, D., Nelissen, K., Peuskens, H., & Van Essen, D. (2004). The processing of visual shape in the cerebral cortex of human and nonhuman primates: A functional magnetic resonance imaging study. Journal of Neuroscience, 24(10), 2551–2565.
DeYoe, E. A., & Van Essen, D. C. (1988). Concurrent processing streams in monkey visual cortex. Trends in Neurosciences, 11, 219–226.
Fang, F., Kersten, D., & Murray, S. O. (2008). Perceptual grouping and inverse fMRI activity patterns in human visual cortex. Journal of Vision, 8(7):2, 1–9, http://www.journalofvision.org/content/8/7/2, doi:10.1167/8.7.2.
Friston, K. J. (2005). Models of brain function in neuroimaging. Annual Review of Psychology, 56, 57–87.
Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society B, 364, 1211–1221.
Gardner, J. L., Sun, P., Waggoner, R. A., Ueno, K., Tanaka, K., & Cheng, K. (2005). Contrast adaptation and representation in human early visual cortex. Neuron, 47, 607–620.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15(1), 20–25.
Grill-Spector, K., Kourtzi, Z., & Kanwisher, N. (2001). The lateral occipital complex and its role in object recognition. Vision Research, 41, 1409–1422.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron, 24(1), 187–203.
Haushofer, J., Livingstone, M. S., & Kanwisher, N. (2008). Multivariate patterns in object-selective cortex dissociate perceptual and physical shape similarity. PLoS Biology, 6(7), e187.
Hayworth, K. J., & Biederman, I. (2006). Neural evidence for intermediate representations in object recognition. Vision Research, 46(23), 4024–4031.
Henson, R. N., Price, C. J., Rugg, M. D., Turner, R., & Friston, K. J. (2002). Detecting latency differences in event-related BOLD responses: Application to words versus nonwords and initial versus repeated face presentations. NeuroImage, 15, 83–97.
Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36, 791–804.
Huang, X., Albright, T. D., & Stoner, G. R. (2007). Adaptive surround modulation in cortical area MT. Neuron, 53(5), 761–770.
Huang, X., Albright, T. D., & Stoner, G. R. (2008). Stimulus dependency and mechanisms of surround modulation in cortical area MT. Journal of Neuroscience, 28(51), 13889–13906.
Huk, A. C., Dougherty, R. F., & Heeger, D. J. (2002). Retinotopy and functional subdivision of human areas MT and MST. Journal of Neuroscience, 22, 7195–7205.
Hunter, J. N., & Born, R. T. (2011). Stimulus-dependent modulation of suppressive influences in MT. Journal of Neuroscience, 31(2), 678–686.
Hupé, J. M., Lamirel, C., & Lorenceau, J. (2009). Pupil dynamics during bistable motion perception. Journal of Vision, 9(7):10, 1–19, http://www.journalofvision.org/content/9/7/10, doi:10.1167/9.7.10.
Jehee, J. F., & Ballard, D. H. (2009). Predictive feedback can account for biphasic responses in the lateral geniculate nucleus. PLoS Computational Biology, 5(5), e1000373.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, 14(2), 201–211.
Kersten, D., & Yuille, A. (2003). Bayesian models of object perception. Current Opinion in Neurobiology, 13(2), 150–158.
Könen, C. S., & Kastner, S. (2008). Two hierarchically organized neural systems for object information in human visual cortex. Nature Neuroscience, 11, 224–231.
Kourtzi, Z., & Kanwisher, N. (2000a). Cortical regions involved in perceiving object shape. Journal of Neuroscience, 20, 3310–3318.
Kourtzi, Z., & Kanwisher, N. (2000b). Activation in human MT/MST by static images with implied motion. Journal of Cognitive Neuroscience, 12, 48–55.
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integration of local features into global shapes: Monkey and human fMRI studies. Neuron, 37, 333–346.
Krekelberg, B., Vatakis, A., & Kourtzi, Z. (2005). Implied motion from form in the human visual cortex. Journal of Neurophysiology, 94, 4373–4386.
Larsson, J., Heeger, D. J., & Landy, M. S. (2010). Orientation selectivity of motion-boundary responses in human visual cortex. Journal of Neurophysiology, 104, 2940–2950.
Lee, S. H., Blake, R., & Heeger, D. J. (2007). Hierarchy of cortical responses underlying binocular rivalry. Nature Neuroscience, 10(8), 1048–1054.
Lerner, Y., Hendler, T., Ben-Bashat, D., Harel, M., & Malach, R. (2001). A hierarchical axis of object processing stages in the human visual cortex. Cerebral Cortex, 11, 287–297.
Lerner, Y., Hendler, T., & Malach, R. (2002). Object-completion effects in the human lateral occipital complex. Cerebral Cortex, 12, 163–177.
Lorenceau, J. (1996). Motion integration with dot patterns: Effects of motion noise and structural information. Vision Research, 36, 3415–3427.
Lorenceau, J., & Alais, D. (2001). Form constraints in motion binding. Nature Neuroscience, 4, 745–751.
Lorenceau, J., & Shiffrar, M. (1992). The influence of terminators on motion integration across space. Vision Research, 32, 263–273.
Lumer, E. D., Friston, K. J., & Rees, G. (1998). Neural correlates of perceptual rivalry in the human brain. Science, 280, 1930–1934.
Majaj, N. J., Carandini, M., & Movshon, J. A. (2007). Motion integration by neurons in macaque MT is local, not global. Journal of Neuroscience, 27(2), 366–370.
Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., & Kennedy, W. A. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences USA, 92, 8135–8139.
Marcar, V. L., Xiao, D. K., Raiguel, S. E., Maes, H., & Orban, G. A. (1995). Processing of kinetically defined boundaries in the cortical motion area MT of the macaque monkey. Journal of Neurophysiology, 74(3), 1258–1270.
Maunsell, J. H. R., & Van Essen, D. C. (1983). The connections of the middle temporal visual area (MT) and its relationship to a cortical hierarchy in the macaque monkey. Journal of Neuroscience, 3, 2563–2586.
McKeefry, D. J., Watson, J. D. G., Frackowiak, R. S. J., Fong, K., & Zeki, S. (1997). The activity in human areas V1/V2, V3, and V5 during the perception of coherent and incoherent motion. NeuroImage, 5, 2–12.
Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia, 46(3), 774–785.
Miskiewicz, A., Buffat, S., Paradis, A. L., & Lorenceau, J. (2008). Shape and motion interactions at perceptual and attentional levels during processing of structure from motion stimuli. Journal of Vision, 8(16):17, 1–14, http://www.journalofvision.org/content/8/16/17, doi:10.1167/8.16.17.
Movshon, J. A., Adelson, E. H., Gizzi, M. S., & Newsome, W. T. (1985). The analysis of moving visual patterns. Experimental Brain Research, 11, 117–152.
Mumford, D. (1992). On the computational architecture of the neocortex. II. The role of cortico-cortical loops. Biological Cybernetics, 66, 241–251.
Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P., & Woods, D. L. (2002). Shape perception reduces activity in human primary visual cortex. Proceedings of the National Academy of Sciences USA, 99, 15164–15169.
Murray, S. O., Olshausen, B. A., & Woods, D. L. (2003). Processing shape, motion and three-dimensional shape-from-motion in the human cortex. Cerebral Cortex, 13(5), 508–516.
Noest, A. J., van Ee, R., Nijs, M. M., & van Wezel, R. J. (2007). Percept-choice sequences driven by interrupted ambiguous stimuli: A low-level neural model. Journal of Vision, 7(8):10, 1–14, http://www.journalofvision.org/content/7/8/10, doi:10.1167/7.8.10.
Orban, G. A. (2008). Higher order visual processing in macaque extrastriate cortex. Physiological Reviews, 88, 59–89.
Pack, C. C., Hunter, J. N., & Born, R. T. (2005). Contrast dependence of suppressive influences in cortical area MT of alert macaque. Journal of Neurophysiology, 93(3), 1809–1815.
Paradis, A.-L., Cornilleau-Pérès, V., Droulez, J., Van de Moortele, P.-F., Lobel, E., & Berthoz, A. (2000). Visual perception of motion and 3D structure from motion: An fMRI study. Cerebral Cortex, 10(8), 772–783.
Paradis, A. L., Droulez, J., Cornilleau-Pérès, V., & Poline, J. B. (2008). Processing 3D form and 3D motion: Respective contributions of attention-based and stimulus-driven activity. NeuroImage, 43, 736–747.
Pastukhov, A., & Braun, J. (2007). Perceptual reversals need no prompting by attention. Journal of Vision, 7(10):5, 1–17, http://www.journalofvision.org/content/7/10/5, doi:10.1167/7.10.5.
Qian, N., & Andersen, R. (1994). Transparent motion perception as detection of unbalanced motion signals. II. Physiology. Journal of Neuroscience, 14, 7367–7380.
Ramsden, B. M., Hung, C. P., & Roe, A. W. (2001). Real and illusory contour processing in area V1 of the primate: A cortical balancing act. Cerebral Cortex, 11, 648–665.
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Rees, G., Friston, K., & Koch, C. (2000). A direct quantitative relationship between the functional properties of human and macaque V5. Nature Neuroscience, 3, 716–723.
Rosenblatt, F. (1961). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.
Shiffrar, M., & Freyd, J. J. (1990). Apparent motion of the human body. Psychological Science, 1(4), 257–264.
Sillito, A. M., Cudeiro, J., & Jones, H. E. (2006). Always returning: Feedback and sensory processing in visual cortex and thalamus. Trends in Neurosciences, 29(6), 307–316.
Sporns, O., Tononi, G., & Kötter, R. (2005). The human connectome: A structural description of the human brain. PLoS Computational Biology, 1(4), e42.
Sterzer, P., Eger, E., & Kleinschmidt, A. (2003). Responses of extrastriate cortex to switching perception of ambiguous visual motion stimuli. NeuroReport, 14(18), 2337–2341.
Sterzer, P., & Kleinschmidt, A. (2007). A neural basis for inference in perceptual ambiguity. Proceedings of the National Academy of Sciences USA, 104, 323–328.
Tootell, R. B., Mendola, J. D., Hadjikhani, N. K., Ledden, P. J., Liu, A. K., & Reppas, J. B. (1997). Functional analysis of V3A and related areas in human visual cortex. Journal of Neuroscience, 17, 7060–7078.
Tootell, R. B., Reppas, J. B., Kwong, K. K., Malach, R., Born, R. T., & Brady, T. J. (1995). Functional analysis of human MT and related visual cortical areas using magnetic resonance imaging. Journal of Neuroscience, 15, 3215–3230.
Tsui, J. M., Hunter, J. N., Born, R. T., & Pack, C. C. (2010). The role of V1 surround suppression in MT motion integration. Journal of Neurophysiology, 103(6), 3123–3138.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.
Ungerleider, L. G., Galkin, T. W., Desimone, R., & Gattass, R. (2008). Cortical connections of area V4 in the macaque. Cerebral Cortex, 18(3), 477–499.
Van Oostende, S., Sunaert, S., Van Hecke, P., Marchal, G., & Orban, G. A. (1997). The kinetic occipital (KO) region in man: An fMRI study. Cerebral Cortex, 7, 690–701.
Verhoef, B.-E., Vogels, R., & Janssen, P. (2010). Contribution of inferior temporal and posterior parietal activity to three-dimensional shape perception. Current Biology, 20(10), 909–913.
Verhoef, B.-E., Vogels, R., & Janssen, P. (2011). Synchronization between the end stages of the dorsal and the ventral visual stream. Journal of Neurophysiology, 105(5), 2030–2042.
Wallach, H., & O'Connell, D. N. (1953). The kinetic depth effect. Journal of Experimental Psychology, 45, 205–217.
Walter, B., Blecker, C., Kirsch, P., Sammer, G., Schienle, A., & Stark, R. (2003). MARINA: An easy to use tool for the creation of MAsks for Region of INterest Analyses. In 9th International Conference on Functional Mapping of the Human Brain. New York, USA.
Wang, W., Jones, H. E., Andolina, I. M., Salt, T. E., & Sillito, A. M. (2006). Functional alignment of feedback effects from visual cortex to thalamus. Nature Neuroscience, 9(10), 1330–1336.
Weiss, Y., & Adelson, E. H. (1996). A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In Proceedings of the 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96) (p. 321).
Wotawa, N., Thirion, B., Castet, E., Anton, J.-L., & Faugeras, O. (2005). Human retinotopic mapping using fMRI. Technical Report 5472, INRIA, Sophia-Antipolis, 65 pp. http://hal.inria.fr/inria-00070536/en/.
Yesilyurt, B., Ugurbil, K., & Uludag, K. (2008). Dynamics and nonlinearities of the BOLD response at very short stimulus durations. Magnetic Resonance Imaging, 26, 853–862.
Zhang, P., Jamison, K., Engel, S., He, B., & He, S. (2011). Binocular rivalry requires visual attention. Neuron, 71(2), 362–369.
Zhuo, Y., Zhou, T. G., Rao, H. Y., Wang, J. J., Meng, M., & Chen, M. (2003). Contributions of the visual ventral pathway to long-range apparent motion. Science, 299(5605), 417–420.
Appendix A: latency analysis of BOLD activity
Figure A.1
 
Responses to perceptual transitions toward bound (green) and unbound (red) states in the middle temporal and fusiform ROIs. Time courses on the left are reconstructed with an FIR model and averaged over both hemispheres and both induction modes (error bars: ±1 standard error). Black asterisks on the FIR time courses denote significant differences between the two percepts at a given time point (p < 0.05, paired Wilcoxon signed-rank tests). When the two induction modes are averaged, the difference between the two perceptual transitions is significant as early as the first time point in the fusiform ROI. At this first time point, however, BOLD activity in the middle temporal ROI does not depend on the perceptual transition. The right column displays the time courses in the same ROIs, estimated using a model with an HRF (hemodynamic response function) and its time derivative. This model allows the latency of the peak response to be estimated for each subject (e.g., Henson et al., 2002). For transitions to the bound percept, this model showed an increase of activity in the fusiform ROI for all 12 subjects and a decrease of activity in the middle temporal ROI for nine of the 12 subjects. For these nine subjects, the response peaked on average 1.1 s (±0.13, standard error of the mean) earlier in the fusiform than in the middle temporal ROI, a significant latency difference (paired Wilcoxon signed-rank test, p = 0.015, green asterisk). Shown in grey is the time course at the onset of a moving diamond in the control runs, estimated with the same model. The comparison of latencies for these onsets does not reveal any significant difference between the two areas (fusiform and middle temporal, p > 0.9), suggesting that the latency difference observed between the middle temporal (hMT+) and fusiform (pFs) areas for perceptual transitions cannot be accounted for by local vascular properties. 
Although we cannot exclude that these differences could be due to slower dynamics for decreasing compared to increasing BOLD activity, recent results rather suggest the opposite (see Gardner et al., 2005, and Yesilyurt & Uludag, 2009), which altogether suggests that the later peak latency in hMT+ reveals precedence of pFs processing in form/motion binding.
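The per-subject latency estimate mentioned in this caption can be illustrated with a minimal sketch. It assumes an SPM-style double-gamma HRF (gamma shape parameters 6 and 16, undershoot ratio 1/6) and relies on the first-order Taylor approximation h(t − δ) ≈ h(t) − δ·h′(t). A response fitted with coefficients b1 (HRF) and b2 (derivative) is therefore shifted by roughly δ ≈ −b2/b1. This is a linearized illustration in the spirit of Henson et al. (2002), not the exact procedure used in the study, and all function names are hypothetical.

```python
import math
import numpy as np

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density evaluated on t, set to zero for t <= 0."""
    t = np.asarray(t, dtype=float)
    tp = np.clip(t, 0.0, None)
    pdf = tp ** (shape - 1) * np.exp(-tp / scale) / (math.gamma(shape) * scale ** shape)
    return np.where(t > 0, pdf, 0.0)

def double_gamma_hrf(t, a1=6.0, a2=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF: response gamma (shape a1) minus a scaled undershoot gamma (shape a2)."""
    return gamma_pdf(t, a1) - ratio * gamma_pdf(t, a2)

def estimate_latency_shift(y, t):
    """Fit y ~ b1*h(t) + b2*h'(t) by least squares and return -b2/b1 in seconds.

    Positive values mean the response peaks later than the reference HRF.
    """
    h = double_gamma_hrf(t)
    dh = np.gradient(h, t)                  # numerical temporal derivative of the HRF
    X = np.column_stack([h, dh])
    (b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
    return -b2 / b1

# Hypothetical responses in two regions, the second one peaking 1 s later
t = np.arange(0.0, 40.0, 0.1)
early = double_gamma_hrf(t - 0.5)           # increase of activity
late = -0.8 * double_gamma_hrf(t - 1.5)     # decrease of activity, 1 s later
print(estimate_latency_shift(late, t) - estimate_latency_shift(early, t))
```

The ratio −b2/b1 does not depend on the sign of b1, so the same estimate applies to an activity decrease such as the one in the middle temporal ROI. The linear approximation is only reliable for small shifts.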
Appendix B: activity in ventral and dorsal visual areas
Figure B.1
 
fMRI contrasts yielding activity in ventral areas (axial slices from z = −20 to z = −4). No significant activity is found above z = 0 for contrasts a and b. Statistical thresholds: p < 0.01 at the voxel level and p < 0.05 (uncorrected) at the cluster level. Color bars on the right indicate t values.
Figure B.2
 
fMRI contrasts yielding activity in dorsal areas (axial slices from z = −4 to z = 12). No significant activity is found below z = −4 for contrasts a and b. The z = −4 slice is shared with Figure B.1.
Appendix C: feature-dependent activity
Figure C.1
 
Feature-dependent activity in the ROIs defined from the localizer and control runs (A) or from the main contrasts (B). The ROIs are the same as in Figure 3. The plots show the median over subjects, with interquartile ranges (error bars), of the average amplitude of sustained responses to all perceptual transitions (toward both bound and unbound percepts). Because we were interested in the effects of physically modulating Contrast, Motion, and Shape, only the beta estimates for the evoked transitions are presented. Asterisks indicate significant differences between conditions in post-hoc comparisons (Friedman test, p < 0.05).
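The Friedman test named in this caption compares the three evoked conditions within subjects using ranks. The pure-Python sketch below assumes one row of betas per subject ordered as (Contrast, Motion, Shape); the helper names are hypothetical, no tie correction is applied, and the post-hoc comparisons are not reproduced. With three conditions the chi-square reference has 2 degrees of freedom, whose survival function is exactly exp(−Q/2).

```python
import math

def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # ranks are 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks

def friedman_three_conditions(rows):
    """Friedman chi-square statistic and p-value for n subjects x 3 conditions."""
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:                          # rank conditions within each subject
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, math.exp(-q / 2.0)              # chi-square survival function, 2 df

# Hypothetical betas for 4 subjects (Contrast, Motion, Shape); not data from the study
betas = [[0.2, 0.5, 0.1], [0.3, 0.6, 0.2], [0.1, 0.4, 0.3], [0.2, 0.7, 0.1]]
q, p = friedman_three_conditions(betas)
print(f"Friedman chi2 = {q:.2f}, p = {p:.3f}")
```

Because only within-subject ranks enter the statistic, it suits median/interquartile summaries like those plotted here, where normality of the betas is not assumed.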
Figure 1
 
Experimental conditions. (a) Bistable perception of motion displays (see Movie 1). Displays were alternately seen as unbound line segments translating up and down independently (left) or as bound line segments forming a rigid diamond translating along a circular path (right). Perceptual alternations were either generated endogenously by the brain or induced by physically modulating one of three visual parameters (depicted in b, c, and d; see e). (b) “Contrast” display. Global shape motion is seen when segments have low-contrast line ends and a high-contrast center (right), whereas reversing the contrast distribution entails the perception of independently moving segments (left). (c) “Motion” display (see Movie 2). The diamond shape is defined by alignments of high-luminance dots. Adding to each dot a motion jitter along the segment orientation (right) yields the perception of a rotating diamond shape, which breaks into independent contour translations when the jitter is small or when neighboring dots are separated by fixed intervals (left).
 
(d) “Shape” display (see Movie 3). Contours defining a closed convex shape, such as a diamond, are easily bound into a single rotating object, whereas contours defining an open concave shape, such as a chevron, are not. To ease perceptual alternations with this stimulus, static masks covering the vertices throughout the movement were added. (e) Two induction modes. In three of the six displays, perceptual transitions between the two percepts were evoked exogenously by slowly varying one of the parameters described in b, c, and d between two extreme values, as depicted by the red/green ellipse. In the other three displays, the value of the critical parameter (contrast, motion, or shape) was fixed, chosen on the basis of behavioral experiments to elicit spontaneous perceptual alternations between bound and unbound percepts, as depicted by the red/green square at the bottom.
Figure 2
 
Behavioral data in the scanner for evoked (blue) and spontaneous (orange) transitions. Error bars: ±1 standard error. (a) Number of perceptual transitions for the six motion displays. (b) Percentage of time spent in the bound percept for each stimulus condition. (c) Durations of perceptual episodes for the evoked (•) and spontaneous (▪) transitions as a function of their rank. Evoked alternations entail percept durations commensurate with the period of the physical modulation. With spontaneous alternations, episode duration varies with rank, with longer durations for the percept seen at stimulus onset. This first perceptual state (bound or unbound) depended on the observer. (d) Normalized distributions of episode durations for both induction modes. Episode durations followed a log-normal distribution only in the spontaneous conditions.
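Panel (d) can be illustrated with a small sketch of how episode durations might be normalized and compared with a log-normal distribution. The caption does not state the normalization, so dividing each duration by the mean duration is an assumption. The helper names are hypothetical, and the data below are simulated. The maximum-likelihood log-normal fit is simply the mean and (population) standard deviation of the log durations. Because the parameters are estimated from the same data, a Kolmogorov–Smirnov p-value would need a Lilliefors-type correction, so only the statistic is computed.

```python
import math
import random
import statistics

def normalize_durations(durations):
    """Divide each episode duration by the mean duration (assumed normalization)."""
    mean = statistics.fmean(durations)
    return [d / mean for d in durations]

def fit_lognormal(durations):
    """Maximum-likelihood log-normal fit: mean and SD of the log durations."""
    logs = [math.log(d) for d in durations]
    return statistics.fmean(logs), statistics.pstdev(logs)

def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of a log-normal distribution."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(durations, mu, sigma):
    """Largest gap between the empirical CDF and the fitted log-normal CDF."""
    xs = sorted(durations)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = lognormal_cdf(x, mu, sigma)
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return gap

# Simulated (not recorded) episode durations in seconds
random.seed(1)
simulated = [random.lognormvariate(1.5, 0.5) for _ in range(500)]
normalized = normalize_durations(simulated)
mu, sigma = fit_lognormal(normalized)
print(mu, sigma, ks_statistic(normalized, mu, sigma))
```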
Figure 3
 
Regions of interest (ROIs) and their patterns of response to perceptual transitions toward bound (green) and unbound (red) percepts. ROIs are overlaid on the average anatomy of the 12 subjects. Time courses are reconstructed with an FIR (finite impulse response) model and averaged over both hemispheres (error bars: ±1 standard error). For the ROIs showing a significant difference between the two percepts, the gray line indicates the earliest time point at which the bound–unbound difference reaches significance (uncorrected p < 0.05; the same time point was obtained with either a paired Wilcoxon test or a paired t-test). (a) ROIs defined from the localizer and control runs; hMT, human MT; pFs, posterior fusiform; LO, lateral occipital; V1, primary visual area. The light blue circles delineate the hMT, LO, and pFs ROIs as defined in the study, i.e., 1-cm-diameter spheres centered on the maxima of shape- and motion-related activity in the localizer runs (coordinates in Table 1). For visualization, motion- and shape-related activities are shown with the same cluster-level threshold (p < 0.05, FWE-corrected for the whole brain volume) but different voxel-level thresholds (p < 0.0001 for hMT; p < 0.01 for LO and pFs). For V1, the visualization thresholds are those used to define the ROI from the activity related to the full diamond in the control run (p < 0.03, uncorrected, at the voxel level). (b) ROIs defined from the main contrasts. The visualization thresholds are those used to define the ROIs (voxel level, p < 0.01; cluster level, p < 0.05, FWE-corrected, or an extent of at least 40 voxels; see Methods). For comparison, the “Middle temporal,” “Fusiform,” and “Calcarine” areas are overlaid on the same anatomical slices as “hMT,” “pFs,” and “V1,” respectively, with “hMT” and “pFs” outlined by dotted circles as in a.
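An FIR model like the one used for these time courses estimates one free amplitude per post-event time bin and per condition, so no HRF shape is imposed. The sketch below assumes event onsets expressed as scan indices (one bin per scan) and adds a constant baseline column. The helper names are hypothetical, and the study's actual binning and nuisance regressors are not reproduced.

```python
import numpy as np

def fir_design(onsets, n_scans, n_lags):
    """FIR regressors: column `lag` is 1 at scan onset + lag for every event."""
    X = np.zeros((n_scans, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            if onset + lag < n_scans:
                X[onset + lag, lag] += 1.0
    return X

def fir_estimate(y, onsets_by_condition, n_lags):
    """Least-squares FIR time courses, one row of n_lags values per condition."""
    n_scans = len(y)
    blocks = [fir_design(o, n_scans, n_lags) for o in onsets_by_condition]
    X = np.hstack(blocks + [np.ones((n_scans, 1))])   # constant term for the baseline
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[:-1].reshape(len(onsets_by_condition), n_lags)

# Hypothetical example: 110 scans, 8 post-event bins, partly overlapping events
on_bound, on_unbound = [5, 30, 55, 80], [9, 40, 60, 92]
true_bound = np.array([0, 1, 2, 3, 2, 1, 0.5, 0])
true_unbound = np.array([0, -0.5, -1, -1.5, -1, -0.5, 0, 0])
y = (fir_design(on_bound, 110, 8) @ true_bound
     + fir_design(on_unbound, 110, 8) @ true_unbound + 100.0)
print(fir_estimate(y, [on_bound, on_unbound], 8).round(3))
```

Because both conditions enter the same least-squares fit, overlapping responses (here the events at scans 5 and 9) are separated rather than simply averaged.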
Table 1
 
Coordinates of the voxels of maximal activation in the localizer and control runs. For the ventral and dorsal areas, the time courses of activation during the two percepts were studied in 1-cm-diameter ROIs centered on these voxels (Figure 3A).
| Processing stream | Contrast | ROI label | MNI coordinates (mm), left hemisphere (x, y, z) | MNI coordinates (mm), right hemisphere (x, y, z) |
|---|---|---|---|---|
| V1 | Control run (Full diamond > fixation) | V1 | −6, −102, 0 | 4, −96, −6 |
| Ventral areas | Shape areas localizer (Intact > Scrambled) | pFs | −44, −62, −18 | 40, −60, −18 |
| Ventral areas | Shape areas localizer (Intact > Scrambled) | LO | −36, −82, −6 | 42, −76, −4 |
| Dorsal areas | Motion areas localizer (Motion > Static) | hMT | −42, −70, 6 | 42, −62, 6 |
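The 1-cm-diameter spherical ROIs built around the Table 1 peaks can be sketched as a list of grid coordinates within 5 mm of each peak. Two choices here are assumptions for illustration: a 2-mm isotropic grid aligned on the peak, and the inclusion of points at exactly 5 mm. `sphere_roi` is a hypothetical helper. A real analysis would select voxels in image space through the scan's affine transform.

```python
import itertools

def sphere_roi(center_mm, diameter_mm=10.0, voxel_mm=2.0):
    """MNI coordinates (mm) of grid points within a sphere around center_mm.

    Assumes an isotropic grid of voxel_mm aligned on center_mm and keeps
    points whose distance to the center is <= diameter_mm / 2.
    """
    radius = diameter_mm / 2.0
    n = int(radius // voxel_mm)               # grid steps needed to cover the radius
    cx, cy, cz = center_mm
    coords = []
    for i, j, k in itertools.product(range(-n, n + 1), repeat=3):
        dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
        if dx * dx + dy * dy + dz * dz <= radius * radius:
            coords.append((cx + dx, cy + dy, cz + dz))
    return coords

# Left-hemisphere peak coordinates copied from Table 1
left_peaks = {"V1": (-6, -102, 0), "pFs": (-44, -62, -18),
              "LO": (-36, -82, -6), "hMT": (-42, -70, 6)}
rois = {name: sphere_roi(c) for name, c in left_peaks.items()}
print({name: len(coords) for name, coords in rois.items()})
```

With these assumptions each ROI contains 81 grid points. A different voxel size or boundary rule changes that count.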
Table 2
 
Clusters exhibiting significantly different responses for the bound and unbound percepts. Clusters are grouped according to the part of the response (transient or sustained) that shows a significant difference (voxel level: p < 0.01, uncorrected; cluster level: p < 0.05, FWE-corrected for the whole-brain analysis, or an extent of at least 40 voxels for activity in early visual areas; see Methods). The clusters selected for further ROI-based analysis are highlighted in italics. For each cluster, we report the MNI coordinates and statistics of the maximally activated voxel. When a cluster encompasses several anatomical regions, additional local maxima are detailed. Cluster extent and the associated p-value are given for clusters surviving the conservative threshold of p < 0.05, FWE-corrected (whole-brain analysis). For the other clusters, which come from an analysis restricted to occipital and posterior temporal regions to uncover activation in low-order visual areas, only the cluster extent is given. L/R = left/right; POS = parieto-occipital sulcus.
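| Cluster label | MNI x | MNI y | MNI z | t(11) | p | Cluster extent (voxels) | Cluster p (FWE-corrected) |
|---|---|---|---|---|---|---|---|
| Bound > Unbound, transient activity | | | | | | | |
| L fusiform and calcarine | −30 | −68 | −22 | 8.92 | <0.001 | 1105 | <0.001 |
| (local maximum, calcarine) | −18 | −90 | 0 | 5.30 | <0.001 | | |
| R fusiform | 48 | −66 | −16 | 4.24 | 0.001 | 291 | |
| R calcarine | 8 | −90 | −4 | 5.57 | <0.001 | 250 | |
| Bound > Unbound, sustained activity | | | | | | | |
| L fusiform | −30 | −60 | −22 | 6.00 | <0.001 | 412 | 0.029 |
| R fusiform and posterior occipital (lower) | 38 | −72 | −12 | 5.59 | <0.001 | 507 | 0.009 |
| (local maximum, posterior occipital, lower) | 34 | −88 | −2 | 4.64 | <0.001 | | |
| R posterior occipital (upper) | 26 | −96 | 20 | 6.10 | <0.001 | 45 | |
| L posterior occipital (lower and upper) | −26 | −100 | −2 | 7.76 | <0.001 | 192 | |
| (local maximum, posterior occipital, upper) | −26 | −92 | 20 | 5.30 | <0.001 | | |
| Unbound > Bound, transient activity | | | | | | | |
| L Middle Temporal | −54 | −70 | 12 | 9.64 | <0.001 | 626 | 0.002 |
| R Middle Temporal | 52 | −74 | 10 | 7.42 | <0.001 | 598 | 0.002 |
| R Precuneus | 12 | −52 | 8 | 3.87 | 0.001 | 88 | |
| R Cuneus | 18 | −82 | 40 | 3.86 | 0.001 | 74 | |
| R POS | 18 | −56 | 18 | 3.64 | 0.002 | 73 | |
| Unbound > Bound, sustained activity | | | | | | | |
| R Middle Temporal | 48 | −68 | 4 | 10.36 | <0.001 | 772 | <0.001 |
| L Middle Temporal | −46 | −68 | 8 | 4.07 | 0.001 | 136 | |
| R POS | 4 | −66 | 14 | 3.70 | 0.002 | 105 | |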
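The two-stage thresholding described in the Table 2 caption (a voxel-level threshold, then a minimum cluster extent such as 40 voxels) can be sketched with a flood fill over supra-threshold voxels. In this sketch the t map is a hypothetical dictionary from voxel indices to t values, and the 18-voxel neighbourhood default is an assumption. A threshold of about t = 2.72 corresponds to one-tailed p < 0.01 with 11 degrees of freedom. The FWE-corrected cluster p-values reported in the table require a separate correction procedure that is not shown.

```python
from collections import deque
from itertools import product

def neighbor_offsets(connectivity=18):
    """Offsets for 6-, 18- or 26-connected neighbourhoods in a 3-D grid."""
    max_nonzero = {6: 1, 18: 2, 26: 3}[connectivity]
    return [d for d in product((-1, 0, 1), repeat=3)
            if 0 < sum(c != 0 for c in d) <= max_nonzero]

def threshold_clusters(tmap, t_threshold, min_voxels, connectivity=18):
    """Voxel-level threshold followed by a cluster-extent threshold.

    tmap: dict mapping (i, j, k) voxel indices to t values.
    Returns clusters (sets of voxels) with at least min_voxels voxels, largest first.
    """
    supra = {v for v, t in tmap.items() if t > t_threshold}
    offsets = neighbor_offsets(connectivity)
    clusters, seen = [], set()
    for start in supra:
        if start in seen:
            continue
        seen.add(start)
        cluster, queue = {start}, deque([start])
        while queue:                          # breadth-first flood fill
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                nb = (x + dx, y + dy, z + dz)
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    cluster.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_voxels:
            clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)

# Hypothetical t map: a 50-voxel patch plus one isolated supra-threshold voxel
tmap = {(x, y, 0): 4.0 for x in range(10) for y in range(5)}
tmap[(30, 30, 0)] = 6.0
print([len(c) for c in threshold_clusters(tmap, t_threshold=2.72, min_voxels=40)])  # [50]
```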