Abstract
We used the classification image method to investigate the spatiotemporal mechanisms for simple feature perception in normal and amblyopic vision. In the first experiment, we estimated the spatiotemporal mechanism for detecting a luminance increment on a bright bar embedded in spatiotemporal noise in normal and amblyopic observers. The normal template is characterized by a temporal summation zone that is flanked by symmetric spatial inhibition zones and followed by a temporal inhibition zone. The amblyopic template lacks inhibition but shows normal temporal summation. Neither blurring the stimuli in space nor varying the target signal-to-noise ratios caused any significant change in the normal or amblyopic template. However, decreasing the fundamental spatial frequency of the stimuli restored the normal template in the amblyopic eye. Furthermore, the normal periphery shows a spatiotemporal template similar to that of the amblyopic eye. The second experiment mapped the spatiotemporal interaction dynamics of flankers in an orientation discrimination task. In normal peripheral vision, assimilation of the perceived target orientation into the flanker orientation (i.e., crowding) occurs at flanker locations adjacent to the target in space and around or before the target presentation in time. Cueing the target may influence the spatiotemporal interaction map, but the strongest crowding never coincides with the target presentation. The amblyopic spatiotemporal interaction map is similar to that of the normal periphery, except that the interaction is generally more widely distributed around and after the target presentation. Distant flankers induced “anti-crowding” (repulsion) of perceived orientation in both normal and amblyopic vision.
Introduction
Amblyopia is a developmental visual disorder characterized primarily by reduced contrast sensitivity, along with other impaired visual functions (Ciuffreda, Levi, & Selenow, 1991; Levi, 1991; Levi, Klein, & Aitsebaomo, 1985). Crowding is closely related to many amblyopic visual deficits, such as contour interaction, spatial distortion, and reduced Vernier acuity (Bedell & Flom, 1981; Hess & Jacobs, 1979; Levi & Klein, 1985; Levi, Yu, Kuai, & Rislove, 2007), as well as to impaired everyday visual functions such as reading (Levi, Song, & Pelli, 2007). It has been proposed that strabismic central vision can be modeled as normal peripheral vision, with crowding being one of the most important shared characteristics (Levi, 1991; Levi & Klein, 1985; Levi et al., 1985).
The temporal properties of amblyopic vision have received much less attention than the spatial properties. Among the few studies of amblyopic temporal deficits, an abnormal spatiotemporal modulation function has been found, with contrast sensitivity reduced most markedly at low temporal frequencies and at medium to high spatial frequencies (Bradley & Freeman, 1985; Levi & Harwerth, 1977; Manny & Levi, 1982a, 1982b). Interestingly, this loss of sensitivity at low temporal frequencies was much milder or absent in apparent contrast tasks using suprathreshold gratings (Manny & Levi, 1982a, 1982b), suggesting that the mechanism for detecting flicker at near-threshold contrast may differ from that at suprathreshold contrast. Moreover, at low spatial frequencies, little difference in temporal sensitivity has been noted between amblyopic and normal eyes (Bradley & Freeman, 1985; Manny & Levi, 1982a, 1982b). These studies suggest that the amblyopic spatiotemporal mechanism for detecting a simple image feature is abnormal: the spatial mechanism may be coarser, with decreased resolution, and, given the extant data, we can at least expect the temporal mechanism to differ from the normal one at high spatial frequencies. On the other hand, the amblyopic spatiotemporal template for identifying a crowded feature should be similar to that of normal peripheral vision if both share the same mechanism for spatial interactions.
In the present study, we used the classification image method, which is essentially a correlation analysis between the observer's decisions and the random noise present in the stimuli. The method estimates the template (i.e., the strategy or mechanism) that an observer uses to perform a specific task (Ahumada, 1996; Ahumada & Lovell, 1971; Beard & Ahumada, 1998; Gold, Murray, Bennett, & Sekuler, 2000; Levi & Klein, 2002; Nandy & Tjan, 2007; Neri & Heeger, 2002; Neri, Parker, & Blakemore, 1999; Watson & Rosenholtz, 1997). The classification image technique is closely related to reverse correlation, a tool for neural receptive field analysis developed in the 1980s and 1990s (Jones & Palmer, 1987; Ohzawa, DeAngelis, & Freeman, 1997; Ringach, Hawken, & Shapley, 1997), and both have been mathematically justified (Ahumada, 2002; Bussgang, 1952; de Boer, 1967; de Boer & Kuyper, 1968; Neri & Levi, 2006; Ringach & Shapley, 2004). Classification image methods have also been applied in studies of amblyopic vision. Levi and colleagues found that in amblyopia the templates for target detection and position discrimination were shifted to lower spatial frequencies, particularly the template for position discrimination, which was substantially impaired at higher spatial frequencies. These abnormal templates, together with high internal noise, account for the loss of performance in amblyopic vision (Levi & Klein, 2002, 2003; Levi, Klein, & Chen, 2008). Interestingly, there is evidence that the abnormal template for position discrimination can be at least partly corrected by training through perceptual learning (Li, Klein, & Levi, 2008).
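To make the correlation idea concrete, the sketch below simulates a hypothetical linear-template observer and recovers its template by correlating each noise element with the binary response. The observer model (a weighted sum of the external noise plus Gaussian internal noise, thresholded at zero), the template values, the internal-noise level, and the function names are illustrative assumptions, not the model or code used in this study; only the 11 noise levels mirror those described in the Methods.

```python
import numpy as np

def simulate_observer(n_trials, template, noise_levels, internal_sd, rng):
    """Hypothetical linear observer: answers 1 ('bright') when the
    template-weighted sum of the external noise plus Gaussian internal
    noise exceeds zero, otherwise 0."""
    noise = rng.choice(noise_levels, size=(n_trials, template.size))
    decision = noise @ template + rng.normal(0.0, internal_sd, n_trials)
    return noise, (decision > 0).astype(int)

def correlation_map(noise, responses):
    """Pearson correlation between each noise element and the response."""
    z_noise = (noise - noise.mean(axis=0)) / noise.std(axis=0)
    z_resp = (responses - responses.mean()) / responses.std()
    return z_noise.T @ z_resp / len(responses)

# Illustrative center-surround template over 11 bar positions (assumed).
rng = np.random.default_rng(0)
template = np.array([0, 0, 0, 0, -0.5, 1.0, -0.5, 0, 0, 0, 0])
noise_levels = np.linspace(-6.9, 6.9, 11)   # cd/m^2, as in Condition 1
noise, responses = simulate_observer(20000, template, noise_levels, 5.0, rng)
cmap = correlation_map(noise, responses)
```

The recovered map is positive at the template's center, negative at its two flanks, and near zero elsewhere, which is the sense in which the classification image "estimates the template".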
In the present study, we applied spatiotemporal noise (either luminance or orientation noise) to the corresponding visual feature and estimated the two-dimensional spatiotemporal classification images for both normal and amblyopic observers.
Experiment 1: The spatiotemporal mechanism for luminance discrimination
In the first experiment, we used a variant of the method described by Neri and Heeger (2002) to estimate the spatiotemporal mechanism for luminance discrimination. Their study was the first to demonstrate such a spatiotemporal mechanism for feature detection directly, and the resulting classification image resembled, to some degree, the spatiotemporal receptive field of a V1 neuron (DeAngelis, Ohzawa, & Freeman, 1993; Neri & Levi, 2006). The spatial profile of the classification image has the “Mexican hat” shape typically seen in both the neural receptive field and the behavioral “perceptive field” (DeAngelis et al., 1993; Hubel & Wiesel, 1959). However, the temporal profile of the classification image was rather broad, and the temporal inhibition (or transience) often observed in detecting asynchronous light stimuli was absent (Kelly, 1971a, 1971b; Rashbass, 1970; Saarela & Herzog, 2008; Watson & Nachmias, 1977). A probable explanation for the purely sustained temporal profile is the high temporal uncertainty associated with the task. High uncertainty not only makes template estimation inefficient but can also severely compromise the result. It has been shown that, for classification images, a relatively strong signal may reduce perceptual uncertainty and reveal the task-relevant template (the “signal clamped” method; Tjan & Nandy, 2006). In the present study, we repeated Neri and Heeger's main experiments but reduced the temporal uncertainty by using a discrimination (instead of detection) task, in which the target signal was an increment on a relatively high pedestal that could be reliably localized in time and space.
Methods
Observers
Five observers with normal vision and six with amblyopia (two purely anisometropic and four with strabismus) participated in the study. All participants had taken part in many hours of visual psychophysical experiments and, except for the author, were naïve as to the purpose of the study. None of the amblyopic observers had eccentric fixation. One amblyope, JS, had received intensive perceptual-learning training with her amblyopic eye, and her visual acuity and other visual functions in that eye had improved substantially by the time she entered the current study. The detailed characteristics of each observer are listed in Table 1.
Table 1 Observers' characteristics.
| Observer | Age (years) | Strabismus (at 6 m) | Eye | Refractive error (diopters, D) | Line letter VA (single letter VA) |
|---|---|---|---|---|---|
| Normal | | | | | |
| SS | 30 | None | R | −0.25 | 20/12.5 |
| | | | L | pl/−0.25 × 119 | 20/12.5 |
| SK | 19 | None | R | −4.75/−1.00 × 165 | 20/16 |
| | | | L | −3.00 | 20/16 |
| CN | 23 | None | R | −0.50 | 20/12.5+1 |
| | | | L | −0.50 | 20/12.5+1 |
| EP | 25 | None | R | +0.75 | 20/16+1 |
| | | | L | +0.75 | 20/16+2 |
| MN | 36 | None | R | −1.25/−1.50 × 106 | 20/12.5 |
| | | | L | −1.75/−1.00 × 73 | 20/12.5 |
| Strabismic (including strabismic and anisometropic) amblyopic | | | | | |
| BK | 62 | L ExoT 8Δ | R (NAE) | −1.50/−2.5 × 105 | 20/12.5+1 |
| | | R hyperT 6∼8Δ | L (AE) | −3.00/−0.25 × 135 | 20/25+1 (20/20−2) |
| BN | 22 | L EsoT 3∼4Δ | R (NAE) | +5.50/−2.25 × 5 | 20/16+2 |
| | | | L (AE) | +5.50/−1.50 × 175 | 20/50−2 (20/25−2) |
| GW | 59 | R EsoT 4–6Δ | R (AE) | pl | 20/100−2 (20/32−1) |
| | | | L (NAE) | +0.50/−0.75 × 180 | 20/16−1 |
| GJ | 24 | R EsoT 4–5Δ | R (AE) | +3.50/−1.00 × 97 | 20/40−2 (20/25+1) |
| | | | L (NAE) | pl | 20/16−1 |
| Anisometropic amblyopic | | | | | |
| GD | 45 | None | R | +0.25/−0.50 × 90 | 20/12.5−2 |
| | | | L | +3.75/−1.00 × 30 | 20/50+2 (20/50+2) |
| JS | 27 | None | R (AE) | +1.00 | Pre-training: 20/25−2 (20/25+2) |
| | | | | | Post-training: 20/16+1 (20/16+1) |
| | | | L (NAE) | +0.25 | 20/12.5−1 |
Stimuli
Stimuli were generated in MATLAB 7.01 with Psychtoolbox 2.0 and presented on a flat CRT monitor (NEC; 39.6 × 29.7 cm, 1600 × 1400 pixels, 100 Hz). At 50% brightness and 100% contrast, the luminance ranged from 0.32 to 117.7 cd/m². The monitor was calibrated so that luminance was a linear function of pixel value. There were six conditions in this experiment.
Condition 1 (basic noise–signal). The background luminance was 58 cd/m². The basic stimulus element was a 1.1 × 0.8 deg square composed of 11 juxtaposed vertical bars. Each bar measured 0.1 × 0.8 deg and was randomly assigned one of 11 luminance noise increments (from −6.9 to 6.9 cd/m² in steps of 1.38 cd/m²). A complete stimulus consisted of a 0.42-s sequence of 21 such elements (i.e., frames), each lasting 20 ms. The middle (6th) bar in the middle (11th) frame of the sequence was the target, and the rest were noise (i.e., distracters). On each trial, the target luminance was randomly chosen from two values, 94.7 cd/m² (high) and 86.4 cd/m² (low), and was noise free. For amblyopic observers, these values were determined individually according to each observer's visual function (see Preliminary experiments). Two red bars, half the length and one-third the width of the target bar, abutted the middle bar of each frame from above and below; they served as a spatial cue for the target and were present throughout a trial. These noise–signal sessions served as the basic condition for both normal and amblyopic observers (Figure 1).
Condition 2 (basic noise–noise). The luminance of the target bar was replaced by a fixed luminance pedestal (86.4 cd/m²) plus a noise increment randomly selected from the same 11 values as for the distracters. In this case, the signal was “imaginary,” but the target could still be localized in space and time because of its relatively high base luminance. The rest of each noise–noise session was the same as in the noise–signal sessions described above. This condition was also part of the basic setup for the other conditions.
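A minimal sketch of how a single trial of Conditions 1 and 2 could be represented as a 21 (time) × 11 (space) luminance matrix, using the values given above. It assumes that each distracter bar's luminance is the background plus its noise increment and that, in Condition 1, the noise matrix stores 0 at the noise-free target element. The function name make_trial is hypothetical; the original stimuli were produced in MATLAB with Psychtoolbox, not with this code.

```python
import numpy as np

BACKGROUND = 58.0                             # cd/m^2
NOISE_STEP = 1.38                             # cd/m^2
NOISE_LEVELS = NOISE_STEP * np.arange(-5, 6)  # 11 increments, -6.9 ... +6.9
N_FRAMES, N_BARS = 21, 11                     # 21 frames x 20 ms = 420 ms
TARGET_FRAME, TARGET_BAR = 10, 5              # 0-based: 11th frame, 6th bar
TARGET_HIGH, TARGET_LOW = 94.7, 86.4          # cd/m^2 (normal observers)
PEDESTAL = 86.4                               # cd/m^2 (Condition 2)

def make_trial(rng, condition="noise-signal"):
    """Return (luminance, noise, label) for one hypothetical trial.
    label is 1 (high) or 0 (low) in Condition 1 and None in Condition 2."""
    noise = rng.choice(NOISE_LEVELS, size=(N_FRAMES, N_BARS))
    lum = BACKGROUND + noise                  # assumed: bar = background + noise
    if condition == "noise-signal":
        label = int(rng.integers(2))
        lum[TARGET_FRAME, TARGET_BAR] = TARGET_HIGH if label else TARGET_LOW
        noise[TARGET_FRAME, TARGET_BAR] = 0.0  # the target is noise free
    elif condition == "noise-noise":
        label = None                           # "imaginary" signal
        lum[TARGET_FRAME, TARGET_BAR] = PEDESTAL + noise[TARGET_FRAME, TARGET_BAR]
    else:
        raise ValueError(f"unknown condition: {condition}")
    return lum, noise, label
```

The returned noise matrix is what enters the classification image analysis; the luminance matrix is what would be drawn, frame by frame, on the display.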
Condition 3. For two normal observers, SS and SK, blurred stimuli were generated by spatially convolving each original frame with a two-dimensional disk filter whose radius equaled half the bar width. The luminance levels of the different stimulus components were adjusted to maintain performance.
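The blurring step can be sketched as follows. The pixel scale is an assumption: with the stated 1600-pixel, 39.6-cm screen width viewed at 1 m, 0.1 deg is about 7 pixels, so the disk radius is about 3.5 pixels. The binary disk kernel is a simplified stand-in for MATLAB's fspecial('disk'), which additionally anti-aliases the disk edge, and padding with the background luminance is also an assumption. All function names are hypothetical.

```python
import numpy as np

def disk_kernel(radius):
    """Binary disk of the given radius (pixels), normalized to unit sum."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = (x**2 + y**2 <= radius**2).astype(float)
    return k / k.sum()

def convolve_same(image, kernel, pad_value):
    """2-D convolution with output the size of `image`; pixels outside the
    image are treated as `pad_value` (e.g. the background luminance)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=pad_value)
    flipped = kernel[::-1, ::-1]              # convolution flips the kernel
    out = np.zeros(image.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += flipped[dy, dx] * padded[dy:dy + image.shape[0],
                                            dx:dx + image.shape[1]]
    return out

def frame_to_image(bar_luminances, bar_px, height_px):
    """Expand one frame (11 bar luminances) into a pixel image of vertical bars."""
    row = np.asarray(bar_luminances, dtype=float)[None, :]
    return np.repeat(row, height_px, axis=0).repeat(bar_px, axis=1)

# Example: a bright target bar among background bars, blurred with r = 3.5 px.
bars = np.full(11, 58.0)
bars[5] = 94.7
blurred = convolve_same(frame_to_image(bars, 7, 20), disk_kernel(3.5), 58.0)
```

Because the kernel sums to one, a uniform region is left unchanged; only luminance edges between bars are smoothed, which produces the positive correlations between adjacent pixels mentioned in the Results.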
Condition 4. For one normal observer, SK, the target signal-to-noise ratios of the stimuli were decreased by increasing the noise luminance of the distracters: the step size of the distracters' luminance increments was raised from 1.38 to 2.3 cd/m². Here, the target signal-to-noise ratios comprise both the ratio of the target luminance pedestal to the noise luminance step size and the ratio of the target luminance increment to the noise luminance step size.
Condition 5. The target signal-to-noise ratios of the stimuli were increased by increasing both the target luminance pedestal and the target luminance increment, by reducing the noise (i.e., decreasing the absolute values of the noise luminance increments), or both. For stimuli presented to the normal periphery, the target luminance values were 108.6 cd/m² (high) and 78.3 cd/m² (low), and the range of noise luminance increments was reduced from [−6.9, 6.9] cd/m² to [−4.6, 4.6] cd/m², with a correspondingly smaller step size of 0.92 cd/m². The background luminance was lowered to 46.0 cd/m². For three amblyopic observers, BK, GW, and GD, the amount of modification varied slightly depending on individual performance.
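The arithmetic behind Conditions 1, 4, and 5 can be checked with a few lines. The code generates the 11 symmetric noise increments for each step size and computes the two signal-to-noise ratios defined in Condition 4. Two assumptions are made that the text does not state explicitly: the pedestal is taken as the low target luminance minus the background luminance, and Condition 4 is assumed to keep the Condition 1 background and target luminances. Function names are hypothetical.

```python
import numpy as np

def noise_levels(step, n_levels=11):
    """Symmetric, evenly spaced noise increments centred on zero (cd/m^2)."""
    half = (n_levels - 1) // 2
    return step * np.arange(-half, half + 1)

def snr(background, target_low, target_high, step):
    """Return (pedestal/step, increment/step).
    ASSUMPTION: pedestal = low target luminance - background luminance."""
    pedestal = target_low - background
    increment = target_high - target_low
    return pedestal / step, increment / step

# (background, low target, high target, noise step), all in cd/m^2
conditions = {
    "Condition 1 (fovea)": (58.0, 86.4, 94.7, 1.38),
    "Condition 4 (SK, assumed same targets)": (58.0, 86.4, 94.7, 2.30),
    "Condition 5 (periphery)": (46.0, 78.3, 108.6, 0.92),
}
for name, (bg, lo, hi, step) in conditions.items():
    levels = noise_levels(step)
    ped, inc = snr(bg, lo, hi, step)
    print(f"{name}: noise {levels[0]:+.2f} to {levels[-1]:+.2f}, "
          f"pedestal/step {ped:.1f}, increment/step {inc:.1f}")
```

Under these assumptions, raising the step from 1.38 to 2.3 cd/m² widens the noise range to about ±11.5 cd/m² and lowers both ratios by the same factor of 2.3/1.38.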
Condition 6. In addition to the increased target signal-to-noise ratios (for amblyopic eyes in the fovea or for normal eyes in the periphery), the width of both the bright target bar and the noise bars was increased from 0.1 deg to 0.3 deg, thereby decreasing the fundamental spatial frequency of the stimulus from 5 cpd to 1.67 cpd.
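The quoted frequencies follow if one light–dark bar pair is treated as one cycle, so that the period equals twice the bar width w. This is a reading of the numbers given above rather than a formula stated in the text:

$$
f_0 = \frac{1}{2w}, \qquad
w = 0.1^\circ \;\Rightarrow\; f_0 = \frac{1}{0.2^\circ} = 5\ \text{cpd}, \qquad
w = 0.3^\circ \;\Rightarrow\; f_0 = \frac{1}{0.6^\circ} \approx 1.67\ \text{cpd}.
$$

Tripling the bar width therefore divides the fundamental frequency by three.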
Procedures
All experiments were conducted monocularly in a room with dim overhead lighting (10 cd/m²). The viewing distance was 1 m. The stimuli were presented to the fovea of all normal and amblyopic observers. For some normal observers, the task was also performed at 2.5 deg in the inferior visual field, in which case a fixation cross (20 × 20 arcmin) was presented to the fovea.
Preliminary experiments. The luminance increment (i.e., the noise step size) was set at the threshold for detecting a single bar on a blank screen. Specifically, each observer completed a 40-trial session. In each single-interval trial, either a target bar or a blank screen was presented for 20 ms, and the observer reported whether a bar had been presented. The luminance of the target bar was controlled by Quest with the criterion set at 75%.
The lower target luminance in Condition 1 was set to a level that normal observers could easily see when the rest of the stimulus square was filled with the brightest noise luminance. In a preliminary session of 40 trials, a single stimulus frame uniformly filled with the brightest noise luminance (i.e., the sum of the background luminance and the largest of the 11 noise increments) was presented in both intervals of each two-interval forced-choice (2-IFC) trial. In one of the two intervals, a brighter target bar was presented in the middle of the square; its luminance was controlled by Quest with a 95% criterion to ensure that it was easily detected. The observer indicated which interval contained the target bar. The mean threshold luminance across normal observers, or the threshold luminance of each individual amblyopic observer, was taken as the low target luminance for the basic experimental conditions. The higher target luminance was set at approximately the 75% threshold for discriminating a bright bar from a bar with the fixed lower luminance described above. Each observer completed a similar 2-IFC session of 40 trials, in which a bar at the target location with the fixed lower luminance was presented in one interval and a bar with a higher luminance in the other. The observer indicated which interval contained the brighter bar. The higher luminance was varied from trial to trial by Quest with a 75% criterion. The mean threshold across normal observers, or the threshold of each individual amblyopic observer, was taken as the high target luminance for the basic experimental conditions. Note that the luminance levels determined in this way may not produce the same performance in the presence of spatiotemporal noise as in the experimental sessions.
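Quest is a Bayesian adaptive threshold procedure, and the authors presumably used the Quest routines that ship with Psychtoolbox. The Python class below is only a simplified, grid-based sketch of the same idea, not that implementation: intensities are in log10 units; the Weibull slope (3.5), lapse rate (0.01), guess rate (0.5, appropriate for 2-IFC), and Gaussian prior are assumed values; and trials are placed at the posterior mean, one common variant. The offset eps makes the psychometric function pass through the chosen criterion (75% or 95%) at the threshold parameter. For the single-interval yes/no session, the guess rate would instead be the observer's false-alarm rate. The class name GridQuest is hypothetical.

```python
import numpy as np

class GridQuest:
    """Simplified grid-based Bayesian threshold estimator (QUEST-like).
    All parameter values are illustrative assumptions."""

    def __init__(self, guess, prior_sd, criterion, gamma=0.5, beta=3.5, delta=0.01):
        self.grid = guess + np.linspace(-3 * prior_sd, 3 * prior_sd, 601)
        self.log_post = -0.5 * ((self.grid - guess) / prior_sd) ** 2  # Gaussian prior
        self.gamma, self.beta, self.delta = gamma, beta, delta
        # Shift the Weibull so that p_correct(t, t) == criterion.
        frac = (criterion - gamma) / (1.0 - gamma - delta)
        self.eps = np.log10(-np.log(1.0 - frac)) / beta

    def p_correct(self, x, t):
        """Weibull psychometric function (log units) with guess and lapse rates."""
        return self.gamma + (1 - self.gamma - self.delta) * (
            1 - np.exp(-10 ** (self.beta * (x - t + self.eps))))

    def update(self, x, correct):
        """Multiply the posterior by the likelihood of the observed response."""
        p = self.p_correct(x, self.grid)
        self.log_post += np.log(p if correct else 1.0 - p)

    def posterior_mean(self):
        w = np.exp(self.log_post - self.log_post.max())
        return float(np.sum(w * self.grid) / np.sum(w))

# Simulated 40-trial session with a hypothetical observer (true threshold 2.0).
rng = np.random.default_rng(0)
quest = GridQuest(guess=1.9, prior_sd=0.3, criterion=0.75)
for _ in range(40):
    x = quest.posterior_mean()                       # test at current estimate
    correct = rng.random() < quest.p_correct(x, 2.0)
    quest.update(x, correct)
print(f"estimated threshold: {10 ** quest.posterior_mean():.1f} (linear units)")
```

A correct response shifts the posterior toward lower thresholds and an error toward higher ones, so the tested intensity converges on the level that yields the criterion percentage correct.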
Tasks. At the beginning of each session, target bars with high and low luminances were presented side by side and the observer was instructed to memorize the luminance difference. Then, there was a 15-trial training session with feedback. In each training trial, either a bright or a less bright target bar, but no noise, was presented, and the observer was asked to respond whether it was a bright or less bright bar by pressing the appropriate key. Audio feedback was given.
The short training session was followed immediately by an experimental session. In each trial, the observer reported whether the target bar was bright or less bright by pressing a key (yes/no). No feedback was given. We used this discrimination task instead of a detection task to avoid high temporal and spatial uncertainty, especially the former, which is often associated with such experiments and can severely contaminate the results. Each session consisted of 200 trials of only one of the six conditions described above. At least 2000 trials were collected for each condition for each eye.
Data analysis. We used reverse correlation to compute the classification images. The stimulus sequence of each trial is characterized by a 21 (time) × 11 (space) noise matrix and a binary target luminance label (1 for high, 0 for low). The noise at a given time and location is uniformly distributed over the 11 luminance noise levels. The response on each trial is either 1 (bright bar) or 0 (less bright bar). The noise matrices are grouped by stimulus and response into four categories and averaged within each category (Table 2).
Table 2 Categories of stimuli and responses in computing classification images.

| Response | Stimulus = 1 | Stimulus = 0 |
|---|---|---|
| 1 | Hit (S1R1): μ[1,1], σ[1,1] | False alarm (S0R1): μ[0,1], σ[0,1] |
| 0 | Miss (S1R0): μ[1,0], σ[1,0] | Correct rejection (S0R0): μ[0,0], σ[0,0] |
The classification image is computed by

$$\mu = \left(\mu[1,1] - \mu[1,0]\right) + \left(\mu[0,1] - \mu[0,0]\right),$$

where μ[·,·] is the average 21 × 11 noise matrix for a given stimulus and response category (first index: stimulus; second index: response; see Table 2). Each element of the classification image μ roughly represents the correlation between the noise at that spatiotemporal location and the perception that the target is bright. (It differs slightly from the standard definition of correlation in that the four categories are weighted equally regardless of the number of samples in each.) Positive and negative correlations imply facilitation and inhibition, respectively.

The statistical significance of each correlation element is represented by the z-score

$$z_{t,x} = \frac{\mu_{t,x}}{\sigma_{t,x}},$$

where $\mu_{t,x}$ is the estimated correlation at a given spatiotemporal location (time t, position x), i.e., an element of the matrix μ, and $\sigma_{t,x}$ is its standard deviation. The standard deviation is estimated by bootstrapping: from each of the four groups, the noise matrices are resampled with replacement and averaged, the four averages are combined to obtain one bootstrap sample of the classification image, and the standard deviations are computed from 3,000 such samples.
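The computation of μ and of the bootstrap z-scores can be sketched in NumPy. The sketch assumes that noise is an array of shape (n_trials, 21, 11) and that stim and resp are 0/1 arrays; the function names are hypothetical, all four stimulus–response categories must be non-empty, and n_boot defaults to the 3,000 resamples described above.

```python
import numpy as np

def classification_image(noise, stim, resp):
    """mu = (mu[1,1] - mu[1,0]) + (mu[0,1] - mu[0,0]), where mu[s, r] is the
    mean noise matrix over trials with stimulus s and response r."""
    mu = {(s, r): noise[(stim == s) & (resp == r)].mean(axis=0)
          for s in (0, 1) for r in (0, 1)}
    return (mu[1, 1] - mu[1, 0]) + (mu[0, 1] - mu[0, 0])

def bootstrap_z(noise, stim, resp, n_boot=3000, rng=None):
    """Return (mu, z), with z = mu / bootstrap SD, element by element.
    Each of the four groups is resampled with replacement independently."""
    rng = np.random.default_rng() if rng is None else rng
    groups = {(s, r): noise[(stim == s) & (resp == r)]
              for s in (0, 1) for r in (0, 1)}
    samples = np.empty((n_boot,) + noise.shape[1:])
    for b in range(n_boot):
        m = {}
        for key, g in groups.items():
            idx = rng.integers(0, len(g), size=len(g))
            m[key] = g[idx].mean(axis=0)
        samples[b] = (m[1, 1] - m[1, 0]) + (m[0, 1] - m[0, 0])
    mu = classification_image(noise, stim, resp)
    return mu, mu / samples.std(axis=0, ddof=1)
```

For example, `mu, z = bootstrap_z(noise, stim, resp, rng=np.random.default_rng(0))` returns the classification image and its z-score map; because the four group means enter with equal weight, unequal group sizes affect only the bootstrap variance, not the weighting.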
The z-scores exceeding the Bonferroni-corrected threshold of statistical significance (p < 0.05; i.e., |z| > 3.7) are shown in red (positive) and blue (negative). This is a conservative criterion (Chauvin, Worsley, Schyns, Arguin, & Gosselin, 2005). Controlling the false discovery rate (FDR; Benjamini & Hochberg, 1995), i.e., the expected proportion of rejected null hypotheses that are in fact true, instead of the traditional familywise error rate gave identical results.
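The |z| > 3.7 criterion can be reproduced from the 21 × 11 = 231 tests: a two-tailed Bonferroni-corrected level of 0.05/231 corresponds to |z| ≈ 3.70. The sketch below also includes a plain implementation of the Benjamini–Hochberg step-up procedure. Converting z-scores to two-tailed p-values under a standard normal null is an assumption about how the FDR analysis was set up, and the function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

N_TESTS = 21 * 11   # one test per element of the classification image
ALPHA = 0.05

# Two-tailed Bonferroni criterion: reject when |z| exceeds this value.
z_bonferroni = NormalDist().inv_cdf(1 - ALPHA / N_TESTS / 2)

def z_to_p(z):
    """Two-tailed p-values, assuming z is standard normal under the null."""
    nd = NormalDist()
    return np.vectorize(lambda v: 2.0 * (1.0 - nd.cdf(abs(v))))(z)

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of null hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    bounds = q * np.arange(1, m + 1) / m      # q * i / m for ranks i = 1..m
    passed = flat[order] <= bounds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()       # largest rank meeting its bound
        reject[order[:k + 1]] = True          # reject all smaller p-values too
    return reject.reshape(p.shape)

print(f"Bonferroni |z| threshold for {N_TESTS} tests: {z_bonferroni:.2f}")
```

Applied to a z-score map, `benjamini_hochberg(z_to_p(z))` returns a mask of the same 21 × 11 shape that can be compared directly with the Bonferroni mask `np.abs(z) > z_bonferroni`.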
Results
For the basic condition (Condition 1), performance of the normal and non-amblyopic eyes was above chance, whereas for about half of the amblyopic eyes it was near chance (50%). Increasing the target signal-to-noise ratios improved performance, on average, both in the normal periphery and in the amblyopic fovea, as did decreasing the fundamental spatial frequency of the stimuli. In all conditions except the first (basic) one, the performance levels of all observers were comparable.
All normal observers showed similar templates for detecting a luminance increment on a bright bar embedded in spatiotemporal noise, which is revealed more clearly by the average of all normal classification images (Figure 2, Table 4). Individual data are shown in the Supplementary material (Supplementary Figure S1). First, a temporal integration zone is present at the target location from nearly 40 ms before target onset until 30–40 ms after target offset. During the same period of temporal summation, two spatial inhibition zones are distributed symmetrically 0.1–0.15 deg from the target. At the target location, the temporal summation is followed almost immediately by a temporal inhibition zone that peaks about 40 ms and extends to about 100 ms after the target presentation. The spatial correlation profile at the time of target presentation (Figure 2, lower right graph) resembles the “Mexican hat” shape typical of neural receptive fields at various stages of visual processing (e.g., retinal ganglion cells, LGN, and V1 neurons). The temporal correlation profile at the target location (Figure 2, upper right graph) is consistent with the paired-pulse or flicker detection time course reported by others (Kelly, 1971a, 1971b; Rashbass, 1970; Watson & Nachmias, 1977). These results suggest that the summation-only temporal profile in the previous study (Neri & Heeger, 2002) may have been affected by high temporal uncertainty.
Figure 2 also shows that the facilitative (red) and suppressive (blue) locations in the classification image, including those with smaller absolute correlation values, alternate in a pattern that is unlikely to be random. This suggests that part of the spatiotemporal template may be the product of independent temporal and spatial functions, although this is clearly not true of the predominant mechanism or template.
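One standard way to quantify how much of a classification image is separable in space and time, offered here as an illustration rather than an analysis reported in this study, is a singular value decomposition: a perfectly separable image (an outer product of a temporal and a spatial profile) has all of its energy in the first singular value. The sketch returns that fraction and the best rank-1 approximation; the function name is hypothetical.

```python
import numpy as np

def separability_index(ci):
    """Fraction of the image's energy captured by its best rank-1
    (space-time separable) approximation, s1**2 / sum(s**2), and that
    approximation itself."""
    u, s, vt = np.linalg.svd(ci, full_matrices=False)
    rank1 = s[0] * np.outer(u[:, 0], vt[0])
    return s[0] ** 2 / np.sum(s ** 2), rank1

# A separable toy image: a temporal profile times a center-surround profile.
t_profile = np.exp(-((np.arange(21) - 10) / 3.0) ** 2)
x_profile = np.array([0, 0, 0, -0.2, -0.5, 1.0, -0.5, -0.2, 0, 0, 0])
index, approx = separability_index(np.outer(t_profile, x_profile))
```

An index near 1 would indicate that a single temporal function times a single spatial function describes the template; the residual `ci - approx` would then hold the non-separable part.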
On average, the classification images of the non-amblyopic eyes are normal or nearly so; however, the classification images of the amblyopic eyes show a striking absence of inhibition (Table 4). The averaged classification images are shown in Figure 3 (individual classification images are shown in Supplementary Figure S2). Both the temporal inhibition that normally peaks about 40 ms after target onset and the spatial inhibition adjacent to the target location are absent for all five amblyopic eyes. On the other hand, despite the reduced inhibition, temporal summation in the amblyopic eyes does not appear to differ much from normal.
In an attempt to understand what causes the anomalous spatiotemporal template of the amblyopic eye, we presented modified stimuli to the normal fovea in the hope of reproducing the amblyopic template. First, the stimuli were blurred in space by a disk filter (see Methods section). The blurred stimuli had little effect on task performance (Table 3), nor did they produce a template that mimicked that of the amblyopic eye (Figure 4, Table 4; individual classification images are shown in Supplementary Figure S3). All components, especially the inhibition zones, are present in the classification image, which distinguishes it from that of the amblyopic eye. Compared with those in the control condition (Figure 4, black dashed curves), classification images for blurred stimuli are scaled, with larger spatial summation zones (Figure 4, red solid curves). This is presumably due to the positive correlations between spatially adjacent pixels introduced by the filtering. Clearly, image blur alone did not cause the anomaly in the amblyopic spatiotemporal template.
Table 3 Discrimination task performance (percentage of correct responses) in all conditions except Condition 2, for the tested eyes of normal observers and for both the amblyopic (AE) and non-amblyopic (NAE) eyes of amblyopic observers. Condition 2 is excluded because the target position contains noise only. Asterisks denote testing at 2.5 deg eccentricity in the parafovea.
| Observer | Eye | Condition 1 | Condition 3 | Condition 4 | Condition 5 | Condition 6 |
|---|---|---|---|---|---|---|
| SS | | 68.3% | 70.4% | – | 73.7%* | 75.2%* |
| SK | | 66.8% | 69.4% | 58.6% | 68.1%* | 73.5%* |
| MN | | 73.6% | – | – | 66.9%* | – |
| EP | | 69.2% | – | – | – | – |
| CN | | 70.2% | – | – | – | – |
| BK | NAE | 63.0% | – | – | – | – |
| | AE | 63.1% | – | – | 68.9% | 71.1% |
| GW | NAE | 64.8% | – | – | – | – |
| | AE | 55.7% | – | – | 67.1% | 67.1% |
| BN | NAE | 63.5% | – | – | – | – |
| | AE | 61.3% | – | – | – | – |
| JS | NAE | 67.4% | – | – | – | – |
| | AE | 64.9% | – | – | – | – |
| GJ | NAE | 63.0% | – | – | – | – |
| | AE | 53.2% | – | – | – | – |
| GD | NAE | 65.8% | – | – | – | – |
| | AE | 58.3% | – | – | 77.1% | 75.5% |
Table 4 P-values for comparisons between experimental conditions and subjects. Statistically significant p-values are indicated by the bold and italic font. The regions of interest defined for each classification image are (1) space: the areas adjacent to the target in space on both sides (−60 to 60 ms, −0.5 to 0.5 deg, excluding the target location, i.e., the origin); and (2) time: the frames following the target offset (0–200 ms, 0 deg, excluding the origin). The minimum correlation value in each region of interest represents spatial or temporal inhibition and is used to compute the F-statistics for comparisons between test conditions or subjects. This approach removes the influence of individual differences in the shape of the template. (a) Control conditions. (b) Stimulus manipulation in the normal eye. (c) Normal periphery. (d) Stimulus manipulation in the amblyopic eye.
| Comparison | Spatial inhibition (p-value) | Temporal inhibition (p-value) |
|---|---|---|
| (a) | | |
| Among normal eyes (foveal) | 0.2516 | 0.5654 |
| Among NAEs | 0.0033 | 0.2475 |
| Among AEs | 0.4054 | 0.3615 |
| AE vs. NAE | 0.0062 | 0.0010 |
| (b) | | |
| Normal blurred vs. normal control | 0.0751 | 0.3615 |
| Normal high noise vs. normal control | 0.0665 | 0.0016 |
| (c) | | |
| Normal 2.5 deg vs. AE | 0.3269 | 0.1257 |
| Normal 2.5 deg low SF vs. normal 2.5 deg | 0.0725 | 0.9629 |
| (d) | | |
| AE high S/N ratio vs. AE control | 0.8851 | 0.3851 |
| AE low SF vs. AE control | <0.0001 | 0.0023 |
Second, the target signal-to-noise ratios of the stimuli were decreased for normal observer SK. The rationale was that the amblyopic eye might not have detected the target at all, in which case the resulting template would not be relevant to the task; after all, amblyopic performance (percentage correct in Table 3) was near chance (50%) for about half of the amblyopic eyes. Reducing the target signal-to-noise ratios did make the task much more difficult, and SK's percentage correct decreased to 58.6%. However, SK's classification image does not resemble that of the amblyopic eye (Figure 5). Further evidence that target signal strength is not a significant factor in determining the template comes from the absence of change in the amblyopic templates when the target signal-to-noise ratios were increased (Figure 6, Table 4). Enhancing the visibility of the target did almost nothing to the amblyopic templates (Supplementary Figure S4). On the other hand, the substantially improved performance of these amblyopic observers (Table 3) shows that they did detect the target. Hence, low target visibility is not likely to be responsible for the abnormal amblyopic template either.
Alternatively, the abnormal amblyopic template may arise because the spatiotemporal mechanisms of the normal and amblyopic eyes are shifted relative to each other along the spatial frequency scale (Levi, Waugh, & Beard, 1994). More specifically, the template may vary with the spatial frequency of the stimuli, and this relationship may differ between normal and amblyopic eyes. Thus, whereas 5 cycles/deg (the fundamental spatial frequency of the stimuli) probably elicits a template corresponding to mid-to-high spatial frequencies in the normal eye, it may elicit a template at the high end of the spectrum in the amblyopic eye. Presumably, the template used by the normal eye is more efficient than that used by the amblyopic eye.
To test the hypothesis that the amblyopic template is shifted in spatial scale, we decreased the fundamental spatial frequency of the stimuli from 5 cpd to 1.67 cpd by widening each bar from 0.1 deg to 0.3 deg and presented the modified stimuli to two amblyopic eyes. With these stimuli, normal classification images finally emerged in both amblyopic eyes (Figure 7, Table 4; individual classification images are shown in Supplementary Figure S5). The substantial change in the amblyopic template cannot be attributed to task performance, which, on average, did not differ significantly from that in the previous condition (i.e., increased target signal-to-noise ratios). Interestingly, the spatial correlation profiles in Figure 7 show that the spatial extent and shape of the templates depend on the relative distance between stimulus elements (i.e., distance normalized by bar width) rather than on the absolute distance in degrees. This result is not inconsistent with the amblyopic spatial classification image for detection reported by Levi et al. (2008), which showed a moderate shift of the amblyopic template toward lower spatial frequencies (i.e., a wider and flatter spatial template). The target in that study was a fixed discrete frequency pattern (DFP) with a negative–positive–negative luminance profile around the target, and that study did not examine the effect of signal spatial frequency by varying the stimulus spatial frequency, as the present study does. Our result suggests that the spatiotemporal mechanism of feature detection may depend on spatial scale.
As noted above, we did not succeed in producing the abnormal amblyopic template in normal foveal vision by either spatially blurring the stimuli or decreasing their target signal-to-noise ratios. Because of the close connection between amblyopic foveal vision and normal peripheral vision, we also estimated classification images in the periphery (2.5 deg eccentricity) of two normal observers, using stimuli with increased target signal-to-noise ratios so that performance remained above chance (Table 3). As Figure 8 and Table 4 show, the normal peripheral classification image, like that of the amblyopic eye, lacks inhibition (individual classification images are shown in Supplementary Figure S6).
The next question is whether scaled stimuli can generate a normal foveal template in the periphery. As before, the bar width was increased from 0.1 deg to 0.3 deg, decreasing the fundamental spatial frequency of the stimuli from 5 cpd to 1.67 cpd, and the stimuli were presented in the periphery of two normal observers. The resulting classification images show partial restoration of spatial inhibition, but the difference is not statistically significant (Figure 9, Table 4; individual classification images are shown in Supplementary Figure S7). Again, the spatial profile of the template depends on relative rather than absolute distance. Further investigation is required to confirm whether normal peripheral vision and amblyopic foveal vision share common spatial mechanisms for simple feature perception.
Table 4 lists p-values of hypothesis tests comparing experimental conditions and subjects. Because the shape of the template can differ from person to person, any test that relies on absolute locations within the classification image would be inappropriate. Since we are mostly interested in inhibition, we tested spatial and temporal inhibition separately. The minimum correlation within a region of interest was taken to represent spatial or temporal inhibition, and conditions or subjects were compared using these representative values (i.e., minimum correlations). The results show no statistically significant difference in either spatial or temporal inhibition among normal eyes (foveal viewing) or among amblyopic eyes, some significant variance in spatial inhibition among non-amblyopic eyes, and significant differences in both spatial and temporal inhibition between amblyopic and non-amblyopic eyes. These plausible results indicate that our hypothesis-testing method is suited to comparing classification images. In the amblyopic eye, increasing the S/N ratio of the stimuli did not improve spatiotemporal inhibition, whereas lowering the spatial frequency of the stimuli significantly restored both spatial and temporal inhibition. In the normal eye, blurring the stimuli had no effect on inhibition, whereas increasing the noise luminance (i.e., reducing the S/N ratio) significantly impaired temporal but not spatial inhibition. Spatiotemporal inhibition in the normal periphery is similar to that in the amblyopic fovea. However, unlike in the amblyopic fovea, lowering the spatial frequency of the stimuli failed to restore normal foveal spatiotemporal inhibition in the normal periphery.
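A minimal sketch of how the representative values might be extracted is shown below. The section does not specify the regions of interest or the statistical test applied to the minima, so the ROI boundaries and the small classification image here are hypothetical, chosen only to illustrate summarizing inhibition by a minimum correlation rather than by absolute template locations.

```python
def min_in_roi(image, rows, cols):
    """Return the minimum correlation inside a rectangular region of interest.

    image: 2-D list indexed as image[time][space] (classification image of
           correlation values).
    rows, cols: iterables of time-frame and spatial-position indices.
    """
    return min(image[t][x] for t in rows for x in cols)


# Hypothetical 5 (time) x 5 (space) classification image with the target at
# time index 2, space index 2. Values are made up for illustration only.
ci = [
    [0.00, 0.01, 0.02, 0.01, 0.00],
    [0.01, -0.03, 0.10, -0.04, 0.00],
    [-0.02, -0.06, 0.35, -0.05, -0.01],
    [0.00, -0.02, -0.12, -0.03, 0.01],
    [0.01, 0.00, -0.04, 0.00, 0.00],
]

# Spatial inhibition: flanking positions within the target frame (hypothetical ROI).
spatial_inhibition = min_in_roi(ci, range(2, 3), [0, 1, 3, 4])
# Temporal inhibition: the target position in frames after the target (hypothetical ROI).
temporal_inhibition = min_in_roi(ci, range(3, 5), [2])
print(spatial_inhibition, temporal_inhibition)
```

Because only the minimum within each region is retained, two observers whose inhibitory lobes sit at slightly different positions can still be compared on the same footing.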
Experiment 2: Spatiotemporal interactions for orientation identification
Spatial interactions among cluttered visual inputs affect sensitivity and perceptual bias (e.g., apparent tilt) simultaneously. Crowding is usually described as decreased sensitivity to a target among nearby similar distracters (Levi, 2008; Pelli, Palomares, & Majaj, 2004). What this description omits is that crowded perception is not necessarily featureless; rather, crowding is often associated with a perceptual bias determined by the cluttered features. Parkes, Lund, Angelucci, Solomon, and Morgan (2001) reported that statistical attributes of a crowded feature (e.g., orientation) are spatially pooled rather than lost, and that the accessible average feature covertly serves as a perceptual pedestal (i.e., bias), which can either elevate or lower the feature detection threshold under certain conditions. Later studies, however, argued that crowding cannot be reduced to mere feature averaging (Balas, Nakano, & Rosenholtz, 2009; Petrov & Popple, 2007).
In peripheral vision, flankers can produce very striking perceptual biases or illusions. For example, the perceived tilt of a line may undergo either assimilation, in which it appears attracted toward slightly differently tilted objects, or repulsion, in which the difference between target and flankers is exaggerated when their orientations differ greatly (Solomon, Felisberti, & Morgan, 2004). These studies show that understanding spatial interactions requires considering not only sensitivity loss but also possible perceptual changes. While the spatial properties of crowding have been widely studied, the temporal properties of spatial interactions have rarely been investigated.
Westheimer and Hauske (1975) reported that flankers with positive stimulus onset asynchronies (SOAs; i.e., flanker onset after target onset) interfere more strongly with foveal Vernier acuity than simultaneously presented flankers do. Similar results have been found by others (Huckauf & Heller, 2004). Abnormal spatial interactions (crowding) are a well-known feature of strabismic amblyopia. In the present study, we used the method of classification images to systematically investigate the spatiotemporal interactions between flankers and the target, specifically in feature (i.e., orientation) perception.
Methods
Observers
Two observers with normal vision and two with strabismic amblyopia participated. All had substantial experience with visual psychophysical experiments, and all except the author were naïve as to the purpose of the study. None of the amblyopic observers had eccentric fixation. The characteristics of each observer are listed in Table 5.
Table 5 Observers' characteristics.
Observer | Age (years) | Strabismus (at 6 m) | Eye | Refractive error (diopters, D) | Line letter VA (single letter VA) |
Normal |
SS | 29 | None | R | −0.25 | 20/12.5 |
| | L | pl/−0.25 × 119 | 20/12.5 |
LL | 26 | None | R | −1.50 | 20/12.5 |
| | L | −0.50 | 20/12.5 |
Strabismic amblyopic |
GJ | 24 | R EsoT 4–5Δ | R (AE) | +3.50/−1.00 × 97 | 20/40−2 (20/25+1) |
| | L (NAE) | pl | 20/16−1 |
AP | 22 | L EsoT 4–5Δ and HyperT 2–3Δ | R (NAE) | −1.25/−0.50 × 175 | 20/16−2 |
| | L (AE) | −0.50/−0.25 × 60 | 20/50+1 (20/40+2) |
Stimuli
Stimuli were generated in Matlab 7.01 with Psychtoolbox 2.0 and presented on a flat CRT monitor (Sony; 30.1 × 40.1 cm², 1600 × 1400 pixels, 100 Hz). At 41% brightness and 98% contrast settings, luminance ranged from 3.8 to 133.2 cd/m². The monitor was calibrated so that luminance was a linear function of pixel value.
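The section does not say how the luminance linearization was done. One common approach, assumed here, is to model the display as a power-law (gamma) device and invert that model in a lookup table. In the sketch below, the 8-bit depth and γ = 2.2 are hypothetical; 3.8 and 133.2 are the luminance limits quoted above, treated here as cd/m².

```python
def inverse_gamma_lut(gamma=2.2, levels=256):
    """For each of `levels` equally spaced target luminance steps, return the
    pixel value that produces it under the assumed power-law display model."""
    top = levels - 1
    return [round(top * (i / top) ** (1.0 / gamma)) for i in range(levels)]


def luminance(v, l_min=3.8, l_max=133.2, gamma=2.2, levels=256):
    """Assumed display response: luminance (cd/m^2) produced by pixel value v."""
    return l_min + (l_max - l_min) * (v / (levels - 1)) ** gamma


lut = inverse_gamma_lut()
# After correction, luminance grows approximately linearly with the LUT index;
# the residual error comes only from rounding to integer pixel values.
for i in (0, 64, 128, 192, 255):
    print(i, lut[i], round(luminance(lut[i]), 1))
```

In practice the exponent (or a full measured response curve) would come from photometer readings rather than a nominal value.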
The background luminance of the stimuli was 28.4 cd/m². The stimulus sequence for each trial consisted of 9 frames. For normal observers, each frame contained either 11 or 9 possible positions evenly spaced along a horizontal line. Each of the 5 or 4 positions on one side was independently filled with a high-contrast (90%) Gabor patch with a probability of 0.5, and the patches were mirrored on the other side; because of this symmetry, the number of independent positions (excluding the target position) was 5 or 4 rather than 10 or 8. We assumed that the noise affects perception symmetrically because mirror-image patches are presumably equidistant from the fovea, both for normal eyes (stimuli in the lower visual field) and for amblyopic eyes (stimuli at fixation). The middle (6th or 5th) position was occupied by the target Gabor patch in the middle (5th) frame and was blank in all other frames. For amblyopic observers, there were only 4 possible positions on each side of the target. Each Gabor patch was tilted by a small angle, either clockwise (i.e., left end higher, α) or counterclockwise (i.e., right end higher, π − α); the magnitude of the tilt (4.0–6.3 deg; Table 6) was determined for each observer as described in the Procedures section. Again, tilt was symmetric about the target position, i.e., if the Gabor patch at position 1 was tilted 6 deg clockwise, the one at position 11 was also tilted 6 deg clockwise.
The target, in the middle position of the middle frame, was cued in time and space. The cue was a short red line segment below the target, a red solid circle around the target, or a red dotted circle around the target; it appeared at target onset and disappeared at target offset, so the cue was synchronized with the target in time. Because results with the solid and dotted circle cues did not differ significantly, they were combined in the analysis. Each frame lasted 40 ms, so the entire sequence lasted 0.36 s (Figure 10).
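The trial structure described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: it assumes that a filled position is equally likely to be tilted clockwise or counterclockwise and that the target tilt is chosen at random on each trial, neither of which is stated explicitly. Positions are mirrored about the target so that, as in the text, positions 1 and 11 always carry the same tilt.

```python
import random

N_FRAMES = 9                  # frames per trial
FRAME_MS = 40                 # duration of each frame in ms
TARGET_FRAME = N_FRAMES // 2  # middle (5th) frame, index 4


def make_trial(n_side, rng):
    """Generate one trial as a list of frames.

    Each frame is a list of 2 * n_side + 1 entries: None (blank), 'cw', or
    'ccw'. Flankers on one side are drawn independently (present with
    probability 0.5; tilt direction chosen at random, an assumption) and
    mirrored to the other side. The centre holds the target only in the
    middle frame.
    """
    target_tilt = rng.choice(["cw", "ccw"])  # assumed random per trial
    frames = []
    for f in range(N_FRAMES):
        # one side, ordered from near (position next to target) to far
        side = [rng.choice(["cw", "ccw"]) if rng.random() < 0.5 else None
                for _ in range(n_side)]
        centre = target_tilt if f == TARGET_FRAME else None
        # far-left ... near-left, centre, near-right ... far-right
        frames.append(side[::-1] + [centre] + side)
    return frames, target_tilt


frames, target = make_trial(n_side=5, rng=random.Random(1))
print(len(frames), len(frames[0]), N_FRAMES * FRAME_MS, "ms")
```

Mirroring by construction means only the `n_side` positions on one side carry independent information, which is why the noise matrix has 5 or 4 spatial columns rather than 10 or 8.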
A Gabor patch is mathematically defined as a two-dimensional sinusoidal wave multiplied by a Gaussian envelope:
where α is the tilt angle of the Gabor patch, ν is the frequency of the sinusoid, σ is the standard deviation of the Gaussian function in either dimension, and λ is a scaling constant. Here, λ is chosen such that G′(·,·) lies within the interval [−1, 1], with minimum −1 and maximum 1.
The contrast of a Gabor patch is defined as the Weber contrast:
Therefore, the Gabor patch with contrast C is
where C ∈ [0, 1].
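The defining equations are not reproduced in this section, so the sketch below implements the verbal definitions under stated assumptions. It assumes a sine-phase carrier, which makes the patch odd-symmetric so that a single scaling constant maps its extremes to exactly −1 and 1 as described above, and an orientation convention for the tilt that may differ from the authors' α. It also assumes the usual way of applying a Weber contrast C, L = L_b(1 + C·G′), so that (L_max − L_b)/L_b = C.

```python
import math


def gabor(size_px, px_per_deg, wavelength_deg, sigma_deg, tilt_rad):
    """Sine-phase Gabor patch normalised to the range [-1, 1].

    Assumptions (not stated in the text): sine phase, carrier modulated along
    the axis at angle tilt_rad, odd size_px so the grid is centred on a pixel.
    """
    nu = 1.0 / wavelength_deg  # carrier frequency in cycles/deg
    half = (size_px - 1) / 2.0
    g = []
    for r in range(size_px):
        y = (r - half) / px_per_deg
        row = []
        for c in range(size_px):
            x = (c - half) / px_per_deg
            u = x * math.cos(tilt_rad) + y * math.sin(tilt_rad)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma_deg ** 2))
            row.append(math.sin(2.0 * math.pi * nu * u) * envelope)
        g.append(row)
    # Scaling constant (lambda in the text): makes max |G'| equal to 1.
    peak = max(abs(v) for row in g for v in row)
    return [[v / peak for v in row] for row in g]


def to_luminance(g_norm, background, contrast):
    """Map a normalised pattern to luminance with L = L_b * (1 + C * G'),
    so that the Weber contrast (L_max - L_b) / L_b equals C (assumed form)."""
    return [[background * (1.0 + contrast * v) for v in row] for row in g_norm]
```

For example, `to_luminance(gabor(33, 60.0, 0.2, 0.12, math.radians(4.0)), 28.4, 0.5)` returns a patch on the 28.4 cd/m² background whose peak luminance is 1.5 times the background.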
Procedures
All experiments were conducted monocularly in a room with dim overhead lighting (10 cd/m²). Stimuli were presented at fixation for amblyopic observers and at 4 deg or 5 deg eccentricity in the inferior visual field for normal observers. A fixation cross (20 × 20 arcmin) was presented to normal observers. The viewing distance was 100 cm for normal observers and 300 cm for amblyopic observers.
Preliminary experiments. For each observer, the spatial frequency of the Gabor patches was chosen to be highly visible, i.e., half the wavelength of a Gabor patch was 1.5 to 2 times the minimum angle of resolution (MAR) estimated with a letter chart. The standard deviation of the Gaussian envelope was 0.09–0.16 deg, giving 2–3 visible cycles per Gabor patch. The tilt angle was set at the detection threshold in the absence of noise. In a 40-trial preliminary session, the stimulus, a single Gabor patch at the target position with no distracters, was tilted by a small angle (clockwise or counterclockwise), and the observer reported whether or not the target was tilted (i.e., yes/no). The tilt direction was randomly selected on each trial, and the angle was determined by QUEST with a 75% criterion. The resulting threshold angle α was used as the tilt of the Gabor patches for that observer. Table 6 summarizes the stimulus characteristics for each observer.
Table 6 Stimulus (Gabor patch) characteristics for each observer. The separation is defined to be center-to-center distance between neighboring Gabor patches.
| Eccentricity (deg) | Wavelength λ (deg) | Standard deviation σ (deg) | Separation (deg) | Tilt α (deg) |
Normal observers |
SS | 5 | 0.213 | 0.158 | 0.67 | 4.0 |
LL | 4 | 0.200 | 0.120 | 0.51 | 4.0 |
Amblyopic observers |
AP | 0 | 0.118 | 0.087 | 0.38 | 6.2 |
GJ | 0 | 0.111 | 0.087 | 0.38 | 6.3 |
Tasks. On each trial, a complete stimulus sequence was presented, and the observer indicated by key press whether the target Gabor patch was tilted clockwise or counterclockwise. No feedback was given. After every five trials, there was a training trial in which only the target and the cue, without distracters, were presented to the fovea of the observer; responses to training trials were not counted. Each block had 30 trials (excluding training trials), and each session had 2 to 4 blocks. A total of 750–1700 trials were collected for each condition and each eye.
Data analyses
The reverse correlation method described in Experiment 1 was used to compute the classification images for the current experiment. Using the notation from Table 2, we coded “clockwise tilt” as 1 and “counterclockwise tilt” as 0 for both the stimuli and the responses. The noise matrix is 9 (time) × 5 (space) or 9 × 4 for normal observers and 9 × 4 for amblyopic observers. Because the distracters were symmetric about the target, spatial positions on one side are labeled 1 to 5 from near to far. The noise at a given time and space has the following distribution:
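The reverse-correlation step can be sketched as below. This is an illustration under assumptions, not the estimator of Experiment 1, which is not reproduced in this section: each classification-image cell is taken to be the Pearson correlation, across trials, between the coded flanker value at that time and position and the binary response, and z is approximated as r√N. Flankers are coded +1 (clockwise), −1 (counterclockwise), or 0 (blank), and the simulated observer, which assimilates toward the nearest flanker in the target frame, is hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def classification_image(noise, responses):
    """noise[k][t][x]: coded flanker in trial k, frame t, position x.
    responses[k]: 1 for 'clockwise', 0 for 'counterclockwise'.
    Returns (r_map, z_map) indexed [t][x], with z approximated as r * sqrt(N)."""
    n = len(responses)
    n_t, n_x = len(noise[0]), len(noise[0][0])
    r_map = [[pearson([noise[k][t][x] for k in range(n)], responses)
              for x in range(n_x)] for t in range(n_t)]
    z_map = [[r * math.sqrt(n) for r in row] for row in r_map]
    return r_map, z_map


# Simulated data: 9 frames x 5 positions, hypothetical observer.
rng = random.Random(0)
N_TRIALS, N_T, N_X = 2000, 9, 5
noise, responses = [], []
for _ in range(N_TRIALS):
    trial = [[rng.choice((-1, 1)) if rng.random() < 0.5 else 0
              for _ in range(N_X)] for _ in range(N_T)]
    target = rng.choice((-1, 1))
    # assimilation toward the nearest flanker (position 1) in the target frame
    decision = target + 0.8 * trial[4][0] + rng.gauss(0.0, 1.0)
    noise.append(trial)
    responses.append(1 if decision > 0 else 0)

r_map, z_map = classification_image(noise, responses)
print(round(z_map[4][0], 1))
```

In this toy case the only cell that influences the decision should stand out as a large positive z, i.e., an assimilation (crowding) cell, while the remaining cells fluctuate around zero.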
Results
Without distracters, the correct response rate was around 75% for each observer. With closely spaced distracters, performance was much worse in both normal peripheral and amblyopic central vision (Table 7), suggesting a crowding effect.
Table 7 Discrimination task (2AFC) performance (percentage of correct responses) for the tested eyes of normal observers and the amblyopic eyes (AE) of amblyopic observers.
Observer | SS | LL | AP (AE) | GJ (AE) |
Circle cue | 59.6% | 70.4% | 58.9% | 61.4% |
Line cue | 55.4% | 69.4% | – | 50.7% |
The main significant interaction between the distracters and the target is orientation assimilation (red areas), i.e., an observer is likely to perceive the target orientation as similar to that of the flankers, consistent with previous studies of normal peripheral vision (Parkes et al., 2001; Solomon et al., 2004). For every observer, this assimilation effect is strongest at locations immediately adjacent to the target, as shown by the narrow vertical red zones on both sides of the target in Figure 11. The assimilation effect extends further in space for flankers presented simultaneously with the target and is spatially limited for flankers with non-zero asynchronies. Interestingly, beyond the assimilation area, the target–flanker interaction becomes repulsion (blue), i.e., an observer is likely to perceive the target orientation as opposite to that of the flankers. Repulsion in orientation perception as a function of space was recently reported in peripheral vision by Mareschal, Morgan, and Solomon (2010). We refer to it as “anti-crowding”; together with assimilation, it gives the spatial interaction a “Mexican hat” profile (discussed below).
It may seem surprising that the assimilation zone for flankers with non-zero asynchronies is so narrow that the adjacent repulsion zone still lies within the critical spacing for crowding (i.e., 0.5E = 2.5 deg at 5 deg eccentricity or 2 deg at 4 deg). However, when flankers were presented non-simultaneously with the target, crowding of the spatially closest flankers was substantially relieved because the target position was unoccupied. “Anti-crowding” occurs immediately adjacent to the assimilation zone in space, as shown by the blue areas in the z-score image and the dark gray squares in the classification images, although not all of these reach the significance criterion (i.e., |z| > 3).
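The critical-spacing arithmetic can be checked against the flanker layout in Table 6. The sketch below applies the 0.5E rule quoted above to each normal observer's center-to-center separation; it assumes five flanker positions per side for both observers, since the text does not say which observer had 9 rather than 11 positions.

```python
def crowded_positions(eccentricity_deg, separation_deg, n_positions, bouma=0.5):
    """Flanker positions (1 = nearest) whose centre-to-centre distance from the
    target is within the critical spacing bouma * eccentricity."""
    critical = bouma * eccentricity_deg
    return [k for k in range(1, n_positions + 1) if k * separation_deg <= critical]


# Eccentricity and separation from Table 6; five positions per side assumed.
print(crowded_positions(5, 0.67, 5))  # SS: critical spacing 2.5 deg
print(crowded_positions(4, 0.51, 5))  # LL: critical spacing 2.0 deg
```

For both observers the first three flanker positions fall inside the critical spacing, so a repulsion zone beginning at position 2 or 3 would indeed sit within the nominal crowding zone.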
Another seemingly surprising result is that the time at which the target is on (i.e., time 0) does not coincide with the peak of assimilation in any condition. This may similarly be explained by the flankers being subject to more crowding while the target was on (i.e., the target acts to crowd the flankers). This is partly consistent with the results of earlier studies (Huckauf & Heller, 2004; Westheimer & Hauske, 1975).
The shape of the crowding map depends on the cue type. The circle cue induces a bimodal temporal profile with two crowding peaks, about 80 ms before and 40 ms after the target presentation. The line cue, on the other hand, induces crowding predominantly before and around the target presentation, strongest near the beginning of the sequence. Performance with the circle cue is moderately better than with the line cue (Table 7). However, every observer reported that the task was difficult and that the guessing rate was high.
The amblyopic spatiotemporal interaction maps also show the strongest orientation assimilation at the location adjacent to the target, with anti-crowding at farther locations. The greatest extent of orientation assimilation occurs simultaneously with or immediately before the target presentation for normal eyes (Figure 11; positive correlations are represented by light gray squares), whereas it occurs after the target presentation (i.e., while the target is off) for amblyopic eyes (Figure 12). “Anti-crowding” seems to weaken as eccentricity increases in the normal eye, and it also appears weakened in the amblyopic fovea. Further investigation is needed to determine whether these differences are significant or meaningful. In general, the amblyopic crowding map is not substantially different from that of the normal periphery.
Discussion
Temporal mechanisms for simple feature discrimination
Using a discrimination task and the method of classification images, we were able to estimate the spatiotemporal templates for luminance discrimination of both normal and amblyopic observers under various conditions. Our results for the normal template differ from those reported by Neri and Heeger (2002) in that we found a temporal inhibition component that peaks around 40–60 ms after the target presentation. As explained previously, we think this difference is due to the elimination of temporal uncertainty by the discrimination task.
Our results also differ from the classic paired-pulse, three-pulse, or flicker sensitivity time courses in that the latter are symmetric about the target presentation (time 0), i.e., there are two temporal inhibition lobes, one before and one after the target presentation (Bergen & Wilson, 1985; Rashbass, 1970; Watson & Nachmias, 1977). Note that in the three-pulse study by Bergen and Wilson, the 1st and 3rd pulses always had the same luminance. Although we are not certain of the reason for these differences, we think two factors are probably involved. First, the target in the present study has suprathreshold luminance, whereas the classic paired-pulse experiments were designed to measure the luminance threshold. It has been suggested that the mechanisms involved in suprathreshold and near-threshold stimuli may be very different (Manny & Levi, 1982a, 1982b). Second, the spatiotemporal noise in our study, which shares features with the target, could produce unknown interactions that affect perception, especially since the classic paired-pulse experiments had no spatial variable at all. Further investigation is necessary to understand the mechanism underlying these differences.
The classic paired-pulse temporal sensitivity time course has been modeled fairly well by Rashbass's (1970) two-stage process. The first (linear) stage is temporal filtering. A working model of the temporal impulse response function has been developed over numerous studies (Watson, 1986); the modeled function is composed of a transient and a sustained component. The transformed function (i.e., the filter output for the stimulus input) is then processed at the second stage, where its squared L2 norm (i.e., the sum of the squares of the filter output) is computed and serves as the basis for the perceptual decision. This model correctly predicts that the paired-pulse sensitivity time course is determined by the autocorrelation function of the temporal impulse response, which is always symmetric about zero asynchrony (i.e., the target presentation time). As mentioned above, our result is not symmetric about the target presentation, since the inhibition component is present only after target offset; it therefore cannot be accounted for by Rashbass's model. Nor can it be accounted for by a simpler model in which the second stage (i.e., summation of squares) is replaced by taking the maximum of the filtered signal as the basis for the decision, which predicts a horizontally flipped profile (Figure 13).
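The symmetry prediction of the energy model follows from its structure: for pulses of amplitudes a₁ and a₂ separated by Δ, the summed squared output equals (a₁² + a₂²)Σh² + 2a₁a₂R(Δ), where R is the autocorrelation of the impulse response h, and R(Δ) = R(−Δ). The sketch below checks this numerically with a toy biphasic impulse response (a hypothetical shape, not Watson's fitted function) and also prints the maximum of the filter output, the decision variable of the simpler alternative model, for comparison.

```python
import math


def impulse_response(n=60):
    """Toy biphasic impulse response sampled at unit time steps (hypothetical
    shape): a fast positive lobe minus a slower, weaker negative lobe."""
    return [(t / 5.0) * math.exp(-t / 5.0) - 0.4 * (t / 10.0) * math.exp(-t / 10.0)
            for t in range(n)]


def filter_output(pulses, h, length):
    """First (linear) stage: superpose a scaled, shifted copy of h per pulse.
    pulses is a list of (onset_index, amplitude) pairs."""
    y = [0.0] * length
    for onset, amp in pulses:
        for i, hi in enumerate(h):
            y[onset + i] += amp * hi
    return y


def energy(y):
    """Second stage of the energy model: sum of squared filter output."""
    return sum(v * v for v in y)


h = impulse_response()
LENGTH, T0 = 200, 70  # signal length and onset of the first pulse (arbitrary)
for d in (5, 15, 30):
    after = filter_output([(T0, 1.0), (T0 + d, 0.5)], h, LENGTH)   # 2nd pulse after
    before = filter_output([(T0, 1.0), (T0 - d, 0.5)], h, LENGTH)  # 2nd pulse before
    # energies agree up to rounding; the max-based decision variable is shown for comparison
    print(d, round(energy(after), 6), round(energy(before), 6),
          round(max(after), 4), round(max(before), 4))
```

Whatever impulse response is chosen, the energy column cannot distinguish a pulse arriving before the target from one arriving after it, which is why an asymmetric template falls outside this model.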
On the other hand, the asymmetric U-shaped temporal profile of the normal template for discriminating a bright bar, whose inhibition component peaks at 40–60 ms and extends to about 100 ms after the target presentation, somewhat resembles the time course of metacontrast or structure backward masking, which is typically U-shaped, with the strongest masking when mask onset follows target onset by 60–100 ms (Breitmeyer & Ogmen, 2000). What distinguishes these two visual functions is that backward masking involves no summation, i.e., the mask always interferes with target detection. This difference could be due to two factors. First, the task in the present study is to discriminate a suprathreshold target, whereas studies on backward masking often measure detection of a stimulus near its threshold. Second, in our study the target and the distracter that may be considered the “mask” overlap in space and have the same structure (a single bar), whereas in metacontrast or structure backward masking the mask either does not overlap the target or has a different structure. The second factor may not be characteristic of backward masking, however: in a recent study by Saarela and Herzog (2008), an iso-oriented mask that spatially overlapped the target (a Gabor patch) produced the strongest contrast masking at its onset and offset, with no facilitation at any SOA, and at the peaks of the masking effect the target detection threshold even exceeded the mask contrast.

One possible mechanism of backward masking is that the mask interrupts the propagation of target-elicited neural activity. Along the same lines, it is reasonable to hypothesize that discriminating a bright bar relies on the proper temporal pattern of neural activity, rather than on the maximum activity or its “energy,” defined as the summed squares in Rashbass's model. A bright stimulus initiates a train of neural spikes whose rate forms a pattern over time that has been shown to resemble the temporal impulse response (Albrecht, Geisler, Frazor, & Crane, 2002). Neurons that receive this activity at the next stage may recognize specific activity patterns and respond according to the degree of match. In the present study, the temporal template profile has the typical shape of a temporal impulse response; therefore, the neural activity elicited by an optimal stimulus (predicted by the template) will follow a pattern that mimics the temporal impulse response, provided that the stimuli are suprathreshold. This pattern is likely to be recognized by next-stage neurons as coming from a strong visual stimulus. In addition, the polarity of the target determines the asymmetry of the profile.
The temporal impulse response has been modeled as having a sustained and a transient component (Watson, 1986). A purely sustained impulse response is non-negative, whereas a purely transient one has zero area (i.e., positive area = negative area); intermediate forms have both sustained and transient components. Stimuli of low spatial frequency have been found to be associated with a transient temporal response, and the transient component weighs less, or even disappears, for stimuli of high spatial frequency. In the present study, the control stimulus had a fundamental spatial frequency of 5 cpd, which was decreased to 1.67 cpd for testing normal peripheral and amblyopic central vision. At the relatively high spatial frequency, the amblyopic template lacks inhibition both spatially and temporally, while temporal summation is hardly affected. Interestingly, the amblyopic template was completely restored to normal with low-spatial-frequency stimuli. This is consistent with previous studies that found that at low spatial frequencies the difference in temporal sensitivity between amblyopic and normal eyes was substantially reduced or even unnoticeable (Bradley & Freeman, 1985; Manny & Levi, 1982a, 1982b). These results suggest that, for amblyopic vision, the temporal template profile as a function of spatial frequency is shifted toward lower spatial frequencies. When the stimuli were presented in the normal periphery, a template similar to that of the amblyopic eye was reproduced. Decreasing the fundamental frequency of the stimuli partly restored the spatial, but not the temporal, inhibition mechanism. We therefore hypothesize that the spatial properties of normal peripheral vision are similar to those of amblyopic central vision, but that their temporal properties may be quite different.
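The sustained/transient distinction can be illustrated with a common form of the impulse response following Watson (1986): h(t) = h₁(t) − ζh₂(t), where each hᵢ is the unit-area response of a cascade of nᵢ identical low-pass stages and h₂ has a time constant κ times longer (the overall gain factor is omitted here). With ζ = 0 the response is purely sustained and non-negative; with ζ = 1 the two unit-area lobes cancel and the total area is zero. The parameter values in the sketch are illustrative, not fitted values.

```python
import math


def stage(t, tau, n):
    """Unit-area response of an n-stage cascade of identical low-pass filters:
    (t/tau)^(n-1) * exp(-t/tau) / (tau * (n-1)!) for t >= 0."""
    if t < 0:
        return 0.0
    return (t / tau) ** (n - 1) * math.exp(-t / tau) / (tau * math.factorial(n - 1))


def impulse(t, zeta, tau=5.0, kappa=1.33, n1=9, n2=10):
    """h(t) = h1(t) - zeta * h2(t); zeta = 0 is purely sustained, zeta = 1 is
    purely transient. tau is in ms; all parameter values are illustrative."""
    return stage(t, tau, n1) - zeta * stage(t, kappa * tau, n2)


dt = 0.1                              # time step in ms
ts = [i * dt for i in range(6000)]    # 0 to 600 ms
for zeta in (0.0, 0.5, 1.0):
    h = [impulse(t, zeta) for t in ts]
    area = sum(h) * dt                # Riemann-sum approximation of the area
    print(zeta, round(area, 4), min(h) >= 0)
```

The printed area is approximately 1 − ζ, so the weight of the transient component directly controls how much of the response is cancelled by its negative lobe.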
Both normal peripheral vision and amblyopic central vision are crowded in space and time. In contrast to the well-studied spatial characteristics of crowding, the temporal properties of crowding have rarely been investigated. Note that the temporal properties of crowding are not the same as “temporal crowding,” which usually refers to the analogue of crowding in time, i.e., impaired target visibility caused by temporal flankers presented at the target location (Pelli et al., 2004). The most relevant study of the spatiotemporal properties of crowding is that of Westheimer and Hauske (1975), who investigated the crowding effect on foveal Vernier acuity while varying both target–flanker separation and asynchrony. With various target–flanker spatial configurations and a fixed target duration (50 ms), the crowding effect appeared at zero asynchrony and increased with target–flanker asynchrony up to 150 ms (i.e., backward crowding). A similar time course was found in a recent preliminary study on letter crowding by Chung (personal communication). In the present study, we likewise found that the strongest crowding effect did not occur during the target presentation. However, our results show both backward crowding, generated by post-target flankers, and forward crowding, generated by pre-target flankers, especially in the normal periphery. The cause of this difference is unclear, but it may be related to the fact that the target in the present study was embedded in a relatively long sequence of 9 frames lasting 0.36 s, whereas the stimulus sequences in the previous studies had only two frames, one of which contained the target. An observer is unlikely to mistake the temporal order of target and flankers in a 2-frame sequence but could easily confuse it in a 9-frame sequence. Further investigation is required to test this hypothesis.
Spatial mechanisms for simple feature discrimination
The results from Experiment 2 illustrate the effect of spatial interactions on perceived target orientation as a function of space and time. The orientation assimilation effect (i.e., positive correlation) is distributed close to the target location, within the critical spacing for the tested eccentricities, and therefore represents crowding (Parkes et al., 2001; Pelli et al., 2004). This orientation assimilation effect of crowding is consistent with previous reports (Parkes et al., 2001; Solomon et al., 2004). Mounting evidence suggests that crowding emerges from feature integration over an inappropriately large area following feature detection (Levi, 2008; Pelli et al., 2004), and that crowding is akin to texture perception (Balas et al., 2009; Levi & Carney, 2009; Parkes et al., 2001). Several explicit mechanisms have been proposed to explain crowding. Averaging features from the clutter explains certain orientation configurations but not others, suggesting that averaging may be a special case of more general pooling mechanisms (Parkes et al., 2001). Solomon et al. (2004) developed a unified model that accounts for both the sensitivity and the perceptual bias in crowding (assimilation) and in the tilt illusion (repulsion); it incorporates a local modulated response followed by an opponent mechanism for orientation identification. A computational model that computes summary statistics over the crowding zone, based on texture synthesis, successfully predicted human performance under crowding conditions for a variety of tasks (Balas et al., 2009). Because the flanking effect on the target orientation threshold was not the focus of the present study, the estimated spatiotemporal map for perceiving a crowded orientation is not intended to be a complete picture of crowding. Amblyopic spatiotemporal interaction maps differ from those of the normal periphery in some respects, such as the relative weights of forward and backward crowding, although the difference is small and needs more evidence to confirm.
An important finding of Experiment 2 is “anti-crowding” (i.e., repulsion) at locations farther from the target, yielding an attraction–repulsion structure similar to the facilitation–inhibition spatial template for discriminating a luminance increment of a bright bar in Experiment 1, and to templates for other visual tasks. Physiologically, retinal ganglion cells and LGN neurons have radially symmetric on/off receptive fields with a center–surround profile, as do some orientation-selective V1 simple cells. This center–surround organization for orientation perception was recently reported by Mareschal et al. (2010), who showed that in the normal periphery, cortical distance determines whether flankers cause crowding (assimilation) or repulsion (the tilt illusion).
In Experiment 2, the center–surround spatial template for orientation discrimination extends as far as 2.8 deg on each side of the target at 4 deg eccentricity. The center–surround structure, with a center effect much stronger than the subtractive surround effect (as shown by their relative weights), appears to be quite general for visual perceptual mechanisms associated with both lower and higher level visual processes (Badcock & Westheimer, 1985; Kienzle, Franz, Scholkopf, & Wichmann, 2009; Levi & Klein, 2003; Wilson, 1978). Moreover, since a spatial profile with both positive and negative components serves to detect change in an image, e.g., a bright spot of light in darkness or a shift in position or phase within an otherwise uniform texture, it is reasonable to hypothesize that crowding, which in some sense represents averaging or pooling, occurs in the centroid in the feature identification process. The general center–surround mechanism is necessarily limited by some spacing constraint (e.g., either effect disappears at a large enough spacing) but has no one-to-one relationship with absolute distance; rather, it seems to be a function of the spatially ordered image features. More specifically, when a feature is not crowded, either because it is relatively isolated in space or because it is very different from the surrounding features, it can occupy one of the off–on–off locations; when some features are crowded, the pooled features are processed as an ensemble, as one unit (on/off), for differentiation. This pooling–differentiation scheme of visual information processing probably exists at multiple levels.
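The “Mexican hat” profile discussed above is often modeled as a difference of Gaussians: a narrow, strong excitatory center minus a broader, weaker inhibitory surround. The sketch below uses that form with hypothetical parameters (distance in arbitrary units) to show the qualitative features described in the text: a positive center that dominates, a shallower negative surround, and both effects vanishing at large spacing. It is an illustrative model, not a fit to the data in Figures 11 and 12.

```python
import math


def dog(x, a_center=1.0, s_center=0.5, a_surround=0.3, s_surround=1.5):
    """Difference of Gaussians: narrow excitatory centre minus a broad, weaker
    inhibitory surround. All parameter values are hypothetical."""
    return (a_center * math.exp(-x * x / (2 * s_center ** 2))
            - a_surround * math.exp(-x * x / (2 * s_surround ** 2)))


for x in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 6.0):
    print(x, round(dog(x), 3))
```

In this parameterization the center (assimilation) weight is several times larger than the deepest point of the surround (repulsion), mirroring the asymmetry of relative weights noted above.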
Cueing a crowded target has no effect on identification sensitivity (Felisberti, Solomon, & Morgan, 2005). Similarly, in our experiment the proportion of correct responses with distracters was much lower than without distracters, even with spatiotemporal cueing. However, our results show that a cue does affect perception under spatiotemporal interactions.
Summary and conclusion
The normal template for luminance discrimination has an inhibition–summation–inhibition structure in space and a summation–inhibition time course. The amblyopic spatiotemporal mechanism lacks inhibition, especially temporal inhibition. The absence of temporal inhibition for moderate spatial frequency stimuli may provide a low-level explanation for the abnormal “attentional blink” in amblyopia (Popple & Levi, 2008). Neither increasing the signal-to-noise ratio of the stimuli for the amblyopic eye, nor decreasing it for the normal eye, nor blurring the stimuli spatiotemporally caused any significant change in the template. However, decreasing the fundamental frequency of the stimuli successfully restored normal inhibition in the amblyopic eye, suggesting that the amblyopic mechanism for luminance discrimination is shifted toward lower spatial frequencies.
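The templates summarized above were estimated by image classification (see Beard & Ahumada, 1998). The sketch below shows how a classification image can recover a separable template with an inhibition–summation–inhibition spatial profile and a summation–inhibition time course from a simulated yes/no detection task. Everything in it is an assumption for illustration: the template values, the Gaussian external and internal noise, the linear observer with a median criterion, and the trial count. It is not the authors' stimulus or analysis code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_space, n_time = 9, 6

# Hypothetical separable template (illustrative values, not fitted data).
spatial = np.array([0, 0, -0.5, -1, 2, -1, -0.5, 0, 0], dtype=float)  # inhibition-summation-inhibition
temporal = np.array([1, 1, 0.5, -0.5, -0.5, 0], dtype=float)          # summation, then inhibition
template = np.outer(temporal, spatial)                                 # shape (n_time, n_space)

# Signal: luminance increment of the central bar on every frame.
signal = np.zeros((n_time, n_space))
signal[:, n_space // 2] = 0.5

n_trials = 20000
present = rng.random(n_trials) < 0.5
noise = rng.normal(0.0, 1.0, size=(n_trials, n_time, n_space))  # external spatiotemporal noise
stimuli = noise + present[:, None, None] * signal

# Simulated linear observer: template match plus internal noise, fixed criterion.
decision = np.einsum("ij,tij->t", template, stimuli) + rng.normal(0.0, 1.0, n_trials)
said_yes = decision > np.median(decision)

def mean_noise(mask):
    """Average external noise field over the trials selected by mask."""
    return noise[mask].mean(axis=0)

# Classification image: noise averaged by stimulus-response category,
# "yes" categories added and "no" categories subtracted.
classification_image = (
    mean_noise(present & said_yes) - mean_noise(present & ~said_yes)
    + mean_noise(~present & said_yes) - mean_noise(~present & ~said_yes)
)

r = np.corrcoef(classification_image.ravel(), template.ravel())[0, 1]
print(f"correlation between classification image and true template: {r:.3f}")
```

Because the simulated observer is linear, the classification image is proportional to the template plus estimation noise, so both the spatial flanks and the late temporal inhibition reappear with negative sign. Removing the negative entries from `template` would mimic the amblyopic template described above.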
The normal peripheral spatiotemporal map for orientation perception is characterized by strong perceptual assimilation (i.e., crowding) near the target and relatively weak perceptual repulsion (i.e., “anti-crowding”) farther from the target. Simultaneous, forward, and backward assimilation effects (i.e., crowding) are all present but are affected by different cues; interestingly, the simultaneous assimilation effect is relatively weak. We note that the center–surround organization here may be a general spatial template for both lower level and higher level visual processing across a wide variety of visual tasks. The amblyopic spatiotemporal interaction map for orientation perception is similar to that of the normal periphery, except that the spatiotemporal interaction is generally more widely distributed around and after the target presentation, and the “anti-crowding” appears to be weakened.
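One way to obtain a map of assimilation and repulsion is reverse correlation between the flanker orientations and the observer's tilt judgments (cf. Neri & Levi, 2006). The sketch below is a hypothetical simulation, not the procedure or data of this study: the 3 × 3 grid of flanker spacings and onset offsets, the Gaussian flanker tilts, and the `true_w` weights (positive for assimilation, negative for repulsion) are all illustrative assumptions. Flanker tilts averaged separately by response, within each target-tilt class, yield a map proportional to the simulated weights.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical grid: rows = flanker spacing (near, middle, far),
# columns = flanker onset relative to target (before, simultaneous, after).
# Illustrative weights only, NOT measured data: + is assimilation, - is repulsion.
true_w = np.array([
    [0.30, 0.20, 0.30],
    [0.10, 0.05, 0.10],
    [-0.05, -0.05, -0.05],
])

n_trials = 40000
target = rng.choice([-1.0, 1.0], size=n_trials) * 5.0      # target tilt, +/-5 deg
flankers = rng.normal(0.0, 10.0, size=(n_trials, 3, 3))   # flanker tilts (deg)

# Perceived tilt: target plus weighted flanker tilts plus internal noise.
perceived = (target + np.einsum("ij,tij->t", true_w, flankers)
             + rng.normal(0.0, 5.0, n_trials))
resp_cw = perceived > 0                                    # "clockwise" response

# Reverse correlation within each target-tilt class, so the target itself
# does not contaminate the estimate; the two classes are then averaged.
map_est = np.zeros((3, 3))
for t_val in (-5.0, 5.0):
    in_class = target == t_val
    map_est += (flankers[in_class & resp_cw].mean(axis=0)
                - flankers[in_class & ~resp_cw].mean(axis=0))
map_est /= 2

print(np.round(map_est, 2))
```

Positive entries in `map_est` indicate locations where the perceived target orientation was pulled toward the flanker orientation (crowding), and negative entries indicate repulsion; only the pattern, not the absolute scale, is meaningful.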
Supplementary Materials
Acknowledgments
We thank Susana Chung for her comments. This research was supported by National Institutes of Health Grant R01EY-1728.
Commercial relationships: none.
Corresponding author: Shuang Song.
Email: songsh@berkeley.edu.
Address: Vision Science Program, University of California, Berkeley, CA, USA.
References
Ahumada, A. J., Jr. (1996). Perceptual classification images from Vernier acuity masked by noise. Perception, 26, 1831–1840.
Ahumada, A. J., Jr., & Lovell, J. (1971). Stimulus features in signal detection. Journal of the Acoustical Society of America, 49, 1751–1756.
Albrecht, D. G., Geisler, W. S., Frazor, R. A., & Crane, A. M. (2002). Visual cortex neurons of monkeys and cats: Temporal dynamics of the contrast response function. Journal of Neurophysiology, 88, 888–913.
Badcock, D. R., & Westheimer, G. (1985). Spatial location and hyperacuity: The centre/surround localization contribution function has two substrates. Vision Research, 25, 1259.
Beard, B. L., & Ahumada, A. J., Jr. (1998). Technique to extract relevant image features for visual tasks. Human Vision and Electronic Imaging III, Proceedings of SPIE, 3299, 79–85.
Bedell, H. E., & Flom, M. C. (1981). Monocular spatial distortion in strabismic amblyopia. Investigative Ophthalmology & Visual Science, 20, 263–268.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57, 289–300.
Bergen, J. R., & Wilson, H. R. (1985). Prediction of flicker sensitivities from temporal three-pulse data. Vision Research, 25, 577–582.
Bradley, A., & Freeman, R. D. (1985). Temporal sensitivity in amblyopia: An explanation of conflicting reports. Vision Research, 25, 39–46.
Breitmeyer, B. G., & Ogmen, H. (2000). Recent models and findings in visual backward masking: A comparison, review, and update. Perception & Psychophysics, 62, 1572–1595.
Bussgang, J. J. (1952). Cross-correlation functions of amplitude-distorted Gaussian signals. MIT Research Laboratory of Electronics Technical Report, 216, 1–14.
Ciuffreda, K. J., Levi, D. M., & Selenow, A. (1991). Amblyopia: Basic and clinical aspects. Boston: Butterworth-Heinemann.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1993). Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. Journal of Neurophysiology, 69, 1091–1117.
de Boer, E. (1967). Correlation studies applied to the frequency resolution of the cochlea. Journal of Auditory Research, 7, 209–217.
de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Transactions on Biomedical Engineering, 15, 169–179.
Felisberti, F. M., Solomon, J. A., & Morgan, M. J. (2005). The role of target salience in crowding. Perception, 34, 823–833.
Gold, J. M., Murray, R. F., Bennett, P. J., & Sekuler, A. B. (2000). Deriving behavioural receptive fields for visually completed contours. Current Biology, 10, 663–666.
Hess, R. F., & Jacobs, R. J. (1979). A preliminary report of acuity and contour interactions across the amblyope's visual field. Vision Research, 19, 1403–1408.
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148, 574–591.
Huckauf, A., & Heller, D. (2004). On the relations between crowding and visual masking. Perception & Psychophysics, 66, 584–595.
Jones, J. P., & Palmer, L. A. (1987). The two-dimensional spatial structure of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58, 1187–1211.
Kelly, D. H. (1971a). Theory of flicker and transient responses. I. Uniform fields. Journal of the Optical Society of America, 61, 537–546.
Kelly, D. H. (1971b). Theory of flicker and transient responses. II. Counterphase gratings. Journal of the Optical Society of America, 61, 632–640.
Levi, D. M. (1991). Spatial vision in amblyopia. In D. Regan (Ed.), Spatial vision (pp. 212–238). London: Macmillan Press.
Levi, D. M. (2008). Crowding—An essential bottleneck for object recognition: A mini-review. Vision Research, 48, 635–654.
Levi, D. M., & Carney, T. (2009). Crowding in peripheral vision: Why bigger is better. Current Biology, 19, 1988–1993.
Levi, D. M., & Harwerth, R. S. (1977). Spatio-temporal interactions in anisometropic and strabismic amblyopia. Investigative Ophthalmology & Visual Science, 16, 90–95.
Levi, D. M., & Klein, S. A. (1985). Vernier acuity, crowding and amblyopia. Vision Research, 25, 979–991.
Levi, D. M., & Klein, S. A. (2003). Noise provides some new signals about the spatial vision of amblyopes. Journal of Neuroscience, 23, 2522–2526.
Levi, D. M., Klein, S. A., & Aitsebaomo, A. P. (1985). Vernier acuity, crowding and cortical magnification. Vision Research, 25, 963–977.
Levi, D. M., Waugh, S. J., & Beard, B. L. (1994). Spatial scale shifts in amblyopia. Vision Research, 34, 3315–3333.
Levi, D. M., Yu, C., Kuai, S. G., & Rislove, E. (2007). Global contour processing in amblyopia. Vision Research, 47, 512–524.
Li, R. W., Klein, S. A., & Levi, D. M. (2008). Prolonged perceptual learning of positional acuity in adult amblyopia: Perceptual template retuning dynamics. Journal of Neuroscience, 28, 14223–14229.
Manny, R. E., & Levi, D. M. (1982a). Psychophysical investigations of the temporal modulation sensitivity function in amblyopia: Spatiotemporal interactions. Investigative Ophthalmology & Visual Science, 22, 525–534.
Manny, R. E., & Levi, D. M. (1982b). Psychophysical investigations of the temporal modulation sensitivity function in amblyopia: Uniform field flicker. Investigative Ophthalmology & Visual Science, 22, 515–524.
Neri, P., & Heeger, D. J. (2002). Spatiotemporal mechanisms for detecting and identifying image features in human vision. Nature Neuroscience, 5, 812–816.
Neri, P., & Levi, D. M. (2006). Receptive versus perceptive fields from the reverse-correlation viewpoint. Vision Research, 46, 2465–2474.
Neri, P., Parker, A. J., & Blakemore, C. (1999). Probing the human stereoscopic system with reverse correlation. Nature, 401, 695–698.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77, 2879–2909.
Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., & Morgan, M. (2001). Compulsory averaging of crowded orientation signals in human vision. Nature Neuroscience, 4, 739–744.
Rashbass, C. (1970). The visibility of transient changes of luminance. The Journal of Physiology, 210, 165–186.
Ringach, D. L., Hawken, M. J., & Shapley, R. (1997). Dynamics of orientation tuning in macaque primary visual cortex. Nature, 387, 281–284.
Ringach, D. L., & Shapley, R. (2004). Reverse correlation in neurophysiology. Cognitive Science, 28, 147–166.
Watson, A. B. (1986). Temporal sensitivity. In Handbook of perception and human performance (Vol. 1, pp. 1–43).
Watson, A. B., & Nachmias, J. (1977). Patterns of temporal interaction in the detection of gratings. Vision Research, 17, 893–902.
Watson, A. B., & Rosenholtz, R. (1997). A Rorschach test for visual classification strategies. Investigative Ophthalmology & Visual Science, 38, S1.
Westheimer, G., & Hauske, G. (1975). Temporal and spatial interference with Vernier acuity. Vision Research, 15, 1137–1141.
Wilson, H. R. (1978). Quantitative characterization of two types of line-spread function near the fovea. Vision Research, 18, 971–981.