To compute depth from binocular disparity, the visual system must correctly link corresponding points between two images, given multiple possible correspondences. Typically, model solutions to this problem use some form of local spatial smoothing, with many physiologically inspired models doing so implicitly, through the use of local cross-correlation-like procedures. In this paper we show that implicit smoothing, without the explicit consideration of relative disparity, cannot account for biases in the perception of a novel ambiguous stereo stimulus. Observers viewed a stereogram consisting of multiple strips of periodic random-dot patterns, perceived as either a slanted surface, or a triangular wedge in depth, and reported their perception in a 4AFC task. Biases in the perception of this stimulus are shown to depend upon the stimulus configuration in its entirety, and cannot be accounted for by low-level preferences for disparity sign. Such results are not consistent with local smoothing effects arising solely at the level of cross-correlation-like absolute disparity detectors. Instead, our results suggest the presence of smoothing constraints that consider the differences in disparity between neighboring image regions. These results further suggest that such smoothing generally biases matching toward solutions that minimize relative disparity, regardless of the presence of changes in disparity sign.

*continuity principle* (Marr & Poggio, 1976, 1979). This continuity principle states that disparity should vary smoothly across the image since “… matter is cohesive, it is separated into objects, and the surfaces of objects are generally smooth” (Marr, 1982, p. 113).

*a* and *b*. Random-dot patterns were binary with a 1:1 ratio of black to white pixels. Each unit extended to the full height of the stimulus strip. Unit width varied between strips, depending on the required disparity magnitude (see below), but was always constant within each strip. In one eye, repeated sequences were paired in the order *ab*, while in the other eye, the repeated order was *ba* (see Figure 2a). This reversal of sequence order produces a reliable matching ambiguity in the stimulus, where an *ab* pair in one eye may match to an *ab* pair with either crossed or uncrossed disparity, of equal magnitude, in the other eye.
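The matching ambiguity described above can be illustrated with a minimal sketch. The function names, the random seed and the strip dimensions are illustrative assumptions, not the authors' stimulus code; the sketch only shows that reversing the sequence order between the eyes makes a shift of one unit in either direction an equally good match.

```python
import random

def random_unit(width, height, rng):
    """Return a height x width binary unit with a 1:1 ratio of 0s and 1s
    (off by one pixel only if width * height is odd)."""
    n = width * height
    pixels = [0] * (n // 2) + [1] * (n - n // 2)
    rng.shuffle(pixels)
    return [pixels[r * width:(r + 1) * width] for r in range(height)]

def periodic_strip(unit_width, height, n_pairs, seed=0):
    """Left eye repeats a,b,a,b,...; right eye repeats b,a,b,a,..."""
    rng = random.Random(seed)
    a = random_unit(unit_width, height, rng)
    b = random_unit(unit_width, height, rng)
    left = [(ra + rb) * n_pairs for ra, rb in zip(a, b)]
    right = [(rb + ra) * n_pairs for ra, rb in zip(a, b)]
    return left, right

def matches_at_shift(left, right, shift):
    """True if left[x] == right[x + shift] wherever both pixels exist."""
    width = len(left[0])
    return all(
        l_row[x] == r_row[x + shift]
        for l_row, r_row in zip(left, right)
        for x in range(width)
        if 0 <= x + shift < width
    )

# Hypothetical strip: units 2 pixels wide, 10 pixels high, 8 ab pairs.
left, right = periodic_strip(unit_width=2, height=10, n_pairs=8)
print(matches_at_shift(left, right, +2), matches_at_shift(left, right, -2))
```

Both shifts match over the region where the two half images overlap, leaving one unmatched unit at each edge, which is the source of the equal-magnitude crossed and uncrossed solutions.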

*ab* pair (i.e. equal to the size of a single *a* or *b* unit). Unit width, and therefore disparity magnitude, increased with the strip's vertical eccentricity. Flanking strips closest to the central strip consisted of *ab* pairings of width 8.4 arcmin. The next most eccentric strips consisted of *ab* pairings of width 16.8 arcmin. Finally, the most eccentric strips consisted of *ab* pairings of width 25.2 arcmin.

*ab* pairings thus varied between 4 × 10 and 12 × 10 pixels in size. The central strip was non-periodic (i.e. was composed of a single extended random-dot pattern, not a repeating sequence of *ab* pairings), with no experimentally introduced disparity.

*ab* width and eccentricity, disparity magnitude in the stimulus increases from zero in the central strip, to ±4.2, ±8.4 and ±12.6 arcmin with increasing strip eccentricity. This results in four possible global matching solutions. These four alternative stimulus interpretations are illustrated in Figure 1b. Alternative stimulus interpretations fall into two main categories: observers perceive the stimulus in either a ‘wedge’ or ‘slanted plane’ configuration. In the ‘wedge’ category each strip is matched in the same direction, while in the ‘slanted plane’ category the direction of matching reverses either side of the central, zero disparity strip. These two main categories can then be subdivided into two further categories, depending on matching directions. Wedges are perceived as either convex or concave, while slanted planes are either top-far, ‘ground’ planes, or top-near, ‘ceiling’ planes.
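As a worked illustration of this arithmetic, the sketch below derives the disparity magnitudes from the *ab* widths and lists the per-strip disparities under each of the four interpretations. Two conventions here are assumptions made for the sketch rather than statements from the text: positive values denote uncrossed (far) disparity, and the wedge whose outer strips are all far is labelled convex.

```python
AB_WIDTHS_ARCMIN = [8.4, 16.8, 25.2]  # nearest to most eccentric flanking strips

# Disparity magnitude equals one unit, i.e. half the width of an ab pair.
MAGNITUDES = [w / 2 for w in AB_WIDTHS_ARCMIN]

def strip_disparities(top_sign, bottom_sign):
    """Disparities (arcmin) listed from the top strip to the bottom strip."""
    top = [top_sign * m for m in reversed(MAGNITUDES)]
    bottom = [bottom_sign * m for m in MAGNITUDES]
    return top + [0.0] + bottom

# Assumed sign convention: +1 = uncrossed (far), -1 = crossed (near).
INTERPRETATIONS = {
    "ground": strip_disparities(+1, -1),   # top far, bottom near
    "ceiling": strip_disparities(-1, +1),  # top near, bottom far
    "convex": strip_disparities(+1, +1),   # same direction (label assumed)
    "concave": strip_disparities(-1, -1),  # same direction (label assumed)
}

for name, values in INTERPRETATIONS.items():
    print(name, [round(v, 1) for v in values])
```

The two slanted-plane solutions reverse sign across the central strip, whereas the two wedge solutions keep the same sign in both halves.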

*ab* pairings (Figures 2b and 2c). If an increase in mean luminance is applied to alternate sets of *ab* pairings in both eyes, with a decrease in mean luminance applied to neighboring *ab* pairings, one would expect matching to be biased toward the solution that maximizes luminance similarity between the two eyes (Goutcher & Mamassian, 2005). The direction of the preferred match would be dependent upon the phase of luminance modulation applied to each stimulus strip, in each eye. The magnitude of luminance biasing would be dependent upon the amplitude of the luminance modulation. Figure 2b illustrates an example of a luminance modulation applied in the uncrossed disparity direction. As the reader will note, here *ab* pairings in the left eye (upper row) match with right eye (lower row) *ab* pairings shifted to the right (i.e. an outward shift). This leaves a single unmatched *b* unit on the left of the right half image, and a single unmatched *b* unit on the right of the left half image. Figure 2c illustrates an example of a luminance modulation applied in the crossed disparity direction. Here *ab* pairings in the left eye match with *ab* pairings in the right eye shifted to the left (i.e. an inward shift). In this case a single *a* unit is left unmatched on the left of the left half image and on the right of the right half image. In both crossed and uncrossed disparity cases, unmatched regions conform to the expected monocular regions arising in the event of half-occlusions (Shimojo & Nakayama, 1990).

*ab* pairing is 11.7 cd m^{−2}. For a luminance ambiguity of ±1, the mean luminance of the *ab* pairings will be either 6.6 cd m^{−2} or 16.8 cd m^{−2}.

*I*_{L} and *I*_{R} is first windowed by Gaussian envelopes *W*_{L} and *W*_{R}, defined as:

*σ* is the standard deviation of the Gaussian, and determines the size of the window. *x*_{0} and *δ* determine the horizontal location of each envelope, and *y*_{0} determines its vertical location.

*δ* determines the disparity in the position of the envelope between the two eyes. Given these envelopes, we can define local windowed image regions *R*_{L} and *R*_{R} as the product of the images *I*_{L} and *I*_{R} and the Gaussian windowing functions, centered on the image coordinates **x**_{L} and **x**_{R} where:

*R*_{L} and *R*_{R}, the windowed local image regions in left and right half images:

*δ* is then calculated using the following equation:

*μ*_{L,R} is the mean intensity of the image region.

^{−2}. A correspondence solution was then chosen by taking the sum of the product of the cross-correlation output with each of the four templates, and selecting the one that gave the greatest response. The templates corresponded to the four interpretations of the stimulus (ground plane, ceiling plane, convex wedge, concave wedge), having a value of one at the appropriate disparities, and zero everywhere else.
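The template-selection step described above can be sketched as follows. The disparity values in pixels follow from the 4 to 12 pixel *ab* widths quoted earlier (units of 2, 4 and 6 pixels); the sign convention (positive = uncrossed), the convex and concave labels, and the example response map are assumptions made for illustration only.

```python
DISPARITIES_PX = list(range(-6, 7))  # candidate disparities tested per strip

# Per-strip disparities (top to bottom) for each interpretation, in pixels.
TARGETS = {
    "ground": [6, 4, 2, 0, -2, -4, -6],
    "ceiling": [-6, -4, -2, 0, 2, 4, 6],
    "convex": [6, 4, 2, 0, 2, 4, 6],
    "concave": [-6, -4, -2, 0, -2, -4, -6],
}

def make_template(targets):
    """One at each strip's target disparity, zero everywhere else."""
    return [[1.0 if d == t else 0.0 for d in DISPARITIES_PX] for t in targets]

def select_interpretation(response):
    """response[strip][i] is the correlation at DISPARITIES_PX[i].
    Returns the best-scoring interpretation and all four scores."""
    scores = {}
    for name, targets in TARGETS.items():
        template = make_template(targets)
        scores[name] = sum(
            r * t
            for r_row, t_row in zip(response, template)
            for r, t in zip(r_row, t_row)
        )
    return max(scores, key=scores.get), scores

# Hypothetical response map: full-strength peaks at the convex disparities
# and half-strength peaks at the concave ones.
convex, concave = make_template(TARGETS["convex"]), make_template(TARGETS["concave"])
response = [
    [c + 0.5 * k for c, k in zip(c_row, k_row)]
    for c_row, k_row in zip(convex, concave)
]
best, scores = select_interpretation(response)
print(best, scores)
```

With perfectly balanced peaks all four scores tie, so any preference in the output must come from asymmetries in the correlation map itself.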

*b* in half of the stimulus, it will now contain disparities equal to ±1/*b* of the original wavelength. So, either side of fixation, we now have disparities equal to 1/*b* and 1 original wavelength, plus all multiples of these, although only disparities of ±1/*b* will tend to be perceived, consistent with a preference for small disparities (Banks & Vlaskamp, 2009; Prince & Eagle, 2000; Qian & Zhu, 1997; Read, 2002). This means that, no matter the point in the image, a local cross-correlator will always have the same information telling it that the disparity is in front of fixation as it does that the disparity is behind fixation. Note that this is the case no matter the base spatial frequency of the stimulus, or the magnitude of the change in spatial frequency, provided that the anti-phase relationship between eyes is maintained, and that changes in spatial frequency only occur vertically. A change in spatial frequency along the horizontal will result in contraction and expansion disparities that will reduce the response of a cross-correlator. The stimulus used in this paper is a specific example of this general case.

*a*’, given the perception of either a ‘ground’ or ‘ceiling’ surface in the other half of the stimulus ‘*b*’. Conditional probabilities were defined as follows:

_{response} indicates the number of responses of a particular type (ground, ceiling, convex or concave) in the 4AFC task. The first response proportion term in each of these equations gives the conditional probability for perceiving a ‘ground’ surface in the top half of the stimulus, while the second response proportion term gives the conditional probability for perceiving a ‘ground’ surface in the bottom half of the stimulus.

*p*(*Ground*_{a} ∣ *Ceiling*_{b}) is greater than the conditional probability *p*(*Ground*_{a} ∣ *Ground*_{b}), then matching is biased toward wedge solutions, and therefore toward the minimization of relative disparity. If the opposite relationship holds, matching is biased toward slanted plane solutions, and therefore toward the minimization of changes in disparity gradient sign. Note that the complementary conditional probabilities *p*(*Ceiling*_{a} ∣ *Ground*_{b}) and *p*(*Ceiling*_{a} ∣ *Ceiling*_{b}) may be readily derived from those reported.
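One way such conditional probabilities could be computed from 4AFC response counts is sketched below. Since the defining equations are not reproduced here, the sketch rests on assumptions: a convex wedge is treated as a ‘ground’ top half with a ‘ceiling’ bottom half (and a concave wedge as the reverse), and the top-half and bottom-half terms are averaged with equal weight. The function name and the response counts are hypothetical.

```python
def conditional_probabilities(n):
    """n maps 'ground', 'ceiling', 'convex', 'concave' to response counts.
    Returns (p(Ground_a | Ceiling_b), p(Ground_a | Ground_b))."""
    def ratio(num, den):
        return num / den if den else float("nan")

    # Ground in the top half given ceiling in the bottom half: convex vs. ceiling.
    top_gc = ratio(n["convex"], n["convex"] + n["ceiling"])
    # Ground in the bottom half given ceiling in the top half: concave vs. ceiling.
    bottom_gc = ratio(n["concave"], n["concave"] + n["ceiling"])
    # Ground in the top half given ground in the bottom half: ground vs. concave.
    top_gg = ratio(n["ground"], n["ground"] + n["concave"])
    # Ground in the bottom half given ground in the top half: ground vs. convex.
    bottom_gg = ratio(n["ground"], n["ground"] + n["convex"])
    # Equal weighting of the two halves is an assumption of this sketch.
    return 0.5 * (top_gc + bottom_gc), 0.5 * (top_gg + bottom_gg)

# Hypothetical counts for one observer at one luminance ambiguity level.
counts = {"ground": 10, "ceiling": 10, "convex": 40, "concave": 40}
p_gc, p_gg = conditional_probabilities(counts)
print(p_gc, p_gg, "wedge bias" if p_gc > p_gg else "plane bias")
```

In this made-up example the first probability exceeds the second, which under the reasoning above would indicate a bias toward wedge solutions.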

*p*(*Ground*_{a} ∣ *Ceiling*_{b}) than for the conditional probability *p*(*Ground*_{a} ∣ *Ground*_{b}) then, given equal luminance ambiguity values, the observer is more likely to perceive a wedge shape than a slanted surface, indicating a preference for the minimization of relative disparity over the minimization of changes in disparity gradient sign. This observer shows a marked preference for the minimization of relative disparity. Together, these results show that all 10 observers showed a significant matching bias. For 8 of the observers, this bias was in favor of wedge interpretations, while the remaining 2 showed a significant bias in favor of planar interpretations. Overall, therefore, there was a significant tendency for observers to show a bias consistent with minimizing relative disparity. That the results for the 2 observers who did not follow this trend were significant, however, shows that these reflect genuine individual differences, rather than simple sampling error.

*p*(*Ground*_{a} ∣ *Ceiling*_{b}) and *p*(*Ground*_{a} ∣ *Ground*_{b}) PSEs for each of the ten observers. Negative values indicate a preference for matching to wedge shapes, while positive values indicate a preference for matching to slanted surfaces. Error bars show bootstrapped 95% confidence intervals. Eight of the ten observers show significant negative differences, while the remaining two observers show significant positive differences. Two of the eight observers with significant negative differences show very little effect of luminance ambiguity variation on their perception of the stimulus. Instead, psychometric functions for these observers show ceiling and floor performance. While this does indeed show a strong bias in the direction specified by a calculation of the difference between conditional probability PSEs, quantifying this difference in terms of extrapolated threshold values does not seem satisfactory. We therefore also offer an alternative means of analyzing differences between the two conditional probability measures.

*p*(*Ground*_{a} ∣ *Ceiling*_{b}) − *p*(*Ground*_{a} ∣ *Ground*_{b}) when luminance ambiguity is equal to zero (i.e. when the stimulus is objectively ambiguous). The results of this analysis are shown for each observer in Figure 6c. In this case, negative values indicate a preference for matching to slanted surfaces, while positive values indicate a preference for matching to wedge shapes. Using this analysis, six of the ten observers show a significant positive difference, while two observers have a significant negative difference. These results concur with those of the threshold analysis. The remaining observers do not show significant differences for the zero luminance ambiguity analysis, although their biases were significant in the threshold analysis. A related samples *t*-test on the zero luminance ambiguity data shows a significant positive difference between conditional probabilities (*t*_{9} = 2.5574, *p* < 0.05) across all participants, indicating a significant bias for the minimization of relative disparity.

*P*_{response} is the predicted probability of the response arising from data in the 2AFC task, and *S*_{response} is the probability of a response arising from the Monte Carlo simulations. The variability of prediction errors arising from the simulations was compared to the ‘error’ observed in the 4AFC task. This error *E*_{observed} was defined in the same way as the simulation error *E*_{simulated}, replacing the *S*_{response} terms with the relevant response probabilities observed in the 4AFC experiment.

*E*_{simulated} error as a proportion of *E*_{observed} as follows:

*E*_{proportion} of varying magnitudes. The probability of *E*_{proportion} having a value greater than or equal to 1 (i.e. of obtaining a value of *E*_{simulated} greater than or equal to *E*_{observed}) was less than 0.01 for nine of the ten observers. For the remaining observer (Obs. 3), the probability of obtaining a proportional error greater than or equal to 1 was 0.068. This result is shown by the red lines in each plot of Figure 8b, which indicate the value of *E*_{proportion} required to equal 95% of the errors found in our simulations. This value is less than one for nine of the ten observers.
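The logic of this comparison can be sketched with a small Monte Carlo simulation. The error definition is not reproduced here, so the sketch assumes a sum of squared differences between predicted and simulated response probabilities, with simulated responses drawn from the predicted probabilities. All names, probabilities and trial counts are hypothetical.

```python
import random

def squared_error(p, q):
    """Assumed error measure: sum of squared probability differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def simulate_proportions(predicted, n_trials, rng):
    """Draw n_trials responses from the predicted probabilities."""
    counts = [0] * len(predicted)
    for choice in rng.choices(range(len(predicted)), weights=predicted, k=n_trials):
        counts[choice] += 1
    return [c / n_trials for c in counts]

def error_proportions(predicted, observed, n_trials, n_sims=2000, seed=0):
    """E_simulated / E_observed for each simulated experiment."""
    rng = random.Random(seed)
    e_obs = squared_error(predicted, observed)
    if e_obs == 0:
        raise ValueError("observed error is zero; the proportion is undefined")
    return [
        squared_error(predicted, simulate_proportions(predicted, n_trials, rng)) / e_obs
        for _ in range(n_sims)
    ]

# Hypothetical observer: equal predicted probabilities for the four
# responses, but strongly unequal observed response proportions.
predicted = [0.25, 0.25, 0.25, 0.25]
observed = [0.70, 0.10, 0.10, 0.10]
props = error_proportions(predicted, observed, n_trials=200)
p_exceed = sum(p >= 1 for p in props) / len(props)
criterion = sorted(props)[int(0.95 * len(props))]  # approximate 95th percentile
print(p_exceed, criterion)
```

A small probability of exceeding 1, together with a 95% criterion below 1, would indicate that sampling variability alone is unlikely to produce an error as large as the one observed.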