How the visual system learns the statistical regularities (e.g., symmetry) needed to interpret pictorial cues to depth is one of the outstanding questions in perceptual science. We test the hypothesis that the visual system can adapt its model of the statistics of planar figures for estimating three-dimensional surface orientation. In particular, we test whether subjects, when placed in an environment containing a large proportion of randomly shaped ellipses, learn to give less weight to a prior bias to interpret ellipses as slanted circles when making slant judgments of stereoscopically viewed ellipses. In a first experiment, subjects placed a cylinder onto a stereoscopically viewed, slanted, elliptical surface. In this experiment, subjects received full haptic feedback about the true orientation of the surface at the end of the movement. When test stimuli containing small conflicts between the circle interpretation of the figure and the slant suggested by stereoscopic disparities were intermixed with stereoscopically viewed circles, subjects gave the same weight to the circle interpretation over the course of five daily sessions. When the same test stimuli were intermixed with stereoscopic views of randomly shaped ellipses, however, subjects gave progressively lower weights to the circle interpretation of test stimuli over five daily sessions. In a second experiment, subjects showed the same effect when they made perceptual judgments of slant without receiving feedback, showing that feedback is not required for learning. We describe a Bayesian model for combining multiple visual cues to adapt the priors underlying pictorial depth cues that qualitatively accounts for the observed behavior.

*Test* stimuli were images of ellipses generated so that the foreshortening cues suggested a slant that differed by 5° from the slant suggested by the stereoscopic cues. Both the foreshortening and stereoscopic cues in these stimuli suggested a tilt of 90°; that is, the figures appeared to be rotated around a horizontal axis. Cue conflicts were generated by first projecting a circle filled with an isotropic Voronoi texture at the slant specified for the foreshortening cues into the image plane corresponding to a cyclopean view midway between a subject's two eyes and then back-projecting the resulting image onto a surface at the slant specified by the stereoscopic cues. Stereoscopic projection of the resulting figure and texture created a stimulus whose monocular foreshortening cues were consistent with one slant and whose disparities were consistent with another slant. Subjects' performance on the test stimuli (how they oriented the cylinder when placing it on the surface) provided the data for computing cue weights in a session. Test stimuli were generated around both 25° and 35° and consisted of the following slant pairs: [(20°, 25°), (25°, 20°), (30°, 25°), (25°, 30°)] and [(30°, 35°), (35°, 30°), (40°, 35°), (35°, 40°)].
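The geometry of the cue conflict can be illustrated with a minimal sketch. It assumes orthographic projection, whereas the stimuli themselves were rendered in perspective from a cyclopean viewpoint, so the numbers are only approximate. Under that assumption, a circle at the foreshortening slant projects to an image ellipse with aspect ratio cos(slant), and back-projecting onto the stereoscopically defined surface divides that ratio by the cosine of the stereoscopic slant. The function name and the order of the arguments are hypothetical; the text does not say which element of each slant pair belongs to which cue.

```python
import math

def backprojected_aspect_ratio(foreshortening_slant_deg, stereo_slant_deg):
    """Aspect ratio of the figure drawn on the stereoscopically defined surface.

    Assumes orthographic projection and a tilt of 90 degrees (rotation about a
    horizontal axis), so only the vertical extent of the figure is compressed.
    """
    # Image-plane aspect ratio of a circle slanted by the foreshortening slant.
    image_aspect = math.cos(math.radians(foreshortening_slant_deg))
    # Back-projection onto the stereo surface stretches the vertical extent.
    return image_aspect / math.cos(math.radians(stereo_slant_deg))

if __name__ == "__main__":
    # Treating the first element of each pair as the foreshortening slant is
    # an assumption made only for this illustration.
    for f_slant, s_slant in [(20, 25), (25, 20), (30, 25), (25, 30)]:
        ratio = backprojected_aspect_ratio(f_slant, s_slant)
        print(f"foreshortening {f_slant}, stereo {s_slant}: aspect ratio {ratio:.3f}")
```

When the two slants agree, the figure on the surface is a circle (ratio 1); a 5° conflict changes the ratio by only a few percent, which is why the conflicts are described as small.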

*Nontest* stimuli were presented at slants of 20°, 25°, 30°, 35°, 40°, and 45°, also slanted around a horizontal axis. Nontest stimuli were circles filled with isotropic Voronoi textures, so that the foreshortening and stereoscopic cues were consistent. In the first session of the experiment, the orientation of the physical target surface positioned by the robot arm was set to a slant randomly chosen from the range defined by the two cues; thus, it was not more or less consistent with either the foreshortening or the stereoscopic slant cues. Data from the first session were used to derive baseline estimates of cue weights. The following four sessions were “training” sessions, in which the physical target surface was always positioned at the slant suggested by the stereoscopic cues. The haptic feedback that subjects received at the end of their movements in the training sessions was perfectly correlated with the stereoscopic cues in the stimuli but more weakly correlated with the foreshortening cues provided by the figure's shape and the texture pattern.

*σ*_{compression} is the slant suggested by the foreshortening cue on that trial, *σ*_{stereo} is the slant suggested by stereoscopic cues on that trial, and *w*_{compression} is the normalized weight that subjects give to the foreshortening cue relative to the stereoscopic cues. *α* and *β* are constants that capture multiplicative and constant biases in subjects' slant settings.
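The regression equation itself is not reproduced in this text. A minimal sketch, assuming it takes the linear form slant = *α*[*w*_{compression} *σ*_{compression} + (1 − *w*_{compression}) *σ*_{stereo}] + *β*, shows how the weight and the two bias constants could be recovered by ordinary least squares. The function name and the fitting method are assumptions for illustration, not the author's procedure.

```python
def fit_cue_weight(sigma_comp, sigma_stereo, settings):
    """Least-squares fit of settings = a_c * sigma_comp + a_s * sigma_stereo + beta.

    Under the assumed model, a_c = alpha * w and a_s = alpha * (1 - w), so
    alpha = a_c + a_s and w = a_c / alpha. Returns (w, alpha, beta).
    """
    n = len(settings)
    mean_c = sum(sigma_comp) / n
    mean_s = sum(sigma_stereo) / n
    mean_y = sum(settings) / n
    # Center the data so the intercept drops out of the normal equations.
    xc = [c - mean_c for c in sigma_comp]
    xs = [s - mean_s for s in sigma_stereo]
    y = [v - mean_y for v in settings]
    scc = sum(a * a for a in xc)
    sss = sum(a * a for a in xs)
    scs = sum(a * b for a, b in zip(xc, xs))
    scy = sum(a * b for a, b in zip(xc, y))
    ssy = sum(a * b for a, b in zip(xs, y))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = scc * sss - scs * scs
    a_c = (scy * sss - ssy * scs) / det
    a_s = (ssy * scc - scy * scs) / det
    alpha = a_c + a_s
    beta = mean_y - a_c * mean_c - a_s * mean_s
    return a_c / alpha, alpha, beta

if __name__ == "__main__":
    # Synthetic, noise-free settings for the four slant pairs around 25 degrees.
    comp = [20.0, 25.0, 30.0, 25.0]
    stereo = [25.0, 20.0, 25.0, 30.0]
    slants = [0.9 * (0.4 * c + 0.6 * s) + 2.0 for c, s in zip(comp, stereo)]
    print(fit_cue_weight(comp, stereo, slants))
```

Because the test pairs vary each cue while holding the other fixed, the two regressors are uncorrelated and the normal equations are well conditioned.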

*t*(5) = 1.3, *p* = .25. Subjects' performance showed little variable error. Figure 4B shows the average standard deviations in contact slants for the cue-conflict stimuli used to calculate cue weights, as a function of session number. The results show that subjects performed the task with very high accuracy.

*F*(1, 49) = 19.78, *p* < .001, and subject, *F*(5, 49) = 22.48, *p* < .001, had main effects on cue weights. The effect of session was not significant, *F*(4, 49) = 1.71, *p* > .16. The effect of test slant was as expected from previous studies on similar or equivalent cues: subjects gave more weight to foreshortening cues at high (35°) slants than at low (25°) slants (Hillis, Watt, Landy, & Banks, 2004; Knill, 2005; Knill & Saunders, 2003).

*t*(7) = 1.18, *p* = .28. Figure 6B shows the average standard deviation of subjects' contact slants for the cue-conflict stimuli. In Experiment 2, subjects showed somewhat more variable error in performance in the early sessions than in Experiment 1, but subjects' asymptotic variable error was very similar in the two experiments. The results, like those of Experiment 1, show high accuracy for performing the motor task.

*w*(*t*) is a subject's foreshortening cue weight computed from the data in session *t*, *w*_{0} is the foreshortening cue weight after the first baseline session (*t* = 0), and *k* quantifies the rate of adaptation. The solid curves in Figure 7A are exponential functions parameterized by the average of *w*_{0} and *k* across the subjects in each experiment. Figure 7B shows the average rate of adaptation, *k*, for each experiment. The rate of adaptation was significantly greater in Experiment 2 than in Experiment 1, *t*(12) = 4.83, *p* < .001.
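The exponential function is not reproduced in this text. A minimal sketch, assuming the decay form *w*(*t*) = *w*_{0} exp(−*kt*), estimates *k* from per-session weights with a log-linear least-squares fit through the baseline point. Both the functional form and the fitting shortcut are assumptions for illustration; the fit requires strictly positive weights.

```python
import math

def exponential_weight(t, w0, k):
    """Assumed adaptation curve: the cue weight decays from w0 at rate k."""
    return w0 * math.exp(-k * t)

def fit_rate(weights):
    """Estimate k from weights indexed by session (weights[0] is the baseline).

    Fits log(w(t) / w0) = -k * t by least squares through the origin,
    taking w0 directly from the baseline session.
    """
    w0 = weights[0]
    num = 0.0
    den = 0.0
    for t, w in enumerate(weights):
        if t == 0:
            continue  # the baseline defines w0 and carries no slope information
        num += t * math.log(w / w0)
        den += t * t
    return -num / den

if __name__ == "__main__":
    # Synthetic weights for a baseline session plus four training sessions.
    sessions = [exponential_weight(t, 0.5, 0.2) for t in range(5)]
    print(fit_rate(sessions))
```

A rate of *k* = 0 corresponds to no adaptation, and larger values correspond to faster down-weighting of the foreshortening cue.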

*SD* > 6.2). No such outlier subjects were apparent in the data from Experiment 1.

*θ* < 30°, where *θ* is the angle between the surface normal and the probe. In both experiments, test stimuli contained cue conflicts around 35° and consisted of the following slant pairs: [(30°, 35°), (35°, 30°), (40°, 35°), (35°, 40°), (35°, 35°)]. As in Experiments 1 and 2, other “training” stimuli were presented at slants of 15°, 20°, 25°, 30°, 35°, 40°, and 45°. In the first sessions of both Experiments 3 and 4, the training stimuli were all circles. In the following four sessions in Experiment 3, they remained circles. In Experiment 4, however, the training stimuli in the last four sessions were ellipses with aspect ratios drawn from a uniform distribution between 0.5 and 1 and with random orientations in the plane (as in Experiment 2). The textures used in Experiments 3 and 4 were the same as those used in Experiment 2.

*t*(7) = 3.45, *p* < .01; Experiment 4: *t*(7) = 5.05, *p* < .002. Figures 9C and 9D show the average standard deviation of subjects' contact slants for the cue-conflict stimuli in the two experiments. Subjects' variable error in the matching task was higher by approximately 75% than their variable error in the motor task. Average tilt estimates for the cue-conflict stimuli were 90.63° and 89.87° in the two experiments, with average standard deviations of 3.05° and 3.23°, respectively.

*t* test), *t*(14) = 2.76, *p* < .02. Figures 10C and 10D show the learning rates fit to each subject's data in the two experiments.

*A*, with the form

*λ* is the proportion of ellipses in the world that are noncircular. *δ*(*A* − 1) is a Dirac delta function that takes the value 0 for *A* ≠ 1. *σ*_{A} is the standard deviation of the log aspect ratio for noncircular ellipses. In all of the simulations, we set *σ*_{A} = 0.5, which created a distribution that dropped off sharply at *A* = 0.5 and *A* = 2.0, approximating the uniform distribution of aspect ratios used in the learning experiments. The mixing parameter, *λ*, is the only free parameter in the prior distribution. Learning occurs by adjusting an estimate of *λ* on each trial.
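Equation A1 is not reproduced in this text. A minimal sketch of a prior consistent with the description above assumes a two-component mixture: with probability 1 − *λ* the figure is a circle (*A* = 1), and with probability *λ* its log aspect ratio is drawn from a zero-mean normal distribution with standard deviation *σ*_{A}. The use of the natural logarithm and the function names are assumptions for illustration.

```python
import math
import random

def sample_aspect_ratio(lam, sigma_a=0.5, rng=random):
    """Draw one aspect ratio from the assumed mixture prior.

    With probability lam the figure is a noncircular ellipse whose log aspect
    ratio is normal with SD sigma_a; otherwise it is a circle (A = 1).
    """
    if rng.random() < lam:
        return math.exp(rng.gauss(0.0, sigma_a))
    return 1.0

def noncircular_density(a, sigma_a=0.5):
    """Log-normal density of aspect ratio a for the noncircular component."""
    z = math.log(a) / sigma_a
    return math.exp(-0.5 * z * z) / (a * sigma_a * math.sqrt(2.0 * math.pi))

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_aspect_ratio(0.5, rng=rng) for _ in range(10)]
    print([round(a, 3) for a in draws])
```

With *λ* near 0 almost every sample is a circle, which corresponds to a strong bias to interpret ellipses as slanted circles; raising *λ* weakens that bias.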

*t*, is given by a deterministic function of the scene with some corrupting additive noise

*I* represents the image measurements, *X* represents the parameters describing a scene (at least all parameters that influence *I*), and *ω* is a random variable representing noise in the sensory system. An optimal Bayesian observer bases its estimate of *X* on the posterior distribution

*k* is a constant that normalizes the distribution. *p*_{t}(*X*) is the current model of the prior distribution on the scene. This is updated from trial to trial in the learning step described below. Assuming that the prior has a parametric form parameterized by a set of parameters *λ*, we can approximate the posterior using

*λ*.

*X* represents the slant and aspect ratio of the ellipse in the world. We will therefore replace *X* with two variables for slant and aspect ratio, *S* and *A*. Assuming that the prior on slant is fixed (and broad), we can write Equation A4 as

*α* represents the aspect ratio of the ellipse in the retinal image and

*A*, *p*(*A*;

*σ*_{α} is the standard deviation of the noise on the sensory measurements of *α*. We treated the stereoscopic cues as providing an unbiased estimate of slant corrupted by Gaussian noise; thus, we can write the likelihood function for stereo as

*σ*_{stereo}.

*λ* = 0.01 and was updated using the learning procedure described below. The noise parameters for the simulations were chosen to be consistent with subjects' performance prior to learning. Subjects performed the motor task used in Experiments 3 and 4 with an average standard deviation of 2.5° in the slant of the cylinder at contact (and no significant bias) and gave approximately equal weights to the foreshortening and stereo cues for surfaces at the test slant of 35°. The model performed equivalently (with *λ* = 0.01) when the noise parameter for the aspect ratio measurement was set to *σ*_{α} = 0.025 and the noise parameter for slant estimates from stereo was set to *σ*_{stereo} = 3.5°. The noise parameter on aspect ratio measurements was remarkably close to measurements of human subjects' thresholds for discriminating aspect ratio (Regan & Hamstra, 1992). We used these noise parameters throughout all simulations.
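The slant estimator can be illustrated with a minimal grid-based sketch, not the simulation code used for the paper. It assumes orthographic projection (image aspect ratio = *A* cos *S*), a flat prior on slant, a mixture prior on aspect ratio with a log-normal noncircular component (natural log), and the posterior mean as the slant estimate. The noise values come from the text; everything else, including the function names and grid resolutions, is an assumption made for illustration.

```python
import math

SIGMA_ALPHA = 0.025  # noise SD on the image aspect-ratio measurement (from the text)
SIGMA_STEREO = 3.5   # noise SD on the stereo slant estimate, in degrees (from the text)
SIGMA_A = 0.5        # SD of the log aspect ratio of noncircular ellipses (from the text)

def gauss(x, mu, sd):
    """Normal density with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Grid over u = log(A) used to marginalize the noncircular component.
D_LOG_A = 0.02
LOG_A = [-2.0 + D_LOG_A * i for i in range(201)]

def estimate_slant(image_aspect, stereo_slant_deg, lam):
    """Posterior-mean slant (degrees) given both cues and mixing parameter lam."""
    num = 0.0
    den = 0.0
    for i in range(1, 356):  # candidate slants from 0.25 to 88.75 degrees
        s = 0.25 * i
        c = math.cos(math.radians(s))
        # Circle hypothesis: world aspect ratio is exactly 1.
        like_circle = gauss(image_aspect, c, SIGMA_ALPHA)
        # Ellipse hypothesis: integrate over log aspect ratio on the grid.
        like_ellipse = D_LOG_A * sum(
            gauss(image_aspect, math.exp(u) * c, SIGMA_ALPHA) * gauss(u, 0.0, SIGMA_A)
            for u in LOG_A
        )
        post = gauss(stereo_slant_deg, s, SIGMA_STEREO) * (
            (1.0 - lam) * like_circle + lam * like_ellipse
        )
        num += s * post
        den += post
    return num / den

if __name__ == "__main__":
    # Cue conflict: figure shape consistent with 30 degrees, stereo with 35.
    alpha_img = math.cos(math.radians(30.0))
    for lam in (0.01, 0.9):
        print(lam, round(estimate_slant(alpha_img, 35.0, lam), 2))
```

In this sketch, a small *λ* keeps the estimate between the two cue values, and a larger *λ* moves it toward the stereoscopic slant because the figure's shape is less informative when many ellipses are noncircular.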

*λ*, in Equation A1 from trial to trial. The model assumes that *λ* can change slowly over time according to the equation

*η*_{t} is a white noise process with standard deviation *σ*_{η}. The slant estimator described above requires an estimate of *λ*_{t}. To do this, the model updates an internal model of *p*(*λ*_{t} ∣ *I*_{t−1}, *I*_{t−2}, …, *I*_{1}) on each trial based on the information in the image on that trial, *I*_{t}. For simplicity, we will write this as *p*_{t∣t−1}(*λ*_{t}). From Equation A10, this can be written as

*N*(0, *σ*_{η}) is a normal distribution with a mean equal to 0 and a standard deviation equal to *σ*_{η}. The distribution *p*_{t−1∣t−1}(*λ*_{t−1}) is updated according to the equation

*p*(*I*_{t−1} ∣ *λ*_{t−1}) is the likelihood of seeing the image *I*_{t−1}, given a prior on aspect ratios parameterized by *λ*_{t−1}. This is given by

*p*_{t∣t−1}(*λ*_{t}). We use the mode of *p*_{t∣t−1}(*λ*_{t}) as the estimate of *λ* to use for estimating slant on each trial. The only free parameters in the model are the standard deviation of the white noise process assumed for *λ*, which we set to 0.01, and the initial distribution *p*_{1∣0}(*λ*_{1}), which we set to be a normal with a mean equal to 0.01 (an initial estimate that 99% of ellipses are circular) and a standard deviation equal to 0.001. The behavior of the model is largely indifferent to these parameters, although changing the initial standard deviation for the prior on *λ* or the standard deviation of the random walk by large amounts affects the learning rate (e.g., setting the standard deviation of *p*_{1∣0}(*λ*_{1}) to 0 turns off learning).
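The predict-and-update cycle described above can be sketched as a grid-based Bayesian filter over *λ*. This is a minimal illustration under stated assumptions, not the simulation code used for the paper: *λ* is discretized on [0, 1] and the distribution is truncated and renormalized at the boundaries. The image likelihood is assumed to be the mixture (1 − *λ*) *L*_{circle} + *λ* *L*_{ellipse}, where the two marginal likelihoods are hypothetical inputs supplied by the caller. The parameter values (random-walk standard deviation 0.01; initial mean 0.01 and standard deviation 0.001) are taken from the text.

```python
import math

STEP = 0.002
GRID = [STEP * i for i in range(501)]  # candidate values of lambda in [0, 1]

def normalize(p):
    total = sum(p)
    return [v / total for v in p]

def initial_distribution(mean=0.01, sd=0.001):
    """Discretized normal distribution used as p_{1|0}(lambda_1)."""
    return normalize([math.exp(-0.5 * ((lam - mean) / sd) ** 2) for lam in GRID])

def update(p, like_circle, like_ellipse):
    """Multiply by the assumed mixture likelihood of the current image."""
    return normalize([
        v * ((1.0 - lam) * like_circle + lam * like_ellipse)
        for v, lam in zip(p, GRID)
    ])

def predict(p, sigma_eta=0.01):
    """Random-walk step: convolve with a zero-mean normal kernel (SD sigma_eta)."""
    half = int(4 * sigma_eta / STEP)
    kernel = [math.exp(-0.5 * (j * STEP / sigma_eta) ** 2)
              for j in range(-half, half + 1)]
    n = len(p)
    out = [0.0] * n
    for i in range(n):
        total = 0.0
        for j, weight in enumerate(kernel):
            src = i + j - half
            if 0 <= src < n:  # mass outside [0, 1] is dropped, then renormalized
                total += weight * p[src]
        out[i] = total
    return normalize(out)

def mode(p):
    """Grid value of lambda with the highest probability."""
    return GRID[max(range(len(p)), key=p.__getitem__)]

if __name__ == "__main__":
    p = initial_distribution()
    for trial in range(30):
        # Hypothetical likelihoods for an image that is poorly explained by a
        # slanted circle but well explained by a noncircular ellipse.
        p = predict(update(p, like_circle=0.01, like_ellipse=1.0))
    print(mode(p))
```

Repeated images that favor the ellipse hypothesis push the mode of the distribution upward from its initial value of 0.01, which is the mechanism by which the model comes to down-weight the circle interpretation.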