The informativeness of sensory cues depends critically on statistical regularities in the environment. However, statistical regularities vary between different object categories and environments. We asked whether and how the brain changes the prior assumptions about scene statistics used to interpret visual depth cues when stimulus statistics change. Subjects judged the slants of stereoscopically presented figures by adjusting a virtual probe perpendicular to the surface. In addition to stereoscopic disparities, the aspect ratio of the stimulus in the image provided a “figural compression” cue to slant, whose reliability depends on the distribution of aspect ratios in the world. As we manipulated this distribution from regular to random and back again, subjects' reliance on the compression cue relative to stereoscopic cues changed accordingly. When we randomly interleaved stimuli from shape categories (ellipses and diamonds) with different statistics, subjects gave less weight to the compression cue for figures from the category with more random aspect ratios. Our results demonstrate that relative cue weights vary rapidly as a function of recently experienced stimulus statistics and that the brain can use different statistical models for different object categories. We show that subjects' behavior is consistent with that of a broad class of Bayesian learning models.

- Can the brain adapt different internal statistical models of figure shape for different figure categories and effectively switch between them when interpreting stimuli drawn from the different categories?
- Are there limits to the categorical dimensions that support this kind of model switching?
- How rapidly are internal models adjusted to match changes in environmental statistics?

*disparity cue,* was the gradient of stereoscopic disparities across the surface. The second cue was provided by the shape of the figure as projected onto the subject's retinas. Because there is a systematic relationship between the “true” aspect ratio of a figure in the world, the figure's 3D orientation relative to the viewer, and the aspect ratio of the image of the figure as projected onto the viewer's retinas, the 3D orientation of the figure can be inferred from the image aspect ratio, provided that the true aspect ratio is known or assumed. (For details, refer to Equation A3 in Appendix A.) For example, if a coin (which is known to have a true aspect ratio of 1) projects to an ellipse with an aspect ratio of 0.7 on the observer's retina, the observer can infer that the coin is slanted by about 45.6°. Humans have an “isotropy bias”: they tend to assume that the true aspect ratio is equal to 1, and that the apparent compression of the figure is a consequence of its being slanted. We thus refer to the image aspect ratio of a figure, interpreted under the assumption that the true aspect ratio is equal to 1, as the *compression cue*.
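The coin example can be checked numerically. The following is a minimal sketch that assumes the cosine approximation (image aspect ratio ≈ true aspect ratio × cos slant); the function name `slant_from_aspect_ratio` is hypothetical and not part of the study.

```python
import math

def slant_from_aspect_ratio(image_aspect_ratio, true_aspect_ratio=1.0):
    """Slant in degrees implied by a projected aspect ratio.

    Assumes the cosine approximation: image ratio = true ratio * cos(slant).
    The default true_aspect_ratio=1.0 corresponds to the isotropy assumption.
    """
    ratio = image_aspect_ratio / true_aspect_ratio
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio of image to true aspect ratio must lie in (0, 1]")
    return math.degrees(math.acos(ratio))

# A coin (true aspect ratio 1) that projects to an ellipse with aspect ratio 0.7.
coin_slant = slant_from_aspect_ratio(0.7)
print(f"{coin_slant:.1f} deg")  # 45.6 deg
```

Passing a different `true_aspect_ratio` shows why the cue is only as reliable as the assumption about the shape in the world: the same image aspect ratio then maps to a different slant.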

*Test stimuli* were used to measure the influence of the compression cue (relative to the disparity cue) on subjects' judgments. Test stimuli were constructed to have conflicts of −5°, 0°, or 5° between the slant suggested by the disparity cue and the slant suggested by the compression cue. Slant was defined relative to the viewer such that stimuli with a slant of 0° would be frontoparallel, and stimuli were slanted about a roughly horizontal axis. Given the viewing geometry, with subjects' heads pointed down at approximately 50° (see Figure 1), stimuli with a slant of approximately 40° appeared parallel to the ground. One of the cues always suggested a slant of 35°. The resulting pairs of slants suggested by the two cues were [35°, 30°; 30°, 35°; 35°, 35°; 35°, 40°; 40°, 35°]. To create conflicts, circles and square diamonds were distorted so that, when rendered at the slant specified for the disparity cue, they projected to the same figure shape in an imaginary cyclopean eye (midway between a subject's two eyes) that a circle or square diamond would have projected to if it were slanted at the angle specified for the compression cue. Thus, for example, an elliptical stimulus with a stereoscopic slant of 35° and a compression cue slant of 40° would be an ellipse with an aspect ratio of 0.935 rendered stereoscopically at a slant of 35°. This resulted in a stimulus that appeared as an ellipse in the frontoparallel plane with an aspect ratio of 0.766 (consistent with a circle projected from 40° slant) and with stereoscopic disparities suggesting a 35° slant.
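The two aspect ratios in this example follow directly from the cosine approximation. The sketch below reproduces them; the helper name `conflict_aspect_ratio` is hypothetical, and perspective effects are ignored.

```python
import math

def conflict_aspect_ratio(disparity_slant_deg, compression_slant_deg):
    """True aspect ratio a figure needs so that, rendered at the disparity slant,
    it projects like an isotropic figure at the compression-cue slant
    (cosine approximation; perspective distortion is ignored)."""
    return (math.cos(math.radians(compression_slant_deg))
            / math.cos(math.radians(disparity_slant_deg)))

# Stereoscopic slant 35 deg, compression-cue slant 40 deg.
true_ratio = conflict_aspect_ratio(35.0, 40.0)
image_ratio = true_ratio * math.cos(math.radians(35.0))
print(round(true_ratio, 3), round(image_ratio, 3))  # 0.935 0.766
```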

*Context stimuli* (together with the test stimuli) implicitly defined the statistics of the local stimulus environment. Either all context stimuli were slanted figures with aspect ratios of 1 (regular context), or they had random aspect ratios between 0.5 and 1 (random context). Note that what we refer to here as “context” includes not only other stimuli present at the same time as the test stimulus (in fact, such a local context was only provided in Experiment 4) but also the stimuli that temporally preceded the test stimulus. The area of all figures was held constant to match the area of the test stimuli containing no cue conflicts. Context stimuli were “spun” by a random angle between 0° and 180° and were then rotated around a roughly horizontal axis by a slant randomly chosen from a fixed set of slants (these varied slightly between experiments). Depending on the experiment, between 65.75% and 77.78% of the stimuli were context stimuli.

*random diamond group,* context stimuli in sessions 3–5 consisted of diamonds with random aspect ratios and circles. For the remaining subjects, who formed the *random ellipse group,* context stimuli in sessions 3–5 consisted of ellipses with random aspect ratios and square diamonds. Context stimuli were randomly slanted at one of four angles away from the frontoparallel: 20°, 30°, 40°, and 50°.

*z*-axis connected the center of the surface with the cyclopean eye, and whose *x*- and *y*-axes spanned a locally frontoparallel plane perpendicular to the line of sight to the center of the figure. The *x*-axis was defined as the projection of the line connecting the subject's two eyes onto the locally frontoparallel plane. The *y*-axis was given by the cross-product of the *x*- and *z*-axes.

*regular context trials,* the context stimuli were slanted circles with a diameter of 5 cm, while in *random context trials* they were slanted ellipses with random aspect ratios between 0.5 and 1 whose area was matched to that of the circles. Context stimuli were randomly spun in the plane prior to slanting. The slant of each context stimulus was chosen randomly from the set [25°, 30°, 35°, 40°, 45°]. Test stimuli were generated to have cue conflicts as described above. The axis about which a stimulus was slanted (often referred to as the tilt axis) was randomly drawn from a uniform distribution between −20° and 20° away from the horizontal. The location of the test stimuli in the set was randomly determined. To ensure that subjects attended to the context stimuli within a display, subjects made slant judgments for 5 randomly chosen context stimuli first, followed by a random combination of the remaining 2 test and 2 context stimuli.

*Experiment 1* was completed by 34 subjects, 4 of whom were excluded from the data analysis for the reasons mentioned above. Of the remaining 30 subjects, 15 were in the random diamond group and 15 were in the random ellipse group. They ranged in age from 18 to 36, and 19 of them were female. *Experiment 2* had 15 participants, 8 of them female, who ranged in age from 18 to 26. In *Experiment 3,* 4 of 16 subjects had to be excluded from the analysis. The remaining 12 subjects ranged in age from 18 to 40, and 6 of them were women. *Experiment 4* had 31 participants, 3 of whom (2 in the experimental group, 1 in the control group) were excluded from the analysis because of high standard errors. There remained 14 participants in the experimental group and 14 in the control group. Their ages were between 18 and 32, and 19 of them were women. In *Experiment 5,* the data of 4 subjects had to be discarded because of high standard errors. The reported results are based on the data of 15 subjects (6 men) who were between 18 and 32 years old.

$S_{\mathrm{comp}}$ is the slant suggested by the interpretation of the figure as having a true aspect ratio of 1 (the circle or square interpretation), $S_{\mathrm{disp}}$ is the slant suggested by the gradient of stereoscopic disparities across the surface, and the constants $b$ and $c$ capture multiplicative and additive biases in the subjective points of equality between surface and probe slants; $w_{\mathrm{comp}}$ is a measure of the relative influence of the compression cue on subjects' slant judgments. Fitting Equation 1 to subjects' judgments is algebraically equivalent to fitting a linear model $y = w_1 x_1 + w_2 x_2 + c$ to the data and then normalizing the weights to sum to 1.
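The equivalence between the two fits can be illustrated with a short sketch. It assumes that Equation 1 has the form $y = b\,[w_{\mathrm{comp}} S_{\mathrm{comp}} + (1 - w_{\mathrm{comp}}) S_{\mathrm{disp}}] + c$, which is inferred from the definitions above; the function name and the simulated settings are hypothetical.

```python
import numpy as np

def fit_cue_weights(s_comp, s_disp, settings):
    """Ordinary least squares fit of y = w1*x1 + w2*x2 + c, then normalisation.

    Returns (w_comp, b, c), where b = w1 + w2 is the multiplicative gain and
    w_comp = w1 / b is the normalised weight of the compression cue.
    """
    design = np.column_stack([s_comp, s_disp, np.ones(len(settings))])
    (w1, w2, c), *_ = np.linalg.lstsq(design, settings, rcond=None)
    gain = w1 + w2
    return w1 / gain, gain, c

# The five (compression, disparity) slant pairs used for the test stimuli.
s_comp = np.array([35.0, 30.0, 35.0, 35.0, 40.0])
s_disp = np.array([30.0, 35.0, 35.0, 40.0, 35.0])

# Noise-free probe settings from an assumed observer: w_comp = 0.4, b = 1.5, c = -2.
settings = 1.5 * (0.4 * s_comp + 0.6 * s_disp) - 2.0

w_comp, b, c = fit_cue_weights(s_comp, s_disp, settings)
print(w_comp, b, c)
```

With noise-free data the assumed observer's parameters are recovered up to floating-point error, which is the sense in which the two parameterisations are equivalent.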

$w_{\mathrm{comp}}$, separately for each condition: We randomly sampled the experimental trials and ran the regression described above to compute an estimate for $w_{\mathrm{comp}}$. This was repeated 1,000 times, and the standard deviation of the resampled estimates of $w_{\mathrm{comp}}$ was used as the standard error of the estimate. The data of subjects whose standard errors on $w_{\mathrm{comp}}$ exceeded those of the remaining subjects in the group by a factor larger than 3 (for the subjects where this occurred, it always occurred in multiple conditions) were excluded from the computation of the group means.
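A resampling procedure of this kind might look as follows. This is a sketch only: it assumes that trials were resampled with replacement (a standard bootstrap), which the text does not state explicitly, and the simulated observer and all function names are hypothetical.

```python
import numpy as np

def w_comp_estimate(s_comp, s_disp, settings):
    """Normalised compression-cue weight from a linear least squares fit."""
    design = np.column_stack([s_comp, s_disp, np.ones(len(settings))])
    coef, *_ = np.linalg.lstsq(design, settings, rcond=None)
    return coef[0] / (coef[0] + coef[1])

def bootstrap_se(s_comp, s_disp, settings, n_resamples=1000, seed=0):
    """Standard deviation of w_comp over resampled trial sets."""
    rng = np.random.default_rng(seed)
    n_trials = len(settings)
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        # Assumption: draw trials with replacement, keeping the sample size.
        idx = rng.integers(0, n_trials, size=n_trials)
        estimates[i] = w_comp_estimate(s_comp[idx], s_disp[idx], settings[idx])
    return estimates.std(ddof=1)

# Simulated observer (hypothetical): w_comp = 0.4, gain 1.5, bias -2, noise SD 3 deg.
rng = np.random.default_rng(1)
pairs = np.array([[35, 30], [30, 35], [35, 35], [35, 40], [40, 35]], dtype=float)
trials = pairs[rng.integers(0, len(pairs), size=200)]
s_comp, s_disp = trials[:, 0], trials[:, 1]
settings = 1.5 * (0.4 * s_comp + 0.6 * s_disp) - 2.0 + rng.normal(0.0, 3.0, size=200)

se = bootstrap_se(s_comp, s_disp, settings)
print(se)
```

The exclusion rule described above would then compare each subject's `se` against those of the remaining subjects in the group.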

*c* in Equation 1) decreased by about 2°, but that the relative slopes of the two lines changed markedly between sessions 2 and 5 for the diamonds, indicating that the subject gave less weight to the compression cue for diamonds after exposure to a large set of randomly shaped diamond context stimuli.

*b* in Equation 1) was equal to 1.54 ± 0.09 (*M* ± *SE*); see Appendix B for more details. While this could reflect overall biases in the perceived orientation of the surface, it could also reflect biases in the perceived orientation of the probe, or in what is perceived to be orthogonal orientations between the two. The data do not allow us to distinguish biases arising from subjects' slant percepts from those arising from the matching procedure.

*t*(14) = −0.041, *p* = 0.968), whereas it significantly decreased for diamond test stimuli (*t*(14) = 3.172, *p* = 0.007). For the random ellipse group (Figure 4B), the opposite pattern was observed: the influence of the compression cue did not change significantly for diamond stimuli (*t*(14) = 0.749, *p* = 0.440) but it decreased significantly for ellipse test stimuli (*t*(14) = 4.645, *p* < 0.001). In both groups, the influence of the compression cue changed significantly more for the shape category whose context stimuli were presented with random aspect ratios in sessions 3–5 (both *t*(14) ≥ 3.116, both *p* ≤ 0.008).

*b,* in the regression model in Equation 1 between the first two sessions and the last two sessions, nor any interaction between session and the context associated with a stimulus shape (isotropic or random). Since the gain factor, $b$, in the regression model is equal to the sum of weights derived from a simple linear regression ($w_{\mathrm{comp}} S_{\mathrm{comp}} + w_{\mathrm{disp}} S_{\mathrm{disp}} + c$), we would expect differential changes in the perceptual scaling of the two cues to affect fitted values for $b$ as well as the normalized cue weights. Thus, the fact that there are no significant changes in $b$ is another reason to assume that the observed changes in cue weights were not caused by cue-specific changes in perceptual scaling.

*t*(14) = 4.176, *p* = 0.001) and purple (*t*(14) = 4.903, *p* < 0.001) stimuli, although only the purple context stimuli had random aspect ratios. The changes were not significantly different for the two color categories (*t*(14) = 1.153, *p* = 0.268). Instead of occurring selectively for differently colored stimuli, adaptation generalized across colors.

*t*(11) = 9.758, *p* < 0.001) and purple (*t*(11) = 3.744, *p* = 0.003) stimuli, and there was no significant difference between the decreases in the two categories (*t*(11) = 1.651, *p* = 0.127).

*t*(13) = 2.995, *p* = 0.010), indicating an effect of the globally more random stimulus context in those later sessions, where regular context trials were randomly interleaved with random context trials. As expected, the relative influence of the compression cue was also affected by local stimulus context. It was significantly lower in random context trials than in regular context trials of sessions 2, 3, and 4 (*t*(13) = 3.292, *p* = 0.006). Significant changes happened based upon only one trial's worth of context stimuli, as evidenced by the fact that even if we discounted trials preceded by trials with the same stimulus context from the analysis, the relative influence of the compression cue still differed significantly (*t*(13) = 2.565, *p* = 0.023) between regular and random context trials of sessions 2, 3, and 4; the average influence of the compression cue in regular context trials preceded by one or more random context trials was 0.413 ± 0.047, whereas in random context trials preceded by one or more regular context trials, it was only 0.315 ± 0.036.

$w_{\mathrm{comp}}$: 0.002 ± 0.024, *t*(13) = 0.081, *p* (2-tailed) = 0.937); thus, the changes observed in the main experimental group were not simply due to repeated exposure to the experimental task.

*t*(14) = 2.420, *p* = 0.030) lower in the regular context sequences of later sessions, and again significantly (*t*(14) = 3.004, *p* = 0.009) lower in the random context sequences (Figure 7B). The result remained unchanged if we only looked at test stimuli preceded by a single sequence of “same”-context stimuli. The average influence of the compression cue in test stimuli preceded by a single sequence of regular context stimuli (itself preceded by one or more sequences of random context stimuli) was 0.397 ± 0.026, whereas the average influence of the compression cue in test stimuli preceded by one sequence of random context stimuli (itself preceded by one or more sequences of regular context stimuli) was 0.296 ± 0.037. These estimates differed significantly from one another (*t*(14) = 2.641, *p* = 0.019).

$p_{\mathrm{isotropic}}$, the estimate of the prior probability that a figure in the world is isotropic. Changes in this “mixture proportion” can have a significant effect on the influence of the compression cue on subjects' slant judgments. If the estimator assumes all figures are isotropic, the reliability of the compression cue (hence its perceptual weight) is determined entirely by noise on sensory estimates of the aspect ratio of a figure in the image. If the estimator assumes all figures are drawn from a random set of shapes, the reliability of the compression cue and its perceptual weight are determined by both the sensory noise and the assumed variability of aspect ratios in the world. Values in between give rise to weights in between those that would be found for an estimator assuming a purely isotropic model and those that would be found for an estimator assuming a purely random model. This is true even for figures that are close to being isotropic, for which the sensory data are reasonably consistent with an assumption of isotropy, because the estimator always takes into account the likelihood that the figure is not isotropic.
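The two limiting cases can be illustrated with a back-of-the-envelope calculation. The sketch below is not the mixture estimator described in this paper: it assumes a linear, reliability-weighted cue combination, the cosine projection model, and first-order error propagation from aspect ratio to slant. The noise values (0.024 for aspect ratio, 3.5° for disparity-based slant, 0.12 for shape variability) are the ones quoted in the model description; the function name is hypothetical.

```python
import math

def compression_weight(slant_deg, sigma_alpha=0.024, sigma_disp_deg=3.5, sigma_shape=0.0):
    """Approximate weight of the compression cue under linear cue combination.

    sigma_shape is the assumed variability of true aspect ratios in the world
    (0 for a purely isotropic model). Uncertainty in the image aspect ratio is
    converted to slant uncertainty via |dS/d(alpha)| = 1 / sin(S).
    """
    slant = math.radians(slant_deg)
    # Sensory noise plus shape variability, both expressed in image aspect ratio.
    sigma_image = math.hypot(sigma_alpha, sigma_shape * math.cos(slant))
    sigma_comp_deg = math.degrees(sigma_image / math.sin(slant))
    reliability_comp = 1.0 / sigma_comp_deg ** 2
    reliability_disp = 1.0 / sigma_disp_deg ** 2
    return reliability_comp / (reliability_comp + reliability_disp)

w_isotropic = compression_weight(35.0)                 # all figures assumed isotropic
w_random = compression_weight(35.0, sigma_shape=0.12)  # all figures assumed random
print(w_isotropic, w_random)
```

Under these simplifying assumptions the purely random model yields a much smaller compression weight than the purely isotropic model, which is the qualitative pattern the paragraph describes; the weights of the full Bayesian estimator will differ in detail.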

$\sigma_\alpha = 0.024$, an estimate derived from the data in Knill (2007a) but also well within the range estimated from shape discrimination data reported by Regan and Hamstra (1992). For the standard deviation of the noise on slant estimates from stereoscopic disparities, we set $\sigma_{\mathrm{disp}} = 3.5^\circ$, a value taken from estimates of the uncertainty in slant-from-disparity estimates (Hillis, Watt, Landy, & Banks, 2004). For the standard deviation of shapes assumed in the random ellipse prior model (for which we chose a log Gaussian, see Appendix A), we set $\sigma_A = 0.12$ (Knill, 2007a). For the model-switching form of the adaptive mechanism, we assumed the least constrained form of the model possible: when a change in environmental statistics occurs, the mixing proportion can change to any value between 0 and 1 (with a uniform prior). For this model, the only parameter left free to fit the data was the probability that the mixture proportion changes to a new value with each stimulus presentation ($p_{\mathrm{jump}}$). For the continuous adaptation model, the only parameter left free to fit the data was the standard deviation of the assumed random walk process on the mixture proportion, $\sigma_{\mathrm{jump}}$. More details about the learning models and the simulations can be found in Appendix A.

$\sigma_{\mathrm{jump}}$ for the continuous adaptation model fit the data reasonably well. Values less than 0.025 caused the difference between slant judgments for test stimuli in regular and random contexts to disappear, and values greater than 0.075 caused the difference between slant judgments for test stimuli in regular contexts and in the baseline session to disappear. The switching model was much more resilient to changes in $p_{\mathrm{jump}}$. Values ranging from 1/25 to 1/500 gave rise to very similar behavior. The one significant difference between the performance of the two forms of Bayesian estimators is that the relative influence of the compression cue (the isotropy bias) after exposure to a small number of random shapes is lower under the switching model than under the continuous adaptation model. Simulations show that this difference disappears after exposure to a larger number of random shapes (both models show an asymptotic value of 0.23 for the average compression cue weight). The difference in behavior for short sequences of stimuli as used in Experiments 4 and 5 arises from the hysteresis in the continuous adaptation model. That model's estimates of the mixture proportion are pulled slowly away from the current estimate by new evidence, while the model-switching mechanism allows arbitrary changes in the mixture proportion when a “jump” occurs.

$S$ appear as ellipses with aspect ratios approximately equal to $\cos S$, with a long axis of symmetry perpendicular to the direction of surface tilt. Squares whose axes of diagonal symmetry are pre-aligned with the tilt axis (as in our “diamond” test stimuli) project to figures with an axis of symmetry equal to the tilt direction and with an aspect ratio that also approximately equals $\cos S$ (where aspect ratio refers to the aspect ratio of the best fitting ellipses). Despite the perspective distortion in the projected image of the diamonds, the cosine approximation of the distortion in the best fitting ellipse to the figure is very accurate.

$\cos^{-1}\alpha$ (where $\alpha$ is the aspect ratio of the figure's best fitting ellipse). When the aspect ratio distribution is broad, the information is unreliable, no matter how well the visual system can code the shapes of figures on the retina. An optimal estimator of surface orientation from the combination of compression cues and stereoscopic disparity cues bases its estimate on a posterior probability density function on surface orientation, conditioned on the measured aspect ratio and orientation of a figure on the retina and the measured stereoscopic disparities. The shape of this probability density function (e.g., its mode) depends critically on the probability density function on figure aspect ratios.

$S_t$ is the slant of a figure, $\alpha_t$ is the observed aspect ratio of the figure projected on the retina, and $\mathbf{d}_t$ is a vector of measured disparities ($t$ indexes the stimulus set in order of stimulus presentations). The three terms on the right-hand side of Equation A1 represent the information provided by each of three sources: the two sensory cues and prior knowledge about the statistics of surface slant. $p(\alpha_t \mid S_t)$ is the likelihood of measuring aspect ratio $\alpha_t$ from a figure with slant $S_t$, $p(\mathbf{d}_t \mid S_t)$ is the likelihood of measuring the disparities $\mathbf{d}_t$ from a surface with slant $S_t$, and $p(S_t)$ is the prior probability of viewing a surface with slant $S_t$. Assuming a uniform prior on tilt, the prior density function on slant should be $p(S_t) = \cos S_t$; however, this is so broad relative to the two likelihood functions that a model that uses a uniform prior on $S_t$ is essentially equivalent. Since it simplifies notation, we will assume a uniform prior on slant, and since an estimator only uses proportional probabilities (note the proportion sign in Equation A1) rather than absolute probabilities, we can remove the prior term from Equation A1, giving the simplified form

$$p(S_t \mid \alpha_t, \mathbf{d}_t) \propto p(\alpha_t \mid S_t)\, p(\mathbf{d}_t \mid S_t)$$

$A_t$ represents the aspect ratio of the figure in the world and $N_t$ represents sensory noise. Assuming zero-mean Gaussian noise on aspect ratio measurements, we can write a likelihood function for $\alpha_t$ conditioned on both slant and aspect ratio as

$\sigma_\alpha$ is the standard deviation of the sensory noise on aspect ratio judgments. The likelihood function for aspect ratio conditioned on slant alone is given by marginalizing over all possible aspect ratios in the world, giving

$\lambda_t$, and figures with random aspect ratios are drawn from a smooth probability density function on $A$. We chose a log-Gaussian density function for the random model because it is a smooth density function that is invariant to whether one uses aspect ratios greater than or less than 1 to parameterize shape. The resulting prior on shape has the form

$\sigma_A$ determines the standard deviation of aspect ratios of shapes drawn from the random model. Note that according to this model, the shapes drawn from the random class of figures are still biased toward an aspect ratio of 1.

$\delta(A_t - 1)$ is a Dirac delta function that concentrates all of the probability at $A_t = 1$. It represents the probability distribution on aspect ratio for isotropic figures. The likelihood function on aspect ratio then becomes

$A_t$ other than one, it is easily evaluated by setting $A_t = 1$, and Equation A7 becomes

$S_t = \cos^{-1}\alpha_t$ (the slant inferred from the aspect ratio in the image under the assumption that the aspect ratio of the figure in the world equals one) and has a standard deviation determined by the noise standard deviation $\sigma_\alpha$. The second term will also peak near $S_t = \cos^{-1}\alpha_t$ but will have a standard deviation greater than $\sigma_\alpha$, with the difference determined by the standard deviation of the prior distribution of aspect ratios in the random class of figures in the environment, $\sigma_A$.
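The behavior of the two terms can be explored numerically. The sketch below rests on several assumptions that are not spelled out in the surviving text: the projection model $\alpha_t = A_t \cos S_t + N_t$, a log-Gaussian shape prior centered on $\log A = 0$, and simple numerical integration over $\log A$. Function and variable names are hypothetical.

```python
import numpy as np

SIGMA_ALPHA = 0.024  # sensory noise on image aspect ratio (value quoted in the text)
SIGMA_A = 0.12       # spread of the log-Gaussian shape prior (value quoted in the text)

def gaussian(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def aspect_likelihood(alpha, slant_deg, lam, n_grid=2001):
    """p(alpha | slant) for a mixture of isotropic (weight lam) and random figures."""
    cos_s = np.cos(np.radians(slant_deg))
    # Isotropic figures: A = 1, so only sensory noise blurs the prediction.
    isotropic_term = gaussian(alpha, cos_s, SIGMA_ALPHA)
    # Random figures: marginalise numerically over log A (assumed centred on 0).
    log_a = np.linspace(-5.0 * SIGMA_A, 5.0 * SIGMA_A, n_grid)
    step = log_a[1] - log_a[0]
    integrand = (gaussian(alpha, np.exp(log_a) * cos_s, SIGMA_ALPHA)
                 * gaussian(log_a, 0.0, SIGMA_A))
    random_term = integrand.sum() * step
    return lam * isotropic_term + (1.0 - lam) * random_term

def spread(curve, slants):
    """Standard deviation (deg) of a likelihood curve treated as a distribution."""
    weights = curve / curve.sum()
    mean = (weights * slants).sum()
    return np.sqrt((weights * (slants - mean) ** 2).sum())

slants = np.linspace(5.0, 85.0, 801)
isotropic_curve = np.array([aspect_likelihood(0.8, s, lam=1.0) for s in slants])
random_curve = np.array([aspect_likelihood(0.8, s, lam=0.0) for s in slants])
print(slants[isotropic_curve.argmax()], spread(isotropic_curve, slants),
      spread(random_curve, slants))
```

For an observed aspect ratio of 0.8, the isotropic term peaks near $\cos^{-1} 0.8$ and the random term is considerably broader, in line with the description above. Intermediate values of `lam` mix the two curves.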

$$p(\mathbf{d}_t \mid S_t) \propto \exp\!\left(-\frac{\left(S_t - \hat{S}_t^{\mathrm{disp}}\right)^2}{2\sigma_{\mathrm{disp}}^2}\right)$$

where $\sigma_{\mathrm{disp}}$ is the standard deviation of slant-from-disparity estimates, and $\hat{S}_t^{\mathrm{disp}}$ is the mode of the likelihood function. This finesses the problem of building a stereoscopic model for slant, assuming that the likelihood function for slant from disparities on any given trial is Gaussian around some modal slant. The mean slant varies from trial to trial by the same standard deviation as the standard deviation of the likelihood function. Noise in disparity measurements and the computation of slant from disparities are reflected in both the random variations in the modal slant from trial to trial and in the standard deviation of the likelihood function. This is, for example, the appropriate model for a stereoscopic system that can be modeled as generating slant estimates perturbed by Gaussian noise with standard deviation $\sigma_{\mathrm{disp}}$. For the simulations of the learning model, we sampled values of $\hat{S}_t^{\mathrm{disp}}$ from a Gaussian distribution with mean equal to the true slant of the stimulus and a standard deviation $\sigma_{\mathrm{disp}}$, which we assume to be independent of base slant.

$\lambda_t$ (which is estimated by the model online from the sequence of stimuli presented to the observer) based on results from previous studies. Sensory noise parameters were chosen to be consistent with the findings of previous psychophysical studies of aspect ratio discrimination (Regan & Hamstra, 1992) and stereoscopic slant discrimination (Hillis et al., 2004). The parameters used for our simulation were $\sigma_\alpha = 0.024$ and $\sigma_{\mathrm{disp}} = 3.5^\circ$ (note that the slant discrimination data of Hillis et al. suggest that $\sigma_{\mathrm{disp}}$ shrinks slightly with increasing slant; however, the changes expected over the 5° range of conflicts are very small). The standard deviation of the log-Gaussian prior on the aspect ratios of anisotropic figures was set to 0.12 based on fits of the Bayesian estimator to data from a previous study of robust cue integration for disparities and aspect ratio (Knill, 2007a). These parameters resulted in an estimator whose slant estimates were approximately equally influenced by compression and disparity cues (prior to adaptation; see below). They remained fixed for all simulations and were not fit to subjects' data.

$\lambda$ given qualitatively different assumptions about how $\lambda$ changes over time in the environment; that is, how the proportion of isotropic figures in the environment changes over time. The adaptation models couple the Bayesian estimator of slant with an online estimate of $\lambda$ based on the history of stimuli viewed by the observer. Both models assumed that $\lambda$ could change with each stimulus presentation. We therefore use the notation $\lambda_t$ to represent the true mixture proportion in the stimulus set at time $t$, where time is parameterized by discrete changes in stimuli attended to (trials in Experiments 1, 2, 3, and 5; the sequence of stimuli subjects made slant settings for in Experiment 4). The models differed only in the stochastic dynamics assumed for how $\lambda_t$ changes over time. $\lambda_t$ is not, therefore, a fixed parameter of the estimation model but rather a random variable itself that the model estimates on each trial.

$\lambda$ derived from stimulus data; that is, the aspect ratio and slant-from-disparity measurements obtained from stimuli on each trial. Since both of our models assume that $\lambda_t$ depends on $\lambda_{t-1}$, the observer's knowledge about $\lambda_t$ is represented by a posterior probability density function on $\lambda_t$, conditioned on the entire history of stimulus data (aspect ratios $\alpha$ and disparity measurements $\mathbf{d}$), $p(\lambda_t \mid \alpha_t, \mathbf{d}_t, \{\alpha_{t-1}, \mathbf{d}_{t-1}, \cdots, \alpha_1, \mathbf{d}_1\})$. Since the slant estimator depends on $\lambda_t$, knowledge about which depends on the entire stimulus history, the posterior density function on slant, given the available sensory information, has to be rewritten as

$\lambda_t$. Equation A10 simply expresses the fact that the posterior on slant is the average of the posteriors computed for all possible values of $\lambda_t$, weighted by the posterior probability density function for $\lambda_t$, conditioned on all of the sensory measurements observed up to and including time $t$. Note that the estimator does not use a discrete estimate of $\lambda_t$ at each time step to parameterize the slant-from-compression/disparity estimator (a suboptimal thing to do). Rather, it maintains and updates an internal model of the probability density function on $\lambda_t$ conditioned on all of the sensory information received to date. The adaptive models determine the evolution of $p(\lambda_t \mid \alpha_t, \mathbf{d}_t, \{\alpha_{t-1}, \mathbf{d}_{t-1}, \cdots, \alpha_1, \mathbf{d}_1\})$ over successive stimulus presentations $t$. For notational simplicity, we will use $X_t = \{\alpha_t, \mathbf{d}_t, \alpha_{t-1}, \mathbf{d}_{t-1}, \cdots, \alpha_1, \mathbf{d}_1\}$ to represent the history of sensory data from stimulus presentation $t$ back to the first stimulus observed by a subject, so we are interested in deriving recursive update equations that relate $p(\lambda_t \mid X_t)$ to the stimulus data at time $t$, $\{\alpha_t, \mathbf{d}_t\}$, and the previous density function $p(\lambda_{t-1} \mid X_{t-1})$.
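The averaging that Equation A10 is described as expressing can be written out explicitly. The following is a sketch reconstructed from the verbal description in this appendix, not a copy of the original equation; $\mathbf{d}_t$ is an assumed symbol for the disparity measurements, and conditioning the inner posterior on $\lambda_t$ is inferred from the text.

```latex
p(S_t \mid X_t) \;=\; \int_0^1 p(S_t \mid \alpha_t, \mathbf{d}_t, \lambda_t)\,
                               p(\lambda_t \mid X_t)\, d\lambda_t
```

Written this way, the slant posterior for the current stimulus depends on the stimulus history only through $p(\lambda_t \mid X_t)$, which is why the recursive update of that density is the central quantity in both adaptation models.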

$\lambda_t$. The first assumes that $\lambda_t$ changes to a new random value at discrete points in time and that the new value is independent of the previous value. We refer to this as the “switching model”. The second assumes that $\lambda_t$ follows a random walk in the environment. Since this model leads to slow, continuous changes in internal estimates of $\lambda_t$ from trial to trial, we refer to it as the “continuous adaptation model”.

$\eta(p_{\mathrm{jump}})$ is a binomial process that takes the value 1 with probability $p_{\mathrm{jump}}$ and the value 0 with probability $1 - p_{\mathrm{jump}}$, and $\psi_t$ takes on random values drawn from a uniform distribution between 0 and 1. $p_{\mathrm{jump}}$ determines the frequency with which $\lambda_t$ changes. When it changes, it is assumed to change to a random value between 0 and 1. Unfortunately, while the dynamics are Markovian, Equation A11 does not itself lead to simple recursive update equations for $p(\lambda_t \mid X_t)$. This is because the probability that a change, or jump, in $\lambda$ occurred between time $t - 1$ and $t$ depends not only on the stimulus data at time $t$ but also on the entire history of stimulus data.
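The generative dynamics assumed by the switching model are easy to simulate. The sketch below assumes that Equation A11 has the form $\lambda_t = (1 - \eta)\,\lambda_{t-1} + \eta\,\psi_t$, which is inferred from the description of $\eta$ and $\psi_t$ above; the starting value of 0.9 and the function name are hypothetical.

```python
import numpy as np

def simulate_switching(n_trials, p_jump, lam0=0.9, seed=0):
    """Sample a trajectory of the mixture proportion under the switching model.

    On each trial, with probability p_jump the proportion jumps to a fresh
    value drawn uniformly from [0, 1); otherwise it keeps its previous value.
    """
    rng = np.random.default_rng(seed)
    lam = np.empty(n_trials)
    current = lam0
    for t in range(n_trials):
        if rng.random() < p_jump:      # eta = 1: a jump occurs
            current = rng.random()     # psi_t ~ Uniform(0, 1)
        lam[t] = current               # eta = 0: value carried over
    return lam

trajectory = simulate_switching(1000, p_jump=1 / 25)
n_changes = int(np.count_nonzero(np.diff(trajectory)))
print(n_changes)
```

This only samples from the assumed environment. The estimator's task is the inverse problem: inferring where jumps occurred from noisy stimulus data, which is what makes a simple recursive update unavailable.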

$p(\lambda_t \mid X_t)$ as a recursive update equation, we used a method proposed by Adams and MacKay (2007). We expand the state vector for the dynamical system represented by Equation A11 to include a variable $h_t$ that represents the time (number of stimulus presentations) since the previous change in $\lambda_t$. The dynamics of $h_t$ are given by the following conditional probabilities:
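A sketch of these conditional probabilities, reconstructed from the verbal description of the three cases (reset to 0 with probability $p_{\mathrm{jump}}$, increment by 1 otherwise, and zero probability for any other value), could read:

```latex
p(h_t \mid h_{t-1}) =
\begin{cases}
p_{\mathrm{jump}},     & h_t = 0 \\
1 - p_{\mathrm{jump}}, & h_t = h_{t-1} + 1 \\
0,                     & \text{otherwise}
\end{cases}
```

The ordering of the cases is an assumption; it is chosen so that the zero-probability case is the third term, as the description of the expression indicates.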

$h_t$ is set to 0 every time there is a change in $\lambda$ (which occurs with probability $p_{\mathrm{jump}}$), otherwise, it is incremented by 1. It cannot take on any other value (the third term in the expression). The posterior distribution on $\lambda_t$ is given by

$p(\lambda_t, h_t \mid X_t)$ as

$K$ is a normalizing constant. Figure shape and disparity information at time $t$ is independent of the previous stimuli and of $h_t$ (once $\lambda_t$ is specified), so we can write Equation A14 as

$p(\alpha_t, \mathbf{d}_t \mid \lambda_t)$ is the likelihood of seeing the stimulus data at time $t$ (the aspect ratio $\alpha_t$ and the disparity measurements $\mathbf{d}_t$) in an environment with a probability density function on aspect ratios parameterized by $\lambda_t$. It is given by

$t$ averaged over all slants (assuming a uniform prior on slant). The second term in Equation A15 is given by the recursive relationship

$p(\lambda_t, h_t \mid X_t)$, with each stimulus. It then uses Equation A13 to update $p(\lambda_t \mid X_t)$. This is what is needed by the estimator to estimate slant for that stimulus (see Equation A10). The estimator never actually computes an optimal value for $\lambda_t$, as observers are never asked to do it. Rather, it continuously updates the posterior on $\lambda_t$ to be used in the slant estimator (Equation A10).

$p_{\mathrm{jump}}$, the probability that the shape statistics in the environment have changed just before any given stimulus presentation.

$\lambda_t$ follows a bounded random walk in the environment and uses the stimulus information at each stimulus presentation to update the current estimate of $\lambda_t$. The dynamic model for $\lambda_t$ is represented by the discrete time update equation

$\Delta\lambda$ is a random variable with a truncated zero-mean Gaussian density function, conditioned on $\lambda_t$,

$\sigma_{\Delta\lambda}$ is the standard deviation of a mean zero, Gaussian random variable. $\lambda_t$ is bounded between 0 and 1, so the distribution on $\Delta\lambda$ is a truncated Gaussian, with bounds dependent on $\lambda_t$. For this model, a recursive update equation for $p(\lambda_t \mid X_t)$ is easily obtained. The posterior distribution on $\lambda_t$ is given by the recursive relationship

$K$ is a normalizing constant. The first term is given by Equation A16. The second term is given by

$\lambda_t$ is more than 3 standard deviations away from 0 or 1, Equation A22 simplifies to

$N(0, \sigma_{\Delta\lambda})$ is a mean-zero Gaussian distribution with standard deviation, $\sigma_{\Delta\lambda}$. Equations A20 and A21 give the recursive update equations for $p(\lambda_t \mid X_t)$. The only free parameter in the continuous adaptation model is $\sigma_{\Delta\lambda}$, the standard deviation of the assumed random walk on $\lambda_t$.
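A recursive update of this kind can be sketched on a discrete grid of $\lambda$ values. The code below is an illustration under stated assumptions, not the simulation code used in the paper: the stimulus likelihood is taken to be linear in $\lambda$ (a mixture of an "isotropic" and a "random" likelihood, each already averaged over slant), the truncated Gaussian step is approximated by renormalizing a Gaussian kernel on [0, 1], and the likelihood values and the step size of 0.05 are made-up numbers.

```python
import numpy as np

def transition_matrix(grid, sigma_dl):
    """Column j holds p(lambda_t | lambda_{t-1} = grid[j]): a Gaussian step
    of SD sigma_dl, truncated to [0, 1] by renormalising over the grid."""
    diff = grid[:, None] - grid[None, :]
    kernel = np.exp(-0.5 * (diff / sigma_dl) ** 2)
    return kernel / kernel.sum(axis=0, keepdims=True)

def update_posterior(posterior, transition, grid, lik_isotropic, lik_random):
    """One recursive step: diffuse the old posterior, then weight by new evidence."""
    predicted = transition @ posterior
    # Assumed mixture likelihood of the current stimulus data given lambda.
    likelihood = grid * lik_isotropic + (1.0 - grid) * lik_random
    unnormalised = likelihood * predicted
    return unnormalised / unnormalised.sum()

grid = np.linspace(0.0, 1.0, 201)
# Initial belief: truncated normal with mean 0.89 and SD 0.05.
posterior = np.exp(-0.5 * ((grid - 0.89) / 0.05) ** 2)
posterior /= posterior.sum()
transition = transition_matrix(grid, sigma_dl=0.05)

mean_before = float(grid @ posterior)
for _ in range(20):
    # Hypothetical stimuli whose shapes are far more likely under the random model.
    posterior = update_posterior(posterior, transition, grid,
                                 lik_isotropic=0.01, lik_random=1.0)
mean_after = float(grid @ posterior)
print(mean_before, mean_after)
```

Because each step only diffuses the previous posterior by a small amount, the estimate of $\lambda$ moves gradually under sustained evidence, which is the hysteresis that distinguishes this model from the switching model.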

$p(\lambda_0 \mid X_0)$ (the prior on the mixture proportion before any stimuli are viewed) to be a truncated normal with a mean of 0.89 and a standard deviation of 0.05. Because both models adapted so quickly to the actual stimulus statistics, model performance was essentially independent of these values. Figure A1 shows a representative example of the two models' estimates of $\lambda_t$ for random sequences of stimuli in the second session of Experiment 5 (where random sequences of circles and random ellipses are intermixed). The dynamics are subtly different, but as shown in Figure 10 of the main text, both models show the same behavior when expressed as cue weights on the test stimuli in the experiment. The figures also show the asymmetry alluded to in the text. Both models quickly adjust their internal estimates of $\lambda_t$ when presented with ellipses with random aspect ratios but adjust more slowly when presented with circles. This results from an asymmetry in the evidence provided by the two stimuli. Images of ellipses with aspect ratios very different from one are only consistent with the random shape category and therefore push the model to change its estimate of $\lambda_t$ more quickly than images of circles, which are consistent with both figure categories.

$b$ and $c$ coefficients from the regression model in Equation 1, representing multiplicative and additive biases, respectively. This allows us to use the regression analysis for the test stimuli (most of which contained cue conflicts) to estimate the biases associated with subjects' probe slant settings. To make the additive bias term, $c$, more intuitive, we transformed it into a constant bias relative to the test slant of 35°. Table B1 shows the additive and multiplicative biases measured in the first two sessions of Experiment 1 (prior to changing the context stimuli) for both types of figures used in the experiment. There was no significant difference in the multiplicative constants between the two figures (*t*(29) = 0.32, *p* > 0.75). The difference in additive biases was small but significant (*t*(29) = 6.19, *p* < 0.001). The biases represent an approximately 3° underestimate of slant at 30° and approximately 2° overestimate of slant at 40°. These biases could reflect biases in the perceived orientation of the probe or in the perceived orientation of the surface or both.

Stimulus type | Multiplicative gain, *b* | Additive bias, *c* |
---|---|---|
Ellipses | 1.551 ± 0.087 | 0.063° ± 1.539° |
Diamonds | 1.532 ± 0.098 | −1.832° ± 1.461° |

*t*(29) = 1.08, *p* > 0.25) or additive biases (*t*(29) = 0.23, *p* > 0.8) across the two types of figures, nor were the changes in either bias significantly different between figures whose aspect ratios were randomized in training sessions and figures that were regular throughout (change in multiplicative biases: *t*(29) = 0.20, *p* > 0.8; change in additive biases: *t*(29) = 1.29, *p* > 0.2).

Shape category | Pre/post-learning change in the multiplicative gain, *b* | Pre/post-learning change in the additive bias, *c* |
---|---|---|
Regular context during learning | −0.084 ± 0.077 | 0.078° ± 1.278° |
Random context during learning | −0.074 ± 0.076 | 0.510° ± 1.270° |