We modeled the effect of the history features on the processing of the reference stimulus. We included the cumulative mean of all stimuli (\(\bar{s}_i\)), the average stimulus values from the eight preceding trials (\(t_{i-j}\), \(j = 1\ldots 8\)), and the preceding response (\(r_{i-1}\)) in the model. We first ignore the nonlinearity related to the effect of preceding trials. The observer combines information from the cumulative mean and preceding trials with the reference stimulus, and compares this against the comparison stimulus. The decision variable (or latent variable) on trial \(i\) is
\begin{eqnarray*}
y^*_i &\;=& \alpha _{0} + \alpha _{1} s_{i,\text{cmp}} + \alpha _{2} s_{i,\text{ref}} + \alpha _{3} \bar{s}_i + \alpha _{4} t_{i-1}\nonumber\\
&&+ \ldots {} + \alpha _{11} t_{i-8} + \alpha _{12} r_{i-1} + \varepsilon,
\end{eqnarray*}
where \(\varepsilon\) represents zero-mean Gaussian noise. Because the observer judges the comparison against the combination of the reference and the history features, the coefficients satisfy the constraint \(\alpha _1=-(\alpha _2+\alpha _3+\ldots {}+\alpha _{11}) \Rightarrow \sum _{k=1}^{11}\alpha _k=0\), so that the constant term \(\alpha _0\) captures any bias not related to previous stimuli or responses and there is one fewer coefficient to fit. We thus use stimulus differences to the reference as regressors, and the model is
\begin{eqnarray*}
y^*_i = \beta _{0} + \beta _{1} \Delta {}s_{i} + \beta _{2} \Delta {}\bar{s}_i + \sum _{j=1}^{8}\beta _{j+2}\Delta {}t_{i-j} + \beta _{11} r_{i-1} + \varepsilon,
\end{eqnarray*}
where \(\Delta {}s_i=s_{i,\text{cmp}}-s_{i,\text{ref}}\), \(\Delta {}\bar{s}_i=s_{i,\text{ref}}-\bar{s}_i\), and \(\Delta {}t_{i-j}=s_{i,\text{ref}}-t_{i-j}\). We can now rescale these coefficients to stimulus units by dividing by the coefficient for the comparison, \(\beta _1\). This is the same scaling as in the example of simple PF fitting above. It also means that the weights given to the reference and the stimulus history features sum to minus one, because the constraint on the original coefficients gives \((\alpha _2+\ldots {}+\alpha _{11})/\alpha _1=-1\). These are analogous to the weights that are reported for the history features in
Figure 8, with the sign convention that an attractive bias is positive. The model we actually used differed only in the nonlinearity applied to the stimulus differences to previous trials. In accordance with previous literature, we used a derivative-of-Gaussian (DoG, scaled to 0–1) function to weight these differences. The weights for the history features in stimulus units cannot then be computed directly from the regression coefficients, but when the stimulus differences \(\Delta {}t\) are small, we can use a linear approximation to the DoG function. That is, we can scale the coefficients by the absolute value of the second derivative of the Gaussian (the slope of the DoG) evaluated at zero. For example, the weight for the cumulative mean \(\bar{s}\) is
\begin{eqnarray*}
w_{\bar{s}} = \frac{e^{0.5}}{\sigma _{\mathrm{DoG}}}\frac{\beta _2}{\beta _1},
\end{eqnarray*}
where \(\sigma _{\mathrm{DoG}}\) is the width parameter. The coefficients for preceding trials are scaled similarly. These weights are reported in Figure 8.
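
As a rough numerical illustration of this rescaling, the following Python sketch assumes that the DoG is parameterized as \(f(x)=(x/\sigma _{\mathrm{DoG}})\,e^{0.5}\exp (-x^2/2\sigma _{\mathrm{DoG}}^2)\), so that its peak value is one. This form is an assumption chosen because it reproduces the \(e^{0.5}/\sigma _{\mathrm{DoG}}\) factor above; it is not necessarily the exact parameterization used in the fits. The function names, the width, and the coefficient values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def dog(x, sigma):
    """Derivative-of-Gaussian scaled so that its peak (at x = sigma) equals 1.

    ASSUMPTION: this parameterization is inferred from the e^{0.5}/sigma
    factor in the text; the original fits may have used a different form.
    """
    return (x / sigma) * np.exp(0.5) * np.exp(-x**2 / (2.0 * sigma**2))


def dog_slope_at_zero(sigma):
    """Analytic slope of the scaled DoG at x = 0 (the linear approximation)."""
    return np.exp(0.5) / sigma


def history_weights(beta, sigma):
    """Weights in stimulus units for the cumulative mean and eight preceding trials.

    beta holds the 12 coefficients beta_0 ... beta_11 of the difference model;
    beta[1] is the comparison coefficient, beta[2] the cumulative mean, and
    beta[3:11] the eight preceding trials.
    """
    beta = np.asarray(beta, dtype=float)
    return dog_slope_at_zero(sigma) * beta[2:11] / beta[1]


# Hypothetical width and coefficients, for illustration only.
sigma_dog = 2.0
beta = np.array([0.1, 2.0, 0.4, 0.30, 0.20, 0.10, 0.05, 0.0, 0.0, 0.0, 0.0, -0.2])

weights = history_weights(beta, sigma_dog)

# Check the linear approximation with a central finite difference around zero.
h = 1e-6
numeric_slope = (dog(h, sigma_dog) - dog(-h, sigma_dog)) / (2.0 * h)

print("weight for cumulative mean:", weights[0])
print("weights for preceding trials:", weights[1:])
print("numeric vs analytic slope:", numeric_slope, dog_slope_at_zero(sigma_dog))
```

The finite-difference check confirms that, under the assumed parameterization, the slope of the DoG at zero matches the analytic factor, so the approximation is reasonable only while the stimulus differences remain small relative to \(\sigma _{\mathrm{DoG}}\).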