The probabilistic observer adapted its parameter values on each trial. These parameter values always equaled the best possible values, and thus, our probabilistic observers can also be regarded as ideal learners (Eckstein, Abbey, Pham, & Shimozaki, 2004; Michel & Jacobs, 2008). On trial $t$, the observer had viewed the images on all trials up to and including trial $t$, along with the corresponding correct judgments of these images. The observer should use all of this information to optimally set its parameters. Let
$X_{1,\ldots,t}$ denote the values of the front-end output units on trials 1 through $t$ (i.e., $x_1, \ldots, x_t$), and let $Y^{*}_{1,\ldots,t}$ denote the corresponding values of the binary target variable (i.e., $y^{*}_1, \ldots, y^{*}_t$). The optimal setting of the parameter values (in a maximum a posteriori sense) is the set of values that maximizes the posterior distribution of the parameter values given the data $X_{1,\ldots,t}$ and $Y^{*}_{1,\ldots,t}$. Using Bayes' rule, this distribution can be written as

$$ p(\theta \mid X_{1,\ldots,t}, Y^{*}_{1,\ldots,t}) \propto p(X_{1,\ldots,t}, Y^{*}_{1,\ldots,t} \mid \theta)\, p(\theta), $$

where $\theta$ denotes the observer's parameter values.
The second term on the right-hand side is the prior distribution of parameter values. We assume that this is a uniform distribution. The first term is the likelihood function. Assuming that the data on each trial are independent, the likelihood function can be rewritten as

$$ p(X_{1,\ldots,t}, Y^{*}_{1,\ldots,t} \mid \theta) = \prod_{i=1}^{t} p(x_i, y^{*}_i \mid \theta). $$
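To make the next step explicit (the estimator symbol $\hat{\theta}_t$ is introduced here for illustration and is not notation from the text above): because the prior is uniform, it does not affect the maximization, so the maximum a posteriori estimate on trial $t$ coincides with the maximum-likelihood estimate,

$$ \hat{\theta}_t = \arg\max_{\theta}\, p(\theta \mid X_{1,\ldots,t}, Y^{*}_{1,\ldots,t}) = \arg\max_{\theta} \prod_{i=1}^{t} p(x_i, y^{*}_i \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{t} \log p(x_i, y^{*}_i \mid \theta). $$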
An ideal probabilistic observer maximized this likelihood function as follows. Let $I_L$ be the set of trial indices for trials in which a left-closer surface was depicted, and let $I_R$ be the set of trial indices for trials in which a right-closer surface was depicted. $I_L$ and $I_R$ are disjoint sets whose union is the set $\{1, \ldots, t\}$. The mean vector of an observer's left-closer Gaussian distribution was set to the mean of the data $\{x_i\}_{i \in I_L}$, whereas the mean vector of the right-closer distribution was set to the mean of the data $\{x_i\}_{i \in I_R}$.
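As an illustration, here is a minimal sketch of this per-trial estimation step, assuming the front-end outputs are stored in a NumPy array. The function and variable names, the 0/1 coding of the target variable, and the stand-in data are our own; only the estimation rule (class-conditional means set to the sample means of the trials seen so far) follows the description above.

import numpy as np

def update_class_means(X, y):
    """Maximum-likelihood estimates of the left- and right-closer mean vectors.

    X : (t, d) array of front-end output values x_1, ..., x_t
    y : (t,) array of binary targets y*_1, ..., y*_t
        (0 = left-closer surface, 1 = right-closer surface; this coding
        is an assumption made for the sketch)
    """
    mu_left = X[y == 0].mean(axis=0)   # mean of {x_i : i in I_L}
    mu_right = X[y == 1].mean(axis=0)  # mean of {x_i : i in I_R}
    return mu_left, mu_right

# Per-trial loop: after trial t the ideal learner re-estimates its
# parameters from all trials 1, ..., t seen so far.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 8))      # stand-in front-end outputs
y_all = rng.integers(0, 2, size=200)   # stand-in correct judgments
for t in range(1, len(X_all) + 1):
    seen_X, seen_y = X_all[:t], y_all[:t]
    if (seen_y == 0).any() and (seen_y == 1).any():  # need data from both classes
        mu_L, mu_R = update_class_means(seen_X, seen_y)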