We used a temporal difference algorithm of reinforcement learning (Sutton & Barto, 1981) for updating target maps, which incrementally strengthened the association between each image class and the target location. On each trial, the global feature vector of an input image was compared with those of the image classes stored in memory. The similarity value $s_i$ was computed as the cosine similarity between the current global feature ${\boldsymbol C}$ and the global feature ${\boldsymbol G}^i$ of the $i$th memorized image class, as follows:
\begin{eqnarray}
s_i = \frac{\sum_{j = 1}^{n} C_j G_j^i}{\sqrt{\sum_{j = 1}^{n} \left( C_j \right)^2} \sqrt{\sum_{j = 1}^{n} \left( G_j^i \right)^2}},
\end{eqnarray}
where $j$ is the index of an element of a vector and $n$ is the size of the global feature vector (i.e., 384 in the present model). The similarity value $s_i$ was used as the weight for updating the $i$th target map.
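As an illustration, the similarity computation could be implemented as in the following minimal NumPy sketch; the function name and the random example vectors are ours, for illustration only, and are not part of the model:

\begin{verbatim}
import numpy as np

def similarity(c, g):
    # Cosine similarity (Equation 1) between the current global feature
    # vector c and the stored global feature vector g of one image class.
    return np.dot(c, g) / (np.linalg.norm(c) * np.linalg.norm(g))

# Hypothetical example using the 384-dimensional global feature of the model.
rng = np.random.default_rng(0)
c = rng.random(384)       # current global feature C
g_i = rng.random(384)     # memorized global feature of the i-th image class
s_i = similarity(c, g_i)  # weight for updating the i-th target map
\end{verbatim}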
If all similarity values were below a preset threshold (see below), a new image class was generated with a uniform probability map. Otherwise, all target maps with a similarity higher than the threshold were updated with the following formula:
\begin{eqnarray}
{\boldsymbol M}_i^{\prime} = {\boldsymbol M}_i + \eta s_i \left( {\boldsymbol T}_i - {\boldsymbol M}_i \right),
\end{eqnarray}
where ${\boldsymbol M}_i$ and ${\boldsymbol M}_i^{\prime}$ represent the target map of the $i$th image class before and after updating, respectively; the constant $\eta$ is a learning rate that controls the speed of learning; and ${\boldsymbol T}_i$ is the target location matrix of the given trial, that is, a matrix whose value is one at the target location and zero elsewhere.
The target map was thus updated on the basis of the error between the current map and the answer. Learning driven by the error between the current output and the answer has been used in modeling behavior such as classical conditioning (Rescorla & Wagner, 1972), as well as contextual cueing (Brady & Chun, 2007). Functional magnetic resonance imaging (fMRI) studies in humans have also reported brain activity related to error computation (e.g., O'Doherty, Dayan, Friston, Critchley, & Dolan, 2003); we therefore used this formula for updating the target map. The target map was normalized after updating so that it summed to 1. After repeated exposure to the same images, the target map gradually developed a peak at the target location, allowing the target to be localized soon after the image class of a given image was identified. Note that the target map is updated through repetition, but the global feature is not; that is, what the model learns is the association between the global feature and the target position. If the learning rate $\eta$ equals 1, the peak can be achieved after a single trial. The value of $\eta$ was determined for each experiment by minimizing the difference between the model performance and the experimental results.
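To illustrate the role of $\eta$, the following sketch (with $s_i$ fixed at 1 for simplicity, over the same hypothetical map as above) shows that $\eta = 1$ drives the map onto the target location in a single trial, whereas a smaller $\eta$ approaches it incrementally:

\begin{verbatim}
import numpy as np

for eta in (1.0, 0.1):
    m = np.full((8, 8), 1.0 / 64)    # uniform initial map
    t = np.zeros((8, 8))
    t[2, 5] = 1.0                    # fixed target location
    for trial in range(20):
        m = m + eta * 1.0 * (t - m)  # Equation 2 with s_i = 1
        m = m / m.sum()              # normalization
    print(f"eta = {eta}: map value at target after 20 trials = {m[2, 5]:.3f}")
# With eta = 1.0 the map equals t after the first trial; with eta = 0.1
# the peak builds up gradually across repetitions.
\end{verbatim}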