The neural binding model (
Schneegans & Bays, 2017;
Lugtmeijer et al., 2021) assumes that feature conjunctions in the memory sample stimuli are encoded in idealized conjunctive neural population codes, in which the firing rate of each neuron is determined by its preferred feature values and associated tuning curves in two feature dimensions (e.g., color and location;
Figure 2A). The mean firing rate of neuron
\(i\) representing the cue feature value
\(\psi _j\) and report feature value
\(\theta _j\) of item
\(j\) in the sample array is given by
\begin{equation}
\overline{r}_{i,j} (\psi _{j} , \theta _{j} ) = \frac{\gamma }{NM} \phi _{\circ }(\psi _{j}; \psi ^{\prime }_{i}, \kappa _{\psi }) \phi _{\circ }(\theta _{j}; \theta ^{\prime }_{i}, \kappa _{\theta })
\end{equation}
Here,
\(\gamma\) is the mean total firing rate of the neural population, which is normalized over the number of items,
\(N\), and the number of neurons,
\(M\), that contribute to the representation of each item. Neural tuning curves are modeled as von Mises distributions,
\(\phi _\circ\), parameterized with the neuron’s preferred feature values,
\(\psi ^{\prime }_{i}\) and
\(\theta ^{\prime }_{i}\), and concentration parameters,
\(\kappa _{\psi }\) and
\(\kappa _{\theta }\), for the two feature dimensions. Neural spiking activity in the idealized population code is assumed to be generated by independent Poisson processes based on each neuron’s mean firing rate, yielding spike counts
\begin{equation}
r_{i,j} \sim \textrm {Pois}(\overline{r}_{i,j}).
\end{equation}
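The two equations above can be sketched numerically as follows; the population size, the grid of preferred feature values, and all parameter values below are illustrative assumptions, not the fitted values:

```python
import numpy as np

def von_mises(x, mu, kappa):
    # Von Mises tuning curve, normalized to integrate to 1 on the circle.
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

def mean_rates(psi_j, theta_j, psi_pref, theta_pref, gamma, N, k_psi, k_theta):
    # Mean firing rate of each conjunctive neuron for item j (first equation).
    M = psi_pref.size
    return (gamma / (N * M)) * von_mises(psi_j, psi_pref, k_psi) \
                             * von_mises(theta_j, theta_pref, k_theta)

# Illustrative population: 20 x 20 neurons whose preferred value pairs tile
# the two circular feature spaces.
m = 20
prefs = np.linspace(-np.pi, np.pi, m, endpoint=False)
psi_pref, theta_pref = (a.ravel() for a in np.meshgrid(prefs, prefs))

rates = mean_rates(0.5, -1.0, psi_pref, theta_pref,
                   gamma=50.0, N=4, k_psi=5.0, k_theta=5.0)
spikes = np.random.default_rng(0).poisson(rates)  # second equation
```

Because the tuning is a separable product of the two von Mises curves, the neuron with the strongest mean response is the one whose preferred pair lies closest to the encoded feature pair.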
At recall, the feature values of all sample items are decoded through maximum likelihood estimation based on the spiking activity during a fixed decoding interval. The item whose decoded cue feature value is closest to the presented cue is selected, and its decoded report feature value is produced as a response. The derivation of response probability distributions for the population model as used here is described in detail in
Lugtmeijer et al. (2021).
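The maximum likelihood decoding step can be sketched by evaluating the Poisson log-likelihood of the observed spike counts over a grid of candidate feature pairs; the grid resolution and all parameter values below are illustrative assumptions:

```python
import numpy as np

def von_mises(x, mu, kappa):
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

def mean_rates(psi, theta, psi_pref, theta_pref, gamma, N, k_psi, k_theta):
    M = psi_pref.size
    return (gamma / (N * M)) * von_mises(psi, psi_pref, k_psi) \
                             * von_mises(theta, theta_pref, k_theta)

def decode_ml(spikes, psi_pref, theta_pref, gamma, N, k_psi, k_theta, res=48):
    # Grid search for the feature pair maximizing the Poisson log-likelihood
    # of the observed spike counts (the constant log(r!) term is dropped).
    grid = np.linspace(-np.pi, np.pi, res, endpoint=False)
    best, best_ll = (0.0, 0.0), -np.inf
    for psi in grid:
        for theta in grid:
            lam = mean_rates(psi, theta, psi_pref, theta_pref,
                             gamma, N, k_psi, k_theta)
            ll = np.sum(spikes * np.log(lam) - lam)
            if ll > best_ll:
                best, best_ll = (psi, theta), ll
    return best

# Encode one item with a generous spike budget, then decode it back.
m = 16
prefs = np.linspace(-np.pi, np.pi, m, endpoint=False)
psi_pref, theta_pref = (a.ravel() for a in np.meshgrid(prefs, prefs))
lam = mean_rates(1.0, -2.0, psi_pref, theta_pref, 5000.0, 1, 8.0, 8.0)
spikes = np.random.default_rng(1).poisson(lam)
psi_hat, theta_hat = decode_ml(spikes, psi_pref, theta_pref, 5000.0, 1, 8.0, 8.0)
```

In the full model, this decoding is applied to every sample item; the item whose decoded cue feature lies closest in circular distance to the presented cue then supplies the reported feature.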
We considered different model architectures for the binding between the three feature dimensions in the present experiment (location, color, and orientation). In the
spatial binding model (
Figure 2B), we assume that there are separate conjunctive population codes for the association between each nonspatial feature and its location, implementing the conceptual idea of independent feature maps over visual space. The two responses in each trial are then generated independently from these two populations, such that the probability of reporting a color
\(\theta _{\mathrm {col}}\) and orientation
\(\theta _{\mathrm {ori}}\) for a given location cue
\(\theta _{\mathrm {loc}}\) is determined as
\begin{equation}
p(\theta _{\mathrm {col}}, \theta _{\mathrm {ori}} | \theta _{\mathrm {loc}}) = p(\theta _{\mathrm {col}} | \theta _{\mathrm {loc}}) p(\theta _{\mathrm {ori}} | \theta _{\mathrm {loc}}) .
\end{equation}
Two other model variants assume that there is additionally an explicit population code representing color-orientation conjunctions. In the present experiment, this leads to indirect retrieval of one reported feature via the other, but it allows direct retrieval of the reported features from the cue in Experiment 2. The
binding-via-color model assumes that the location cue is used to retrieve the item's color, and that the color is then used as an intermediary cue to retrieve the associated orientation:
\begin{equation}
p(\theta _{\mathrm {col}}, \theta _{\mathrm {ori}} | \theta _{\mathrm {loc}}) = p(\theta _{\mathrm {col}} | \theta _{\mathrm {loc}}) p(\theta _{\mathrm {ori}} | \theta _{\mathrm {col}}) .
\end{equation}
The
binding-via-orientation model assumes that the location cue is used to first retrieve the item’s orientation (although it is reported second), and the color is then retrieved based on the associated orientation:
\begin{equation}
p(\theta _{\mathrm {col}}, \theta _{\mathrm {ori}} | \theta _{\mathrm {loc}}) = p(\theta _{\mathrm {ori}} | \theta _{\mathrm {loc}}) p(\theta _{\mathrm {col}} | \theta _{\mathrm {ori}}) .
\end{equation}
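The qualitative difference between these factorizations can be illustrated with a deliberately simplified toy simulation (not the population model itself): each retrieval returns the cued item's feature with some fixed probability and an arbitrary other item's feature otherwise. The item count, swap probability, and trial count below are arbitrary assumptions. Chaining retrievals, as in the binding-via-color model, propagates a wrong intermediary through the chain and so produces correlated swap errors in the two reports:

```python
import numpy as np

N, eps = 4, 0.2  # hypothetical set size and swap probability
items = np.arange(N)

def retrieve(rng, cue_item):
    # Return the index of the item whose feature is actually reported:
    # the cued item with probability 1 - eps, a random other item otherwise.
    if rng.random() < 1 - eps:
        return cue_item
    return rng.choice(items[items != cue_item])

rng = np.random.default_rng(2)
trials = 50_000
both_wrong = {"spatial": 0, "via_color": 0}
for _ in range(trials):
    cued = 0
    # Spatial binding: color and orientation retrieved independently from location.
    col_sp = retrieve(rng, cued)
    ori_sp = retrieve(rng, cued)
    both_wrong["spatial"] += (col_sp != cued) and (ori_sp != cued)
    # Binding via color: orientation retrieved via the (possibly wrong) color.
    col_vc = retrieve(rng, cued)
    ori_vc = retrieve(rng, col_vc)
    both_wrong["via_color"] += (col_vc != cued) and (ori_vc != cued)

p_sp = both_wrong["spatial"] / trials    # independent errors: about eps**2
p_vc = both_wrong["via_color"] / trials  # chained errors: substantially higher
```

Under the spatial model, both reports are wrong only if two independent retrievals fail; under the chained model, a single failed color retrieval almost always drags the orientation report along with it.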
All model variants have four free parameters, namely, the overall spike rate \(\gamma\) and one tuning curve width for each of the three feature dimensions. The spike rate and the tuning curve widths for matching features are shared across the different conjunctive population codes in each model variant. We obtained separate maximum likelihood fits of the three model variants for each participant and delay condition, and compared the quality of fit based on their log-likelihood values. To compare model fits with behavioral data, we simulated responses from the best-fitting model, using the same trials as in the actual experiment repeated 100 times, and then applied the same analyses that were used for the behavioral data.
We also compared the best-fitting neural binding model to the joint mixture model using both Akaike information criterion (AIC) and Bayesian information criterion (BIC) scores, since the models differ in their number of free parameters. The joint mixture model performed worse on both criteria in all conditions, and we therefore do not report individual comparisons.
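Because the models differ in their number of free parameters, the raw log-likelihoods must be penalized before comparison. A minimal sketch of the AIC and BIC computation follows; the log-likelihood values, parameter counts, and trial count are made-up illustrations, not the fitted results:

```python
import numpy as np

def aic(log_lik, k):
    # Akaike information criterion; lower values indicate a better model.
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # Bayesian information criterion; n is the number of trials.
    return k * np.log(n) - 2 * log_lik

# Hypothetical fits for one participant and condition.
ll_binding, k_binding = -512.3, 4  # neural binding model (4 free parameters)
ll_mixture, k_mixture = -520.8, 5  # mixture model (parameter count assumed)
n_trials = 200

better_aic = aic(ll_binding, k_binding) < aic(ll_mixture, k_mixture)
better_bic = (bic(ll_binding, k_binding, n_trials)
              < bic(ll_mixture, k_mixture, n_trials))
```

BIC penalizes extra parameters more heavily than AIC whenever the trial count exceeds about eight, so agreement between the two criteria, as reported above, makes the conclusion robust to the choice of penalty.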