**Searching for objects among clutter is a key ability of the visual system. Speed and accuracy are the crucial performance criteria. How can the brain trade off these competing quantities for optimal performance in different tasks? Can a network of spiking neurons carry out such computations, and what is its architecture? We propose a new model that takes input from V1-type orientation-selective spiking neurons and detects a target in the shortest time that is compatible with a given acceptable error rate. Subject to the assumption that the output of the primary visual cortex comprises Poisson neurons with known properties, our model is an ideal observer. The model has only five free parameters: the signal-to-noise ratio in a hypercolumn, the costs of false-alarm and false-reject errors versus the cost of time, and two parameters accounting for nonperceptual delays. Our model postulates two gain-control mechanisms—one local to hypercolumns and one global to the visual field—to handle variable scene complexity. Error rate and response time predictions match psychophysics data as we vary stimulus discriminability, scene complexity, and the uncertainty associated with each of these quantities. A five-layer spiking network closely approximates the optimal model, suggesting that known cortical mechanisms are sufficient for implementing visual search efficiently.**

*C* = 1) and target absent (*C* = 0). When the target is present, its location is not known in advance; it may be one of *L* locations in the image. The observer reports whether the target appears but not where.

*M* locations (*M* ≤ *L*) in each image, where *M* reflects the complexity of the image and is known as the *set size* (see Figure 1a). The objects are simplified to be oriented bars, and the only feature in which the target and distractor differ is orientation. Target distinctiveness is controlled by the difference in orientation between target and distractors, referred to as the *orientation difference* (Δ*θ*). Prior to image presentation, the set of possible orientations for the target and the distractor is known, whereas the set size and orientation difference may be unknown and may change from one image to the next (see the Psychophysics section for details).

**Figure 1**


*n* of events (i.e., action potentials) that will be observed during 1 s is distributed as *P*(*n*|*λ*) = *λ*^{n}e^{−λ}/*n*!, where *λ* is the expected number of events per second (e.g., the firing rate of the neuron).

*t*] by a population of *N* neurons, also known as a *hypercolumn*, from each of the *L* display locations. Each neuron has a localized spatial receptive field and is tuned to local image properties (Hubel & Wiesel, 1962), which in our case is the local stimulus orientation; the preferred orientations of neurons within a hypercolumn are distributed uniformly in [0°, 180°). The expected firing rate of the *i*th neuron (in spikes per second, or Hz) is a function of the neuron's preferred orientation *θ*_{i} and the stimulus orientation *θ* ∈ [0°, 180°) (Equation 1), where *λ*_{min} and *λ*_{max} are a neuron's minimum and maximum firing rates, respectively, and *ψ* ∈ (0°, 180°) is the half tuning width. Figure 4c shows the tuning functions of a hypercolumn of eight neurons, and Figure 4f shows the sample spike trains from two locations with different local stimulus orientations.
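The front end described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian-shaped `tuning_rate` is a stand-in chosen for the example (it is not the tuning function of Equation 1), and the function names and the values 17 Hz and 22° are used here purely as example settings.

```python
import math
import random

def circular_diff(a, b, period=180.0):
    """Smallest absolute difference between two orientations, in degrees."""
    d = abs(a - b) % period
    return min(d, period - d)

def tuning_rate(theta, preferred, lam_min=1.0, lam_max=17.0, psi=22.0):
    """ASSUMED Gaussian-shaped tuning curve (Hz); a stand-in for Equation 1."""
    d = circular_diff(theta, preferred)
    return lam_min + (lam_max - lam_min) * math.exp(-0.5 * (d / psi) ** 2)

def poisson_pmf(n, lam):
    """P(n | lam) = lam^n e^(-lam) / n! for a 1-s counting window."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def sample_spike_times(rate, duration, rng):
    """Homogeneous Poisson spike train built from exponential inter-spike intervals."""
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times

N = 16
preferred = [i * 180.0 / N for i in range(N)]        # uniform in [0, 180)
rates = [tuning_rate(30.0, p) for p in preferred]    # stimulus at 30 degrees
rng = random.Random(0)
trains = [sample_spike_times(r, 1.0, rng) for r in rates]
```

Each entry of `trains` is the list of spike times of one hypercolumn neuron during one second of stimulation; neurons whose preferred orientation is close to the stimulus fire more often on average.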

*C* = 1 | *C* = 0] is the probability of the observer making a target-present decision when the target is absent, or the false-positive rate; and likewise, 𝔼[Declare *C* = 0 | *C* = 1] is the false-negative rate. 𝒞_{p} and 𝒞_{n} are two free parameters: the cost (in seconds) of false-positive errors and the cost of false-negative errors, respectively. For example, 𝒞_{p} might be quantified in terms of the time wasted exploring an unproductive location while foraging for food, and 𝒞_{n} may be the time it takes to move to the next promising location. The relative cost of errors and time is determined by the circumstances in which the animal operates. For example, an animal searching for scarce food while competing with conspecifics will face a high cost of time (e.g., any delay in pecking a seed will mean that the seed is lost) and low cost of error (e.g., pecking on a pebble rather than a seed just means that the pebble can be spat out). Conversely, an airport luggage inspector faces high false-reject error costs and comparatively lower time costs. 𝒞_{p} and 𝒞_{n} determine how often the observer is willing to make one type of error versus the other and versus waiting for more evidence. Thus, the Bayes risk measures the combined RT and ER costs of a given search mechanism. Given a set of inputs, the optimal strategy is the mechanism minimizing such cost (Figure 2a).
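The trade-off expressed by the Bayes risk can be illustrated numerically. The sketch below assumes that the risk is the mean RT plus each error rate weighted by its cost and by the prior probability of the corresponding stimulus class; this additive form, the function name `bayes_risk`, and all numbers are assumptions made for the example.

```python
def bayes_risk(mean_rt, fp_rate, fn_rate, cost_fp, cost_fn, p_present=0.5):
    """ASSUMED form: expected time plus expected error cost (all in seconds)."""
    error_cost = (1.0 - p_present) * cost_fp * fp_rate + p_present * cost_fn * fn_rate
    return mean_rt + error_cost

# Two hypothetical observers: one fast and error prone, one slow and accurate.
fast = dict(mean_rt=0.6, fp_rate=0.05, fn_rate=0.10)
careful = dict(mean_rt=1.2, fp_rate=0.01, fn_rate=0.02)

# Cheap errors (1 s each) favor the fast observer ...
cheap = (bayes_risk(cost_fp=1.0, cost_fn=1.0, **fast),
         bayes_risk(cost_fp=1.0, cost_fn=1.0, **careful))
# ... whereas expensive errors (50 s each) favor the careful one.
expensive = (bayes_risk(cost_fp=50.0, cost_fn=50.0, **fast),
             bayes_risk(cost_fp=50.0, cost_fn=50.0, **careful))
```

With these made-up numbers the preferred observer flips as the error costs grow, which is the qualitative behavior described for the forager and the luggage inspector.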

**Figure 2**

*S*(*X*_{t}), the log ratio of target-present (*C* = 1) versus target-absent (*C* = 0) probability given the observations *X*_{t}. A target-present decision is made as soon as *S*(*X*_{t}) crosses an upper threshold *τ*_{1}, while a target-absent decision is made as soon as *S*(*X*_{t}) crosses a lower threshold *τ*_{0}. Until either event takes place, the observer waits for further information. For convenience we use base 10 for all our logarithms and exponentials.

*τ*_{1} > 0 and *τ*_{0} < 0 control the maximum tolerable ERs. For example, if *τ*_{1} = 2 (i.e., a target-present decision is taken when the stimulus is >10^{2} times more likely to be a target than a distractor), then the maximum false-positive rate is 1%; similarly, if *τ*_{0} = −3, then the target likelihood is <10^{−3} times the distractor's and the false-negative rate is at most 0.1%. *τ*_{1} and *τ*_{0} are judiciously chosen by the observer to minimize the Bayes risk in Equation 2 and hence are functions of the costs of errors. For example, if 𝒞_{p} > 𝒞_{n}, the observer should be less reluctant to make a false-negative error and thus should set |*τ*_{0}| < *τ*_{1}. In addition, if both 𝒞_{p} and 𝒞_{n} are large, the observer should increase |*τ*_{0}| and *τ*_{1} so that fewer errors are made in general at the price of a longer RT. Given this relationship, we parameterize the SPRT with the thresholds *τ*_{0} and *τ*_{1} instead of the costs of errors 𝒞_{p} and 𝒞_{n}.

*S*(*X*_{t}) between target present and target absent with a pair of thresholds *τ*_{0} and *τ*_{1}. Next we explain how *S*(*X*_{t}) is computed.
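The two-threshold decision rule can be sketched generically, independently of how the log likelihood ratio is obtained. In the Python sketch below, the evidence stream is a toy unit-variance Gaussian observation (an assumption made only to keep the example self-contained; it is not the spiking input of the model), and `sprt`, `gaussian_llr10`, and `run_trial` are hypothetical helper names.

```python
import math
import random

def sprt(increments, tau0, tau1):
    """Accumulate base-10 log likelihood ratio increments until a threshold is hit.

    Returns (decision, n_observations); decision is 1 (target present),
    0 (target absent), or None if the stream ends before either threshold.
    """
    s = 0.0
    n = 0
    for n, inc in enumerate(increments, start=1):
        s += inc
        if s >= tau1:
            return 1, n
        if s <= tau0:
            return 0, n
    return None, n

def gaussian_llr10(x, mu1, mu0, sigma):
    """log10 of N(x; mu1, sigma) / N(x; mu0, sigma)."""
    ln_ratio = ((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)
    return ln_ratio / math.log(10.0)

def run_trial(target_present, tau0, tau1, rng, max_steps=10_000):
    """One simulated trial with toy Gaussian evidence (ASSUMED observation model)."""
    mu = 1.0 if target_present else 0.0
    stream = (gaussian_llr10(rng.gauss(mu, 1.0), 1.0, 0.0, 1.0)
              for _ in range(max_steps))
    return sprt(stream, tau0, tau1)

rng = random.Random(1)
trials = [run_trial(False, -3.0, 2.0, rng) for _ in range(2000)]
false_positive_rate = sum(d == 1 for d, _ in trials) / len(trials)
```

With *τ*_{1} = 2 the simulated false-positive rate should stay near or below the 1% bound discussed above, while the number of observations per trial varies from trial to trial.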

*S*(*X*_{t}) can be systematically constructed from the visual input according to the graphical model in Figure 1b and can account for a wide variety of visual search tasks. We derive a general model that is capable of handling unknown set sizes and orientation differences in the i.i.d.-distractor heterogeneous search and Heterogeneous search sections (i.i.d. = independent and identically distributed). (Readers interested only in this general model are encouraged to skip to those sections.) To build up the concept, we start by reviewing models for simpler tasks including visual discrimination and visual search with known set sizes and orientation differences, both of which have already been explored in the literature (Wald, 1945; Chen, Navalpakkam, & Perona, 2011; Ma et al., 2011). We review that for simple discrimination, *S*(*X*_{t}) is a simple diffusion, and the optimal strategy is a diffuse-to-bound system (Ratcliff, 1985). We also show that in all other scenarios, *S*(*X*_{t}) is a nonlinear function of the input, and thus diffusions are not optimal. All tasks considered are summarized in Table 1.

**Table 1**

*l* during the time interval [0, *t*] in response to a stimulus presented at time 0. *X*_{t} is the ensemble of responses of all neurons from all locations. We have assumed that such responses are action potentials (Chen et al., 2011), but the same analysis also applies to analog response signals (e.g., a Gaussian random walk for each neuron; Verghese, 2001). Consider the log likelihood of the spike train data when the object orientation *Y*^{l} at location *l* is *θ*. Each such log likelihood may be computed by a diffusion (Ratcliff, 1985) in which every new observation induces an additive update (for details, see the Spiking network implementation section; Equation 12).

*L* = *M* = 1) and the target and distractor have distinct and unique orientations *θ*_{T} and *θ*_{D}, respectively. The visual system needs to determine whether the target or the distractor is present in the test image. The log likelihood ratio in this case is well known (Wald, 1945; rederived in Equation 19) and, as first pointed out by Stone (1960), may be computed by a diffuse-to-bound mechanism (Ratcliff, 1985). In addition, as shown by Wald (1945), SPRT is optimal in minimizing the Bayes risk in Equation 2.

Θ_{T} to denote the set of orientations that the target may take, and use Θ_{D} to denote the set of orientations for the distractor. Further, use *n*_{T} and *n*_{D} to denote the number of orientations in sets Θ_{T} and Θ_{D}, respectively. We call heterogeneous visual discrimination the case where *n*_{T} > 1 and/or *n*_{D} > 1. The log likelihood ratio is given by Ma et al. (2011; rederived in Equation 20) in terms of 𝒮max(·), the softmax function, which is defined for a vector **v** and a set of indices. For mutually exclusive events (*A*_{1}, *A*_{2}), we have *P*(*A*_{1} ∪ *A*_{2}) = *P*(*A*_{1}) + *P*(*A*_{2}), so the log probability of the union is the softmax of the two log probabilities. Since the different target orientations are mutually exclusive, their log likelihoods should be combined using the softmax function to compute the log likelihood for the target. The same argument applies to the distractor.
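The role of the softmax can be made concrete with a short Python sketch. It assumes that the softmax is the base-10 log-sum-exp (the log of a sum of probabilities, given their logs) and that the orientations within each set are equally likely a priori; `smax10` and `heterogeneous_llr` are hypothetical names introduced for the example.

```python
import math

def smax10(values):
    """Base-10 log-sum-exp: log10(sum(10**v for v in values)), computed stably."""
    m = max(values)
    return m + math.log10(sum(10.0 ** (v - m) for v in values))

def heterogeneous_llr(target_logliks, distractor_logliks):
    """Log likelihood ratio for heterogeneous discrimination.

    ASSUMES a uniform prior over the orientations in each set, so each
    mixture is the softmax of its log likelihoods minus log10 of the set size.
    """
    target = smax10(target_logliks) - math.log10(len(target_logliks))
    distractor = smax10(distractor_logliks) - math.log10(len(distractor_logliks))
    return target - distractor

# Two mutually exclusive events with probabilities 0.2 and 0.3:
log_union = smax10([math.log10(0.2), math.log10(0.3)])   # equals log10(0.5)
```

Subtracting the maximum before exponentiating keeps the computation stable when the log likelihoods are very negative, which is the usual situation after many observations.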

*L* display locations are occupied by either a target or a distractor (i.e., *L* = *M* > 1) and the display contains either one target or none. The target orientation *θ*_{T} and the distractor orientation *θ*_{D} are again unique and known; that is, *n*_{T} = *n*_{D} = 1. The log likelihood ratio of target present versus target absent is given by Chen et al. (2011; rederived in Equation 21) in terms of the log likelihood ratio for homogeneous discrimination at each location *l* (see Equation 4).

*S*(*X*_{t}) combines the local log likelihood ratio from all locations using a softmax because the target can appear at only one of *L* disjoint locations.

*M* = *L* > 1) but the orientation difference is not (*n*_{T} > 1 and/or *n*_{D} > 1). In addition, we assume that target and distractor orientations are sampled i.i.d. in space according to some distribution. We refer to this as the *i.i.d.-distractor heterogeneous search*.

*Y*^{l} at any nontarget location *l*; that is, *P*(*Y*^{l}|*C*^{l} = 0). We denote the CDD with *ϕ*, an *n*_{D}-dimensional probability vector; that is, each element of *ϕ* is nonnegative and all elements sum to 1. We introduce the CDD here because it is a key element in the general model of visual search, as becomes clear later. In contrast, the conditional target distribution *P*(*Y*^{l} = *θ*|*C*^{l} = 1) is not as vital and is assumed to be uniform for notational clarity (see Equation 26 for cases with general target distributions and different CDDs over locations).

*l*, is obtained as the difference between the log likelihood of the target and that of the distractor (Equation 9), which is reminiscent of Equation 5. Computing the target log likelihood requires marginalizing over the unknown target orientation with a softmax (again assuming a uniform prior over possible target orientations in Θ_{T}). Similarly, the distractor log likelihood marginalizes over the distractor orientation according to the CDD.

*M*, *θ*_{T}, and *θ*_{D} are stochastic (*n*_{T} and/or *n*_{D} > 1). This scenario may be handled using the mechanisms for i.i.d.-distractor heterogeneous search above as building blocks. For example, for a fixed set size, each nontarget location has a certain probability of being blank (as opposed to containing a distractor), which is captured by the CDD. When the set size changes, the CDD will change correspondingly. Therefore, knowing the CDD effectively allows us to infer the set size and vice versa. Our strategy is to infer the CDD along with the class variables using Bayesian inference.

*P*(*ϕ*) be the prior distribution over the CDDs *ϕ*. Note that, technically, *P*(*ϕ*) is a distribution over distributions. Computing the log likelihood ratio requires marginalizing out *ϕ* according to *P*(*ϕ*) and the observation *X*_{t}. We assume that the observer has been exposed to this task for some time and has estimated *P*(*ϕ*). We also assume that the target distribution is independent of the CDD (and relax this assumption in Equation 29). The log likelihood ratio is given in Equations 10 and 11 (see derivations in Materials and method), where *Q*_{ϕ}(*X*_{t}) is the log posterior of the CDDs given the observations *X*_{t} (see below). The only difference between Equations 10 and 11 and those describing the i.i.d.-distractor heterogeneous search (Equations 8 and 9) is the second line of Equation 11, where the CDD is marginalized out with respect to *Q*_{ϕ}(*X*_{t}). Since both the CDD *ϕ* and the distractor orientation *Y*^{l} must be marginalized, two softmaxes are necessary. The equations do not explain how to compute *Q*_{ϕ}(*X*_{t}). It may be estimated simultaneously with the main computation by a scene complexity mechanism that is derived from first principles of Bayesian inference (see Equation 25). This mechanism extends across the visual field and may be interpreted as wide-field gain control (see Figure 4a and Equation 32).

*ϕ*); see Equation 28]. This approach is suboptimal. Intuitively, if the visual scene switches randomly between being cluttered and sparse, then always treating the scene as if it had medium complexity would be either overly optimistic or overly pessimistic. Crucially, the predictions of this simple model are inconsistent with the behavior of human observers.

**Figure 3**


**Figure 4**


*S*(*X*_{t}) defined in Equation 10. This strategy may be computed by a nested combination of diffusions, as shown in Equations 10 and 11. In the next sections we explore the nature of *X*_{t} in the visual system and show that a simple network of spiking neurons may implement a close approximation of such a strategy.

ℒ_{θ}(*X*_{t}), the local log likelihood of the stimulus taking on orientation *θ*, from spiking inputs *X*_{t} from V1. ℒ_{θ}(*X*_{t}) is the building block of *S*(*X*_{t}) (Equation 4). Consider one spatial location, corresponding to a hypercolumn containing *N* neurons. Let *K*_{t} be the number of action potentials that were produced by all the neurons at that location up to time *t*, let *t*_{s} be the time at which the *s*th action potential takes place, and let *i*(*s*) be the index of the neuron that produced it; the firing rate of neuron *i* when the stimulus orientation is *Y* = *θ* is given by its tuning curve (Figure 4d, f). The log likelihood of a set of action potentials is then given by Equation 12 (Jazayeri & Movshon, 2006; Chen et al., 2011; see Equation 18 for detailed derivations).
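The computation described here can be illustrated with the textbook log likelihood of independent homogeneous Poisson neurons: a sum of log firing rates over the observed spikes minus the expected spike count. The sketch below uses that standard form, expressed in base 10, as an assumption; it is a sketch and is not copied from Equation 12. The function name and the two-neuron example are hypothetical.

```python
import math

def log10_likelihood(spike_neurons, rates, t):
    """Base-10 log likelihood of the spikes observed in [0, t].

    spike_neurons: index i(s) of the neuron that emitted each spike.
    rates: firing rate (Hz) of every neuron under the hypothesized orientation.
    ASSUMES independent homogeneous Poisson neurons.
    """
    # Diffusion term: every spike adds the log rate of the neuron that fired.
    diffusion = sum(math.log10(rates[i]) for i in spike_neurons)
    # Expected-count term: identical across orientations when sum(rates) is.
    expected = t * sum(rates) / math.log(10.0)
    return diffusion - expected

# Toy hypercolumn with two neurons and two candidate orientations.
rates_target = [10.0, 1.0]       # neuron 0 prefers the target orientation
rates_distractor = [1.0, 10.0]   # neuron 1 prefers the distractor orientation
spikes = [0, 0, 1]               # neurons that fired during the first 0.5 s

llr = (log10_likelihood(spikes, rates_target, 0.5)
       - log10_likelihood(spikes, rates_distractor, 0.5))
```

In this toy case the summed firing rate is the same under both orientations, so the expected-count term cancels in the difference and only the spike-driven jumps remain.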

This term can be implemented by integrate-and-fire neurons (Dayan & Abbott, 2003), one for each relevant orientation *θ* ∈ Θ_{T} ∪ Θ_{D}, that receive afferent connections from all hypercolumn neurons (Figure 4d). The second term is computationally irrelevant because it does not depend on the stimulus orientation *θ* and it cancels with similar terms in Equation 11; it may be removed by a gain-control mechanism to prevent the dynamic range of the membrane potential from exceeding its physiological limits (see Equation 32; Carandini, Heeger, & Movshon, 1999). Specifically, one may subtract from each ℒ_{θ} a common quantity (for example, the average value of all the ℒ_{θ} values) without changing the result of Equation 11.

ℒ_{θ} must be transmitted downstream for further processing. However, ℒ_{θ} is a continuous quantity, whereas the majority of neurons in the central nervous system are believed to communicate via action potentials. We explored whether this communication may be implemented using action potentials (Gray & McCormick, 1996) emitted from an integrate-and-fire neuron. Consider a sender neuron communicating its membrane potential to a receiver neuron. The sender may emit an action potential whenever its membrane potential surpasses a threshold *τ*_{s}. After firing, the membrane potential drops to its resting value and the sender enters a brief refractory period, the duration of which (about 1 ms) is assumed to be negligible. If the synaptic strength between the two neurons is also *τ*_{s}, the receiver may decode the signal by simply integrating such weighted action potentials over time. This coding scheme loses some information due to discretization. Varying the discretization threshold *τ*_{s} trades off the quality of transmission against the number of action potentials; a lower threshold will limit the information loss at the cost of producing more action potentials. Surprisingly, we find that the performance of the spiking network is very close to that of the SPRT, even when *τ*_{s} is set high, so that a small number of action potentials is produced (see Materials and method for the encoding, Figure 4e through h, Figure 9, and Supplementary Figure S1b through d for the quality of approximation). Since the network behavior is quite insensitive to *τ*_{s} (see Supplementary Figure S1), we do not consider *τ*_{s} to be a free parameter and set its value to *τ*_{s} = 0.5 in our experiments.
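A minimal Python sketch of this sender and receiver scheme follows. Two simplifications are assumptions of the example: the transmitted signal only increases, and the sender's potential is reduced by *τ*_{s} after each spike (reset by subtraction) instead of dropping to a fixed resting value, which keeps the decoding error below *τ*_{s}. The names `encode` and `decode` are hypothetical.

```python
import random

def encode(increments, tau_s=0.5):
    """Sender: integrate the input and emit a spike whenever the potential reaches tau_s.

    Returns the number of spikes emitted at each time step.
    ASSUMES nonnegative increments and reset by subtraction.
    """
    potential = 0.0
    spikes = []
    for inc in increments:
        potential += inc
        count = 0
        while potential >= tau_s:
            potential -= tau_s
            count += 1
        spikes.append(count)
    return spikes

def decode(spikes, tau_s=0.5):
    """Receiver: integrate incoming spikes with synaptic weight tau_s."""
    total = 0.0
    trace = []
    for count in spikes:
        total += tau_s * count
        trace.append(total)
    return trace

rng = random.Random(0)
increments = [rng.uniform(0.0, 0.2) for _ in range(200)]

signal, acc = [], 0.0
for inc in increments:           # the continuous quantity held by the sender
    acc += inc
    signal.append(acc)

decoded = decode(encode(increments))
max_error = max(s - d for s, d in zip(signal, decoded))
n_spikes = sum(encode(increments))
```

Raising `tau_s` reduces `n_spikes` and increases `max_error`, which is the trade-off between transmission quality and spike count described above.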

*N* = 16 neurons per visual location; Vinje & Gallant, 2000).

*S*(*X*_{t}) is compared with a pair of thresholds to reach a decision (Equation 3). The positive and negative parts of *S*(*X*_{t}), (*S*(*X*_{t}))^{+} and (−*S*(*X*_{t}))^{+}, may be represented separately by two mutually inhibiting neurons (Gabbiani & Koch, 1996), where (·)^{+} denotes halfwave rectification: (*x*)^{+} = max(0, *x*). We can implement Equation 3 by simply setting the firing thresholds of these neurons to the decision thresholds *τ*_{1} and −*τ*_{0}, respectively.

*S*(*X*_{t}) may be computed by a mechanism akin to the ramping neural activity observed in decision-implicated areas such as the frontal eye field (Woodman et al., 2008; Heitz & Schall, 2012; Purcell et al., 2012). (*S*(*X*_{t}))^{+} and (−*S*(*X*_{t}))^{+} could be converted to two trains of action potentials using the same encoding scheme described in the Signal transduction section. The resultant spike trains may be the input signal of an accumulator model (e.g., Bogacz et al., 2006). The model has been shown to be implementable as a biophysically realistic recurrent network (Wang, 2002; Lo & Wang, 2006; Wong et al., 2007) and capable of producing and thresholding ramping neural activity to trigger motor responses (Mazurek et al., 2003; Woodman et al., 2008; Heitz & Schall, 2012; Purcell et al., 2012; Cassey, Heathcote, & Brown, 2014). While both neural implementations of *S*(*X*_{t}) are viable options, in the simulations used in this study we opted for the first.

*Q*_{ϕ} = log *P*(*ϕ*|*X*_{t}) is estimated (see Equation 25). At each time instant this estimate is fed back to the local networks to suppress other CDD estimates, which is equivalent to computing with the best estimate for the set size and orientation difference (Equation 11).

*P*(*ϕ*)]. The network has only 3 *df* rather than a large number of network parameters (Ma et al., 2011; Krizhevsky, Sutskever, & Hinton, 2012). We discuss this in more detail later.

*df*. Next we investigate whether predictions of our model are consistent with the known literature and assess the optimality of humans in conducting visual search.

**Figure 5**


*M*, with a large slope for hard tasks (small orientation difference between target and distractor) and almost no slope for easy tasks (large orientation difference; Figure 5a). Last, the median RT is longer for target absent than for target present, with roughly twice the slope (Figure 5a). The three predictions are in agreement with classical observations in human subjects (Treisman & Gelade, 1980; E. Palmer, Horowitz, Torralba, & Wolfe, 2011).

*θ* = 0°, 12°, 25°, and 37°; see Figure 5c) and explored two conditions: one in which the orientation of the targets is larger than that of the distractors (0°, 12°, **25°**, and **37°**; targets in bold) and one in which the orientations are interleaved (0°, **12°**, 25°, and **37°**). Our model predicts that the difficulty of the task (median RT at a given ER) is much higher in the second condition, even if the minimum orientation difference between the target and the distractor set is the same. This prediction matches observations in the psychophysics literature (Duncan & Humphreys, 1989; Hodsoll & Humphreys, 2001).

**Figure 6**


*target prevalence*) is varied systematically (Wolfe & Van Wert, 2010). The model's prediction (Figure 7b) qualitatively matches human psychophysics (Wolfe & Van Wert, 2010; reproduced in Figure 7a). Changing target frequency results in a more pronounced change in target-absent RTs than in target-present RTs. The false-negative rate is negatively correlated with target prevalence, whereas the false-positive rate is positively correlated with it.

**Figure 7**


*M* = *L* = 1. When the orientation difference is 90°, most neurons in the hypercolumn can easily discriminate the target from the distractor. As a result, most action potentials will cause big jumps in the diffusion (Equation 12), and a decision may be made after observing very few action potentials. For example, after one or two spikes, the log likelihood will most likely be either above 0.5 or below −0.5. Therefore, any threshold in (0, 0.5] would achieve the same effect as the threshold *τ* = 0.5, which corresponds to an ER of 24% (we assume −*τ*_{0} = *τ*_{1} = *τ*). Indeed, our model predicts that the ER will be either 50% or less than 24% (Figure 8b). Furthermore, the model predicts that ERs around 20% to 25% would be more frequently observed than ERs around 10% to 15%. This quantization effect continues and gradually dissipates as the threshold is increased because more spikes are needed for the log likelihood to cross the threshold. We do not find in the literature any study describing this phenomenon. We can only assume that such an observation would not be considered worth reporting in the absence of a suitable theoretical framework.

**Figure 8**


**Figure 9**


*N* = 16 (see Discussion for the plausibility of *N*), their minimum firing rate constant at *λ*_{min} = 1 Hz, and the half-width of their orientation tuning curves at 22° (full width at half height = 52°; Graf et al., 2011). Hence, we were left with only three free parameters: The maximum firing rate of any orientation-selective neuron *λ*_{max} controls the signal-to-noise ratio of the hypercolumn, and the upper and lower decision thresholds *τ*_{1} and *τ*_{0} control the frequency of false-alarm and false-reject errors. Once these parameters are given, all the other parameters of our model are analytically derived.

*μ*_{D} and a log-time variance. Therefore, we fit RT distributions and ERs with five parameters (three for SPRT and two for nonperceptual delay).

𝒞_{p} and 𝒞_{n} might also vary for each block, we could not assume that the subject's thresholds would remain constant. Therefore, in blocked design experiments, 21 parameters (2 thresholds × 9 conditions + 1 SNR + 2 motor parameters) were used to fit nine conditions, each containing 180 target-present trials and 180 target-absent trials. In mixed-design Experiments 2 and 3, all five parameters were fit jointly across all conditions for each subject because all conditions are mixed. Thus, five parameters (which were reduced to two in the generalization experiment below) were used to fit three conditions, each containing 220 target-present trials and 220 target-absent trials (see Equation 16 for the fitting procedure; Figure 10 for data and fits of a randomly selected individual; Supplementary Figures S3, S4, and S5 for data and fits for every subject; Figure 9 for the ER vs. RT tradeoff curve fit to five subjects with similar signal-to-noise ratio; and Figure 11a and b for all subjects in the blocked experiment).

**Figure 10**

**Figure 11**


*λ*_{max} and the decision thresholds *τ*_{0}, *τ*_{1}) are plausible (Vinje & Gallant, 2000; see Discussion for the plausibility of *λ*_{max}). Subjects had similar parameters, although intersubject variability is noticeable (see Supplementary Figures S3, S4, and S5). Each subject displays different ERs for different conditions (see Figure 9); thus, the decision thresholds are indeed not constant (see Supplementary Figures S3, S4, and S5 for fitted thresholds). It may be possible to model the intercondition variability of the thresholds as the result of the subjects minimizing a global risk function (Drugowitsch et al., 2012). Therefore, for each subject in blocked-design Experiment 1, we have tried fitting a common Bayes risk function (Equation 2), parameterized by the two costs of errors 𝒞_{p} and 𝒞_{n}, across all blocks and solving for the optimal thresholds for each block independently. This assumption reduces the number of free parameters from 21 to five (2 costs of errors + 1 SNR + 2 motor parameters), but it leads to a marked reduction in the quality of fits for some of the subjects (see Supplementary Figure S6). Therefore, as far as our model is concerned, there was some block-to-block variability of the error costs.

*λ*_{max}) and the two nondecision delay parameters estimated from the blocked experiment (Experiment 1) to predict the mixed experiments (Experiments 2 and 3). Thus, for each mixed experiment only two parameters, namely the decision thresholds *τ*_{0} and *τ*_{1}, were fit. Despite the parsimony in parameterization, the model shows good cross-experiment fits (see Figure 11c through f), suggesting that the parameters of the model refer to real characteristics of the subject.

*L*, whereas a five-layer or deeper network requires only a linear number. For *L* = 24 locations and *n*_{T} = *n*_{D} = 3, the three-layer network requires at least 7 × 10^{12} neurons, whereas the five-layer network needs fewer than 1,000. We do not add a sixth layer to marginalize out the CDD but instead use a gain-control circuit in parallel to the five-layer network (see Figure 4a and Equation 25). This design is such that the parallel circuit may be easily shared in other tasks where scene complexity and/or orientation difference estimation is necessary. As a result, although alternative architectures are consistent with both our analysis and the data, the five-layer architecture appears to be the better choice.

*N* = 16 uncorrelated, orientation-tuned neurons per visual location, each with a half tuning width of 22° and a maximum firing rate (estimated from the subjects) of approximately 17 Hz. The tuning width agrees with V1 physiology in primates (Graf et al., 2011). Although our model appears to have underestimated the maximum firing rate of cortical neurons (which ranges from 30 to 70 Hz; Graf et al., 2011) and the population size *N* (which may be on the order of hundreds), actual V1 neurons are correlated; hence, the equivalent number of independent neurons is smaller than the measured number. For example, take a population of *N* = 16 independent Poisson neurons, all with a maximum firing rate of 17 Hz, and combine every three of them into a new neuron. This will generate a population of 560 correlated neurons with a maximum firing rate of 51 Hz and a correlation coefficient of 0.19, which is close to the experimentally measured average of 0.17 (Graf et al., 2011; see Vinje & Gallant, 2000, for a detailed discussion on the effect of sparseness and correlation between neurons). Therefore, our estimates of the model parameters are consistent with primate cortical parameters. The parameters of different subjects are close but not identical, matching the known variability within the human population (Van Essen, Newsome, & Maunsell, 1984; E. Palmer et al., 2011). Finally, the fact that estimating model parameters from data collected in the blocked experiments allows the model to predict data collected in the mixed experiments does suggest that the model parameters mirror physiological parameters in our subjects.
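The numbers in the pooling example can be checked by direct enumeration. The Python sketch below assumes that every original neuron fires at the same rate, so that two pooled neurons sharing *k* of their three inputs have a correlation of *k*/3 (shared variance divided by total variance); that equal-rate simplification is an assumption of the example.

```python
from itertools import combinations

N = 16          # independent Poisson neurons
RATE = 17.0     # maximum firing rate of each neuron (Hz)

# Every group of three distinct neurons is pooled into one new neuron.
triples = list(combinations(range(N), 3))
n_pooled = len(triples)
pooled_rate = 3 * RATE

# Two pooled neurons sharing k inputs have covariance k * RATE and variance
# 3 * RATE each (equal-rate ASSUMPTION), so their correlation is k / 3.
total_shared = 0
n_pairs = 0
for a, b in combinations(triples, 2):
    total_shared += len(set(a) & set(b))
    n_pairs += 1
mean_correlation = total_shared / (3 * n_pairs)

print(n_pooled, pooled_rate, round(mean_correlation, 2))
```

The enumeration gives 560 pooled neurons at 51 Hz with an average pairwise correlation of 104/559, about 0.19, in agreement with the figures quoted in the text.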

*no*. The long answer is that our model is an ideal observer if one agrees to use the LNP model to capture the computational limitations that are imposed by specific assumptions on the biophysics, physiology, and anatomy of the cortex—most specifically, the firing rate of cortical neurons, the number of neurons in a hypercolumn, and the tuning width of orientation-specific neurons. We can estimate these parameters by fitting the experimental data, but to address the optimality question one would need to either measure these quantities directly in human subjects or carry out the psychophysics in laboratory primates in which these numbers are known. The second route appears to be more practical.

*x*-*y* location of each bar was jittered to prevent crowding. The random jittering vector for each bar alternated between pointing inward and outward; its magnitude was chosen uniformly at random from [0, 0.6°] and its orientation was chosen from [−5.6°, 5.6°]. Unless otherwise specified, distractors were oriented at 30° orientation difference (the difference in orientation between target and distractor bars) and set size (the total number of bars in the image) was systematically varied in our experiments. Orientation difference was chosen from {20°, 30°, 45°} and set size was chosen from {3, 6, 12}. Targets were present in 50% of the images in random order.

*N* orientation-tuned neurons whose preferred orientations are distributed uniformly over the [0°, 180°) interval. The tuning curves, given in Equation 1, are parameterized by the minimum and maximum firing rates *λ*_{min} and *λ*_{max} as well as the half tuning width *ψ*. *λ*_{min} was fixed at one spike per second (Graf et al., 2011). Since increasing either *λ*_{max} or the number *N* of neurons achieves the same effect of boosting the signal-to-noise ratio, we fixed *N* = 16 and varied *λ*_{max} only. The half tuning width of the neurons was set to *ψ* = 22° for all subjects. Therefore, the only tuning parameter for the front end is *λ*_{max}.

*τ*_{0} and *τ*_{1}. As soon as either one of the two thresholds is exceeded, the corresponding decision is taken. This is equivalent to thresholding the probabilities *P*(*C* = 1|*X*_{t}) and *P*(*C* = 0|*X*_{t}) with thresholds *P*_{1} and *P*_{0}, since the probabilities and the likelihood ratio are related by the expression *P*(*C* = 1|*X*_{t}) = *h*(*S*(*X*_{t})), where *h*(*x*) is the logistic function, which is monotonically increasing, and where *P*(*C* = 0|*X*_{t}) = 1 − *P*(*C* = 1|*X*_{t}). Therefore, testing whether *S*(*X*_{t}) > *τ*_{1} is equivalent to testing whether *P*(*C* = 1|*X*_{t}) > *P*_{1} = *h*(*τ*_{1}), and similarly testing whether *S*(*X*_{t}) < *τ*_{0} is equivalent to testing whether *P*(*C* = 0|*X*_{t}) > *P*_{0} = *h*(−*τ*_{0}). The thresholds *τ*_{0} and *τ*_{1} are additional free parameters in the model.
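The correspondence between log likelihood ratio thresholds and probability thresholds can be checked numerically. The Python sketch below assumes that *h* is the logistic function written in base 10, in keeping with the base-10 convention for logarithms, and that *S*(*X*_{t}) is the log ratio of the two posterior probabilities; `h` and `max_error_rate` are names chosen for the example.

```python
def h(x):
    """Base-10 logistic function: 10**x / (1 + 10**x) (ASSUMED base)."""
    return 1.0 / (1.0 + 10.0 ** (-x))

def max_error_rate(tau):
    """Posterior probability of being wrong when deciding at |S| = tau."""
    return 1.0 - h(abs(tau))

p1 = h(2.0)                     # probability threshold for tau_1 = 2
fp_bound = max_error_rate(2.0)  # about 1%
fn_bound = max_error_rate(-3.0) # about 0.1%
quantized = max_error_rate(0.5) # about 24%
```

Under these assumptions a threshold of 2 corresponds to a posterior of 100/101, and the error bounds for thresholds of 2, −3, and 0.5 come out at roughly 1%, 0.1%, and 24%, matching the values quoted for those thresholds elsewhere in the text.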

*T*_{p}, which our model predicts, and additional delays *T*_{m} due to axonal transmission, muscle activation, and other factors external to the visual search process (Schwarz, 2001; E. Palmer et al., 2011). We model the statistics of the nonperceptual response delay with a log-normal distribution, a more realistic model than the exponential (Schwarz, 2001). The log-normal is parameterized by its mean *μ*_{D} and variance *σ*_{D}, which are fitted separately for each subject.

_{D}*λ*

_{max},

*μ*, and

_{D}*σ*are specific to each subject, and the decision thresholds (

_{D}*τ*

_{0}and

*τ*

_{1}) are specific to each subject in each task condition. We fit parameters (

*λ*

_{max},

*μ*,

_{D}*σ*) for each subject and (

_{D}*τ*

_{0}and

*τ*

_{1}) for each block in blocked conditions and for each of the mixed conditions.
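The nonperceptual delay can be simulated as follows. This is a sketch under one stated assumption: $\mu_D$ and $\sigma_D$ are read as the mean and variance of the delay itself, so they are first converted to the parameters of the underlying normal distribution. The function names and the numerical values are hypothetical.

```python
import numpy as np

def lognormal_params(mean, variance):
    """Convert the mean and variance of a log-normal variable to the
    (mu, sigma) of its logarithm, as expected by NumPy's sampler."""
    sigma2 = np.log(1.0 + variance / mean ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)

def total_rt(perceptual_rt, mean_delay, var_delay, rng):
    """Add a log-normal nonperceptual delay to each perceptual RT (seconds)."""
    mu, sigma = lognormal_params(mean_delay, var_delay)
    delay = rng.lognormal(mean=mu, sigma=sigma, size=np.shape(perceptual_rt))
    return np.asarray(perceptual_rt) + delay

mu, sigma = lognormal_params(0.3, 0.01)   # hypothetical: 300 ms mean delay
rts = total_rt([0.5, 0.7], 0.3, 0.01, np.random.default_rng(0))
```

If $\mu_D$ and $\sigma_D$ are instead meant as the parameters of the logarithm of the delay, the conversion step should be skipped.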

$\theta$, set size $M$, and stimulus class $C$, collectively denoted as the *experimental condition* ($\theta_{cond}$). Each observation is a pair consisting of the RT $t_i$ and the decision $d_i \in \{0, 1\}$. The data set for each experimental condition is the collection of these pairs. Given a set of parameters $\theta_{model} = \{\lambda_{\max}, \tau_0, \tau_1, \mu_D, \sigma_D\}$ and a condition $\theta_{cond}$, the Bayesian model computes the perceptual RT distribution and the error rate $\alpha(\theta_{cond}, \theta_{model})$. The occurrence of an error trial thus follows a Bernoulli probability with mean $\alpha(\theta_{cond}, \theta_{model})$. Recall that the total RT $T$ is modeled as the sum of two variables: the perception time variable $T_p$ simulated from the Bayesian observer and the log-normally distributed, nonperceptual motor and propagation delay $T_m$:

$$T = T_p + T_m.$$

$\theta_{cond}$) is given by where $B(\cdot \mid \beta)$ is the Bernoulli distribution with mean $\beta$, and $\mathbb{I}(\text{event})$ is 1 when the event is true and 0 otherwise. In order to estimate the parameters $\theta_{model}$ given a set of observations, we sample the space of parameters, compute the likelihood of each set of parameters using Equation 16, and select the parameters with the highest likelihood.
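The fitting procedure (sample the parameter space, score each candidate by its likelihood, keep the best) can be sketched as a grid search. Everything model-specific here is a stand-in: `toy_model` replaces the Bayesian observer with a hypothetical two-parameter model (an error rate and an exponential RT density), and the trial data are invented for illustration. Only the overall structure, a Bernoulli term for the error indicator plus an RT density term per trial, follows the description above.

```python
import math
from itertools import product

def log_likelihood(trials, true_class, alpha, rt_logpdf):
    """Sum over trials of log Bernoulli(error | alpha) + log p(RT)."""
    total = 0.0
    for rt, decision in trials:
        is_error = decision != true_class
        total += math.log(alpha if is_error else 1.0 - alpha)
        total += rt_logpdf(rt)
    return total

def toy_model(alpha, mean_rt):
    """HYPOTHETICAL stand-in for the observer model: returns the error
    rate and the log density of an exponential RT distribution."""
    def rt_logpdf(rt):
        return -math.log(mean_rt) - rt / mean_rt
    return alpha, rt_logpdf

def grid_search(trials, true_class, alphas, mean_rts):
    """Evaluate every candidate parameter pair and keep the most likely."""
    best = None
    for a, m in product(alphas, mean_rts):
        alpha, rt_logpdf = toy_model(a, m)
        ll = log_likelihood(trials, true_class, alpha, rt_logpdf)
        if best is None or ll > best[0]:
            best = (ll, a, m)
    return best

# Invented target-present trials: (RT in seconds, decision).
trials = [(0.5, 1), (0.7, 1), (0.6, 0), (0.8, 1), (0.4, 1)]
best_ll, best_alpha, best_mean_rt = grid_search(
    trials, true_class=1, alphas=[0.1, 0.2, 0.4], mean_rts=[0.3, 0.6, 1.2])
```

In the toy data one trial in five is an error and the mean RT is 0.6 s, so the search selects those candidates from the grid.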

_{θ}

$X_t$ (in this section we are concerned with one location only; therefore, we omit the location superscript $l$ to simplify notation), which is a set of spike trains from $N$ orientation-tuned neurons (which can be generalized to be sensitive to color, intensity, and so on) collected during the time interval $(0, t)$. Let $X_t^i$ be the set of spikes from neuron $i$ in the time interval from 0 to $t$, $K_t^i$ the number of spikes from neuron $i$ in $X_t^i$, and $K_t$ the total number of spikes. Then the likelihood of $X_t$ when stimulus orientation is $\theta$ is given by a Poisson distribution:

$$P(X_t \mid \theta) = \prod_{i=1}^{N} \frac{(\lambda_\theta^i t)^{K_t^i}}{K_t^i!} \, e^{-\lambda_\theta^i t},$$

where $\lambda_\theta^i$ is the firing rate of neuron $i$ when the stimulus orientation is $\theta$.
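For independent Poisson neurons, the log likelihood of a candidate orientation is, up to a constant that does not depend on the orientation, the sum over spikes of the log firing rate minus the elapsed time multiplied by the summed firing rate. The sketch below evaluates this for a set of candidate orientations; the function name and the three-neuron example rates are hypothetical.

```python
import numpy as np

def orientation_log_likelihood(spike_counts, rates, duration_s):
    """Log likelihood of each candidate orientation, up to a constant.

    spike_counts: shape (N,), spikes per neuron collected in (0, t)
    rates:        shape (n_orientations, N), firing rate of each neuron
                  under each candidate orientation (spikes/s)
    """
    counts = np.asarray(spike_counts, dtype=float)
    rates = np.asarray(rates, dtype=float)
    # Each spike from neuron i contributes log(rate_i) to the total.
    spike_term = counts @ np.log(rates).T
    # The summed firing rate is paid continuously over the elapsed time.
    rate_term = duration_s * rates.sum(axis=1)
    return spike_term - rate_term

# Hypothetical rates: row 0 favors neuron 0, row 1 favors neuron 2.
rates = [[10.0, 2.0, 1.0],
         [1.0, 2.0, 10.0]]
ll = orientation_log_likelihood([9, 2, 1], rates, duration_s=1.0)
best = int(np.argmax(ll))
```

In this example both rows have the same summed rate, so the two candidates differ only through the spike term.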

$X_t$ is given by

$$\mathcal{L}_\theta(X_t) = \sum_{i=1}^{N} K_t^i W_\theta^i - t \sum_{i=1}^{N} \lambda_\theta^i + \text{const},$$

where $K_t^i$ is the number of spikes from neuron $i$ in $(0, t)$, $\lambda_\theta^i$ is its firing rate under orientation $\theta$, $W_\theta^i = \log \lambda_\theta^i$ is the contribution of each action potential from neuron $i$ to the log likelihood of orientation $\theta$, and const is a term that does not depend on $\theta$ and is therefore irrelevant for the decision. The first term is the diffusion that introduces jumps in $\mathcal{L}_\theta(X_t)$ whenever a spike occurs. The second term is a drift term that moves $\mathcal{L}_\theta(X_t)$ gradually in time. When the tuning curves of the neurons regularly tessellate the circle of orientations, as is the case in our model (Figure 4c), the average firing rate of the hypercolumn under different orientations is approximately the same (Figure 4d), and the drift term may be safely omitted from the model.

$\theta_T$ and $\theta_D$. Therefore, which proves Equation 4.

$\theta_T \in \Theta_T$ and $\theta_D \in \Theta_D$. For simplicity assume uniform prior on both target and distractor orientation; that is, $P(\theta \mid C = 1) = 1/n_T$, $\forall \theta \in \Theta_T$, and $P(\theta \mid C = 0) = 1/n_D$, $\forall \theta \in \Theta_D$: which proves Equation 5.

$S(X_t)$ for homogeneous visual search ($M = L > 1$, $n_T = n_D = 1$) from the local orientation log likelihoods from each of the $L$ locations. Call $l_T \in \{1, 2, \ldots, L\}$ the target location and assume uniform prior on $l_T$. Equation 4 is proved below:

$P(Y^l \mid C^l = 0)$ of stimulus orientation at a nontarget location. Below are three examples.

$\phi = [0.2, 0.5, 0.3]$.

$Y^l = \emptyset$ that a nontarget location is blank, i.e., it does not contain a stimulus bar. If there are $M$ display items, then the probability of any nontarget location being blank is $(L - M)/L$. A CDD is a two-dimensional vector of and the three different set sizes may be represented by three CDDs of equal probability: where $\varepsilon$ is a small number to prevent zero probability.

$M$, only $M$ locations can contain a distractor. If we place a distractor at each location with probability $M/L$, we do not always observe $M$ distractors. Instead, the actual set size follows a binomial distribution with mean $M$. However, this is a reasonable approximation because the human visual system can generalize to unseen set sizes effortlessly. In addition, the values of $M$ used in our experiments, $\{3, 6, 12\}$, are different enough that the i.i.d. model is equally effective in inferring $M$ (Figure 4i).
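The spread introduced by the i.i.d. approximation can be made concrete by computing the binomial distribution of the actual set size. This is a sketch only: the number of locations `n_locations=24` is a hypothetical value chosen for illustration, and the function name is not from the source.

```python
from math import comb

def set_size_distribution(n_locations, set_size):
    """P(actual number of distractors = k), for k = 0..n_locations, when
    each location independently holds a distractor with probability M/L."""
    p = set_size / n_locations
    return [comb(n_locations, k) * p ** k * (1.0 - p) ** (n_locations - k)
            for k in range(n_locations + 1)]

n_locations = 24   # HYPOTHETICAL number of display locations L
distributions = {m: set_size_distribution(n_locations, m) for m in (3, 6, 12)}
means = {m: sum(k * pk for k, pk in enumerate(dist))
         for m, dist in distributions.items()}
```

Each distribution is centered on its nominal set size, which is the sense in which the approximation preserves $M$ on average.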

$S(X_t)$ from the orientation log likelihoods from all locations $l$, which we show below. The target-present likelihood $P(X_t \mid C = 1)$ is given by marginalizing out the target location $l_T \in \{1, 2, \ldots, L\}$, the CDD $\phi$, and the target and distractor orientations. Let $C^l \in \{0, 1\}$ denote the stimulus class at location $l$; $C^l = 1$ if and only if location $l$ contains a target. In light of the graphical model in Figure 1b, where $P(\phi \mid X_t)$. Define the log posterior of CDD as $Q_\phi(X_t) = \log P(\phi \mid X_t)$.

$P(l_T = l \mid C = 1)$ and on the target type $P(Y^l = \theta \mid C^l = 1)$: which proves Equations 10 and 11.

$M \in \{3, 6, 12\}$, SPRT estimates the value of $M$ given $X_t$ for each trial, whereas the simple model assumes a set size of $\mathbb{E}(M) = 7$ for all the trials.

$\phi$ encode both the target and distractor orientation distribution: $\phi = \{\phi^{(T)}, \phi^{(D)}\}$, where and . The log likelihood of target present in Equation 23 now becomes

$\phi^{(D)}$ and $\phi^{(T)}$ vary independently. Then, where is the expected value of $\phi^{(T)}$. Equation 30 is equivalent to Equation 27 with a different prior (the expected value of $\phi^{(T)}$) on target orientation.

$p_T$; for example, $p_T = 1 - 0.5^{1/M}$ will produce target-absent scenes with probability 0.5. For the simple case of homogeneous visual search (known set size $M$, unique target orientation $\Theta_T = \{\theta_T\}$, and distractor orientation $\Theta_D = \{\theta_D\}$), we first derive the likelihood of observations with and without the class label:

$(x)_+$ and $(x)_-$ denote the positive and negative parts of $x$, respectively.
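The example value of the per-location target probability given above can be verified directly. The sketch assumes that a target is placed independently at each of the $M$ occupied locations, which is the reading under which the exponent $1/M$ arises; the function names are hypothetical.

```python
def per_location_target_prob(set_size, p_absent=0.5):
    """Per-location target probability that yields a target-absent scene
    with probability p_absent, assuming independent placement."""
    return 1.0 - p_absent ** (1.0 / set_size)

def prob_target_absent(p_target, set_size):
    """Probability that none of the set_size locations contains a target."""
    return (1.0 - p_target) ** set_size

checks = {m: prob_target_absent(per_location_target_prob(m), m)
          for m in (3, 6, 12)}
```

For every set size the resulting probability of a target-absent scene is 0.5, up to floating-point rounding.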

$i$ has membrane potential $U_i$, is given by where $g(\cdot)$ is a function computing the gain of the population. Gain control serves two purposes: one to control the range of a neuron's membrane potential within its physiological limits (as needed in Equation 12), and the other to provide normalization to allow a probabilistic interpretation of the population code (as needed in Equation 25). One gain that serves both purposes is the softmax function, which is what we use for computing the log posterior of CDD $Q_\phi(X_t)$ (Equation 25). While the popular form of gain control is divisive normalization (Carandini & Heeger, 2011), we use subtractive normalization (Doiron, Longtin, Berman, & Maler, 2001) because it comes out naturally from the SPRT computation involving log likelihoods.

$U(t)$ to a distant neuron, we hypothesize that nature makes use of action potentials (unary encoding in engineering). The sender neuron maintains two thresholds, one positive and one negative. Whenever $U(t)$ rises above the positive threshold, a spike is generated and the membrane discharges by an amount equal to the threshold. If the discharged membrane potential $U(t')$ still exceeds the threshold, another spike is generated after a refractory period $t_{ref}$, at which point the potential is again reduced by the threshold; this process is repeated, generating a burst of $k$ spikes, until the potential no longer exceeds the threshold. The spikes travel to the receiver neuron through an excitatory synapse whose strength is equal to the threshold, thus allowing the receiver to compute $U(t)$ with only minimal delay. Similar mechanisms can be implemented for the case where $U(t)$ drops below the negative threshold (see Supplementary Figure S2 and Figure 4g through i).
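The signaling scheme can be illustrated with a discrete-time simulation. This is a sketch under several assumptions that are not in the source: the two thresholds are taken to be symmetric, time is discretized, the refractory period is ignored so that a whole burst fits within one time step, and negative spikes stand in for the mirrored mechanism. The function name and the example trace are hypothetical.

```python
def transmit(potentials, threshold):
    """Encode a sampled potential trace as signed unit spikes and decode it.

    Returns (spikes, received): the signed spike count sent at each step and
    the receiver's running reconstruction of the potential.
    """
    sent = 0.0        # total potential already signalled to the receiver
    spikes, received = [], []
    for u in potentials:
        n = 0
        # Burst: each spike discharges the sender by one threshold.
        while u - sent >= threshold:
            sent += threshold
            n += 1
        # Mirrored mechanism for drops below the negative threshold.
        while u - sent <= -threshold:
            sent -= threshold
            n -= 1
        spikes.append(n)
        # Synaptic strength equals the threshold, so the receiver tracks `sent`.
        received.append(sent)
    return spikes, received

trace = [0.0, 0.4, 1.3, 2.6, 2.1, 0.2, -1.7]   # hypothetical U(t) samples
spikes, received = transmit(trace, threshold=0.5)
```

The receiver's estimate never deviates from the sender's potential by as much as one threshold, which is the sense in which the potential is communicated with minimal loss.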

*Neuron*, 60 (6), 1142–1152.

*Foundations and Trends in Machine Learning*, 2 (1), 1–127.

*Psychological Review*, 113 (4), 700–765.

*Cognitive Psychology*, 57 (3), 153–178.

*Journal of Mathematical Psychology*, 32 (2), 91–134.

*Psychological Review*, 100 (3), 432–459.

*Spatial Vision*, 17 (4), 295–326.

*Nature Reviews Neuroscience*, 13 (1), 51–62.

*Models of cortical circuits* (pp. 401–443). New York, NY: Springer.

*Journal of Experimental Psychology: Human Perception and Performance*, 24 (2), 673–692.

*PLoS Computational Biology*, 10 (7), e1003700.

*Advances in neural information processing systems* (pp. 2699–2707). Granada, Spain: NIPS Foundation.

*Network: Computation in Neural Systems*, 12 (2), 199–213.

*Mathematics of Control, Signals and Systems*, 2 (4), 303–314.

*Vision Research*, 21 (5), 705–712.

*Journal of Cognitive Neuroscience*, 15 (1), 154–155.

*Annual Review of Neuroscience*, 18 (1), 193–222.

*Annual Review of Psychology*, 31, 309–341.

*Neural Computation*, 13 (1), 227–248.

*Journal of Experimental Psychology: Human Perception and Performance*, 30 (1), 3–27.

*Journal of Experimental Psychology: Human Perception and Performance*, 36 (5), 1128–1144.

*The Journal of Neuroscience*, 32 (11), 3612–3628.

*Psychological Review*, 96 (3), 433–458.

*Proceedings of SPIE*, 4324, 91–102, doi:10.1117/12.431177.

*Cerebral Cortex*, 1 (1), 1–47.

*Neural Computation*, 8 (1), 44–66.

*Psychological Review*, 96 (2), 267–314.

*Vision Research*, 51 (7), 771–781.

*Nature Neuroscience*, 17 (6), 858–865.

*Nature Neuroscience*, 14 (2), 239–245.

*Science*, 274 (5284), 109–113.

*Signal detection theory and psychophysics*. Los Altos, CA: Peninsula.

*Neuron*, 76 (3), 616–628.

*Perception & Psychophysics*, 63 (5), 918–926.

*The Journal of Physiology*, 160 (1), 106–154.

*Nature Neuroscience*, 9 (5), 690–696.

*Matters of intelligence* (pp. 115–141). New York, NY: Springer.

*Advances in neural information processing* (pp. 1106–1114). Lake Tahoe: NIPS Foundation.

*Nature Neuroscience*, 9 (7), 956–963.

*Nature Neuroscience*, 14 (6), 783–790.

*Cerebral Cortex*, 13 (11), 1257–1269.

*Journal of Neurophysiology*, 91 (1), 152–162.

*Vision Research*, 45 (14), 1885–1899.

*Nature*, 434 (7031), 387–391.

*Proceedings of the National Academy of Sciences, USA*, 107 (11), 5232–5237.

*Neural Computation*, 21 (9), 2437–2465.

*Journal of Experimental Psychology: Human Perception and Performance*, 37 (1), 58–71.

*Vision Research*, 34 (13), 1703–1721.

*Vision Research*, 40 (10), 1227–1268.

*Frontiers in Computational Neuroscience*, 8 (42), 1–10.

*Philosophical Transactions of the Royal Society of London B: Biological Sciences*, 298 (1089), 187–198.

*The Journal of Neuroscience*, 32 (10), 3433–3446.

*Psychological Review*, 92 (2), 212–225.

*Fundamentals of statistical and thermal physics*(Vol. 1). New York, NY: McGraw-Hill.

*Journal of Experimental Psychology: Human Perception and Performance*, 27 (4), 985–999.

*Journal of Neurophysiology*, 76 (4), 2790–2793.

*Behavior Research Methods, Instruments, & Computers*, 33 (4), 457–469.

*Neuron*, 62 (1), 17–29.

*Journal of Neurophysiology*, 86 (4), 1916–1936.

*The Cognitive Neurosciences*, 3, 327–338.

*Science*, 202 (4365), 315–318.

*Psychometrika*, 25 (3), 251–260.

*Cognitive Psychology*, 12 (1), 97–136.

*Psychological Review*, 108 (3), 550–592.

*Vision Research*, 24 (5), 429–448.

*Neuron*, 31 (4), 523–535.

*Vision Research*, 34 (18), 2453–2467.

*Science*, 287 (5456), 1273–1276.

*The Annals of Mathematical Statistics*, 16 (2), 117–186.

*The Annals of Mathematical Statistics*, 19 (3), 326–339.

*Neuron*, 36 (5), 955–968.

*Integrated models of cognitive systems* (pp. 99–119). New York: Oxford University Press.

*Nature Reviews Neuroscience*, 5 (6), 495–501.

*Nature*, 435 (7041), 439–440.

*Vision Research*, 50 (14), 1304–1311.

*Current Biology*, 20 (2), 121–124.

*Frontiers in Computational Neuroscience*, 1, 6.

*Psychological Science*, 19 (2), 128–136.