Despite embodying fundamentally different assumptions about attentional allocation, a wide range of popular models of attention include a max-of-outputs mechanism for selection. Within these models, attention is directed to the items with the most extreme value along a perceptual dimension via, for example, a winner-take-all mechanism. From the detection-theoretic approach, this MAX observer can be optimal in specific situations; however, in distracter heterogeneity manipulations or in natural visual scenes this is not always the case. We derive a Bayesian maximum a posteriori (MAP) observer, which is optimal in both these situations. While it retains a form of the max-of-outputs mechanism, it is based on the maximum a posteriori probability dimension instead of a perceptual dimension. To test this model, we investigated human visual search performance using a yes/no procedure while adding external orientation uncertainty to distracter elements. The results are much better fitted by the predictions of a MAP observer than by those of a MAX observer. We conclude that a max-like mechanism may well underlie the allocation of visual attention, but that it is based upon a probability dimension, not a perceptual dimension.

*probability* (hence the name MAP observer) as opposed to the maximum observation along a *perceptual* dimension. Our results support the notion that we are near Bayesian-optimal observers, making decisions based on a probability axis rather than a perceptual one.

$v$) such as orientation, our internal estimate (i.e. percept) of these stimulus values ($u$) has a degree of uncertainty. The uncertainty will take on specific extents; thus our representations of targets (T) and distracters (D) can be described by Gaussian distributions$^{1}$ $u_t = N(\mu_T, \sigma_T)$ and $u_d = N(\mu_D, \sigma_D)$ (Green & Swets, 1966). In this way our representations accurately capture both the average percept associated with a stimulus and its degree of variability.

$s$) or noise trial ($n$). The internal percepts of the $N$ display items in the target present trial can be described as $U_s = \{t_1, d_1, \ldots, d_{N-1}\}$, while those in the target absent trial can be described as the vector $U_n = \{d_1, \ldots, d_N\}$. According to the MAX rule, only the highest valued percept is considered (i.e. $\max(U_n)$ and $\max(U_s)$) and then passed on for the last step of decision making.
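As a minimal sketch of the MAX rule described above, the following Python snippet samples the percepts for one target present and one target absent display and keeps only the largest value from each. The function name `sample_trial`, the seed, and the numerical means and standard deviations are illustrative assumptions, not values from the study.

```python
import random

def sample_trial(target_present, set_size, mu_t, sd_t, mu_d, sd_d, rng):
    """Return the internal percepts for one display (U_s or U_n)."""
    n_distracters = set_size - 1 if target_present else set_size
    percepts = [rng.gauss(mu_d, sd_d) for _ in range(n_distracters)]
    if target_present:
        # t_1 followed by d_1 ... d_{N-1}
        percepts.insert(0, rng.gauss(mu_t, sd_t))
    return percepts

rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Hypothetical parameters: unit-variance percepts, target mean of 2.
u_s = sample_trial(True, 4, mu_t=2.0, sd_t=1.0, mu_d=0.0, sd_d=1.0, rng=rng)
u_n = sample_trial(False, 4, mu_t=2.0, sd_t=1.0, mu_d=0.0, sd_d=1.0, rng=rng)

# MAX rule: only the largest percept in each display is passed on.
max_s = max(u_s)
max_n = max(u_n)
```

Everything else in the display is discarded at this point, which is what makes the rule sensitive to how the percept axis relates to target probability.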

$c$. The probability of deciding the target is present in a noise trial (i.e. false alarm rate) is simply the proportion of maximum percepts above a criterion on noise trials, $P(\text{‘yes’} \mid n) = P(\max(U_n) > c)$. Similarly, the probability of deciding the target is present in a signal trial (i.e. hit rate) is the proportion of maximum percepts above a criterion on signal trials, $P(\text{‘yes’} \mid s) = P(\max(U_s) > c)$; see Palmer et al. (2000). By moving the model's internal criterion $c$, it is possible to trace out a predicted ROC curve, which can be compared to empirically collected hit and false alarm rates in yes/no visual search experiments.
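The criterion sweep can be illustrated with a small Monte Carlo sketch. It assumes unit-variance Gaussian percepts with a target mean of `d_prime`; the helper names, the seed, the trial counts, and the criterion grid are hypothetical choices made only for this illustration.

```python
import random

def max_percept(target_present, set_size, d_prime, rng):
    """Largest percept in one display of unit-variance Gaussian percepts."""
    percepts = [rng.gauss(0.0, 1.0) for _ in range(set_size)]
    if target_present:
        percepts[0] += d_prime  # shift one item so that it becomes the target
    return max(percepts)

def rate_above(values, criterion):
    """Proportion of trials whose decision variable exceeds the criterion."""
    return sum(v > criterion for v in values) / len(values)

rng = random.Random(2)
signal = [max_percept(True, 4, 2.0, rng) for _ in range(5000)]
noise = [max_percept(False, 4, 2.0, rng) for _ in range(5000)]

# Sweep the criterion c from -2 to 6 in steps of 0.25.
criteria = [x / 4 for x in range(-8, 25)]
# Each ROC point is (false alarm rate, hit rate) at one criterion.
roc = [(rate_above(noise, c), rate_above(signal, c)) for c in criteria]
```

Plotting the pairs in `roc` gives the predicted curve against which empirical hit and false alarm rates can be laid.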

$P(u \mid T)$ and $P(u \mid D)$, that is, the probability of an internal percept given that the display item is a target or distracter, respectively. For our purposes, it is also useful to define two separate sources of uncertainty. The first is an external source of real uncertainty in the target and distracter items, $\sigma_T$ and $\sigma_D$. The second is internal noise $\sigma_N$, produced for example by neural noise (e.g. Tolhurst, Movshon, & Dean, 1983). We make the simplifying assumption that internal noise is homogeneous over certain ranges of stimulus values ($v$). Therefore, the resulting target and distracter distributions are, respectively, $P(u \mid T) = N(\mu_T, \ldots)$ and $P(u \mid D) = N(\mu_D, \ldots)$
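One common way to combine an external and an internal noise source is sketched below. It rests on an assumption that is not stated in the text as extracted here: that the two sources are independent Gaussians, so that their variances add. The function names and the example values of 8° and 5° are hypothetical.

```python
import math
import random
import statistics

def combined_sd(sigma_external, sigma_internal):
    """SD of the sum of two independent Gaussian noise sources (assumed independent)."""
    return math.sqrt(sigma_external ** 2 + sigma_internal ** 2)

def sample_percept(mu, sigma_external, sigma_internal, rng):
    """Draw the displayed stimulus value, then a noisy internal percept of it."""
    stimulus = rng.gauss(mu, sigma_external)    # external: the value actually shown
    return rng.gauss(stimulus, sigma_internal)  # internal: noisy encoding of that value

rng = random.Random(0)
percepts = [sample_percept(0.0, 8.0, 5.0, rng) for _ in range(20000)]
empirical_sd = statistics.pstdev(percepts)  # should lie close to combined_sd(8.0, 5.0)
```

Under this assumption, a target with no external variability would still have a percept distribution whose spread equals the internal noise alone.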

$\mathrm{MAP}_s$ defines a vector of posterior probabilities observed on signal trials, $\mathrm{MAP}_s = \max(P(T \mid U_s))$, and similarly $\mathrm{MAP}_n$ for noise trials, $\mathrm{MAP}_n = \max(P(T \mid U_n))$. While this may seem a trivial difference, it is this which determines whether feature information is utilized in an optimal way or not. The question then is how to calculate these posterior probabilities $P(T \mid U)$. This can be done simply by using Bayes' equation.
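For reference, Bayes' rule written with the symbols used in this section is given below. It is a reconstruction from the surrounding definitions (the likelihood $P(u \mid T)$, the prior $P(T)$, and the marginal $P(u)$), applied to each percept $u$ in the display $U$, rather than a quotation of the original equation.

```latex
P(T \mid u) = \frac{P(u \mid T)\, P(T)}{P(u)}
```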

$P(u)$ describes a ‘weighted sum’ of target and distracter likelihoods, $P(u) = P(u \mid T) \cdot P(T) + P(u \mid D) \cdot P(D)$. This can be thought of as a distribution representing the relative occurrence of features observed.

$P(T)$ and $P(D)$ define the relative occurrence of targets and distracters. On target present trials, only $1/N$ of the display items is the target. However, the prior probability of the target appearing is not $1/N$, because target present trials occur only half of the time. Therefore the prior probability of a display item being a target is $P(T) = 1/(2N)$. Correspondingly, the rest of the display items will be distracters, $P(D) = 1 - P(T)$. For example, if the set size $N = 4$, then $P(T) = 1/8$ and $P(D) = 7/8$.
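The priors and the posterior for a single percept can be sketched in a few lines of Python. The prior rule $P(T) = 1/(2N)$ and the weighted-sum form of $P(u)$ come from the text; the function names and the two example percept distributions (equal means, standard deviations of 5° and 15°) are hypothetical.

```python
from statistics import NormalDist

def item_priors(set_size):
    """Prior probability that a single display item is a target or a distracter."""
    p_target = 1.0 / (2 * set_size)  # 1/N on present trials, which are half of all trials
    return p_target, 1.0 - p_target

def posterior_target(u, target_dist, distracter_dist, set_size):
    """P(T | u) = P(u | T) P(T) / P(u), with P(u) the weighted sum of both likelihoods."""
    p_t, p_d = item_priors(set_size)
    weighted_target = target_dist.pdf(u) * p_t
    evidence = weighted_target + distracter_dist.pdf(u) * p_d
    return weighted_target / evidence

p_t, p_d = item_priors(4)  # the N = 4 case from the text: 1/8 and 7/8

# Hypothetical percept distributions: same mean, distracters more variable.
target = NormalDist(0.0, 5.0)
distracter = NormalDist(0.0, 15.0)
near = posterior_target(0.0, target, distracter, 4)   # percept at the shared mean
far = posterior_target(20.0, target, distracter, 4)   # percept far from the mean
```

With these example distributions, a percept close to the shared mean is more likely to come from the target than one far from it, even though neither is extreme along the percept axis in the MAX sense.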

$P(\text{‘yes’} \mid n) = P(\mathrm{MAP}_n > c)$ and hit rates by $P(\text{‘yes’} \mid s) = P(\mathrm{MAP}_s > c)$, and by varying the criterion $c$ we obtain an ROC curve.
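To make the contrast between the two decision variables concrete, the simulation sketch below computes both for the same displays and summarizes each ROC curve by its area. The area is computed as the fraction of signal and noise trial pairs in which the signal trial scores higher, which equals the area under the curve traced by sweeping $c$. All parameter values (equal means, standard deviations of 5° and 15°, 1000 trials per condition) and all names are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

SET_SIZE = 4
TARGET = NormalDist(0.0, 5.0)       # hypothetical percept distribution for targets
DISTRACTER = NormalDist(0.0, 15.0)  # hypothetical, broader distribution for distracters
P_T = 1.0 / (2 * SET_SIZE)          # prior that a display item is the target
P_D = 1.0 - P_T

def posterior(u):
    """P(T | u) from Bayes' rule."""
    weighted_target = TARGET.pdf(u) * P_T
    return weighted_target / (weighted_target + DISTRACTER.pdf(u) * P_D)

def sample_display(target_present, rng):
    """Percepts for one display; item 0 is the target on present trials."""
    percepts = [rng.gauss(DISTRACTER.mean, DISTRACTER.stdev) for _ in range(SET_SIZE)]
    if target_present:
        percepts[0] = rng.gauss(TARGET.mean, TARGET.stdev)
    return percepts

def area_under_roc(signal_dvs, noise_dvs):
    """Fraction of (signal, noise) pairs in which the signal trial scores higher."""
    wins = sum((s > n) + 0.5 * (s == n) for s in signal_dvs for n in noise_dvs)
    return wins / (len(signal_dvs) * len(noise_dvs))

rng = random.Random(3)
signal_trials = [sample_display(True, rng) for _ in range(1000)]
noise_trials = [sample_display(False, rng) for _ in range(1000)]

# MAP observer: decision variable is the largest posterior in the display.
auc_map = area_under_roc([max(map(posterior, t)) for t in signal_trials],
                         [max(map(posterior, t)) for t in noise_trials])
# MAX observer: decision variable is the largest percept in the display.
auc_max = area_under_roc([max(t) for t in signal_trials],
                         [max(t) for t in noise_trials])
```

Under these assumptions the MAP area should sit above chance while the MAX area should not, because with equal means the largest percept carries no evidence in favor of a target.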

*mean* orientation of 0°. In the case of no added distracter orientation uncertainty (Figures 3a and 3b), this task cannot be performed above chance (i.e. $d' = 0$). However, as an increasing amount of distracter uncertainty is applied (Figures 3c, 3d, 3e, and 3f), it becomes more probable that the distracter orientations differ from the target orientation. This prediction of increasing performance with increasing distracter heterogeneity is the opposite of the trend seen in previous reports using the reaction time paradigm (e.g. Duncan & Humphreys, 1989, 1992); however, this is specifically because in our task the target and distracter orientations have identical mean values. Pilot (unpublished) data do in fact show that if target and distracters have different means, performance decreases with increasing distracter heterogeneity, in line with previous reports. The advantage of our choice of identical target and distracter means combined with high distracter variability is that distracter orientation can be both higher and lower than the target orientation, so the design is ideally suited to distinguishing the MAX and MAP observer hypotheses.

$\mu_T = 0^\circ$, $\sigma_T = 0^\circ$). Distracter orientation was sampled from a Gaussian distribution with a mean of 0° and a standard deviation $\sigma_D$ that was experimentally manipulated. This reflects the degree of externally added orientation noise. To clarify, this results in a heterogeneous set of distracters on each individual stimulus presentation. All display items were presented at a constant retinal eccentricity of 11.2° from screen center to minimize eccentricity-based variation in detection probability. A set size of $N = 4$ display items was used, with items presented on the diagonals. Using 4 display items avoids potential crowding or lateral interaction effects, as inter-item distance is high. This is important because such effects would violate the assumption that each display item is an independent observation. On target absent trials 4 distracters were present; on target present trials, 1 target and 3 distracters were present.
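The display construction described above can be sketched as follows. The eccentricity, the diagonal placement, the set size, and the orientation distributions come from the text; the coordinate convention, the function names, and the random choice of which location holds the target are assumptions made for this illustration.

```python
import math
import random

ECCENTRICITY_DEG = 11.2                 # distance of every item from screen center
DIAGONAL_ANGLES_DEG = (45, 135, 225, 315)  # one item on each diagonal (N = 4)

def item_positions():
    """(x, y) position of each item in degrees of visual angle."""
    return [(ECCENTRICITY_DEG * math.cos(math.radians(a)),
             ECCENTRICITY_DEG * math.sin(math.radians(a)))
            for a in DIAGONAL_ANGLES_DEG]

def make_trial(target_present, sigma_d, rng):
    """Orientations for one display: distracters ~ N(0, sigma_d), target exactly 0."""
    orientations = [rng.gauss(0.0, sigma_d) for _ in DIAGONAL_ANGLES_DEG]
    target_index = None
    if target_present:
        # Assumption: the target location is chosen at random on each trial.
        target_index = rng.randrange(len(orientations))
        orientations[target_index] = 0.0
    return orientations, target_index

rng = random.Random(4)
present_orientations, present_index = make_trial(True, sigma_d=10.0, rng=rng)
absent_orientations, absent_index = make_trial(False, sigma_d=10.0, rng=rng)
```

Here `sigma_d=10.0` is only a placeholder for one level of the manipulated distracter standard deviation.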

$\sigma_N$ values based on test set performance: BV 5.9°, CS 4.4°, GH 4.9°). From the goodness of fit (Figure 4, left column), we can conclude that the MAP observer provides a much better account than the MAX observer for all realistic situations (i.e. when internal noise is not entirely absent). But how well do the predictions fit performance as a function of distracter heterogeneity?

*cued* detection task when the signal-to-noise ratio or the target's contrast is low. This brings into question the role of a max-of-outputs mechanism in visual search. Here we have shown that, in the distracter variance case, the MAP observer does fit the empirical data well: we interpret this (alongside the existing literature) as support for inference processes underlying visual search.

*differences* between percepts may be able to achieve above MAX level performance. See the *RCref* model (Rosenholtz, 2001) and the *signed-max observer* (Baldassi & Verghese, 2002).

*If* the brain made the simplifying assumption that targets and background had equal variance, then in the simplified case of a single feature dimension, the MAX observer would provide the optimal way of detecting targets. We showed that the MAX observer provides a poor fit to human data; we therefore argue that the brain probably does not make this simplifying assumption about the world. Instead, we know at the least that the brain incorporates knowledge that the distracters or background can be much more variable than targets.

*representation,* the probabilistic population coding interpretation of neural coding makes convincing arguments that neural tuning curves literally represent probability distributions (Pouget, Dayan, & Zemel, 2000). For example, the Gaussian representations of targets and distracters (i.e. likelihoods; Figures 3a, 3c, and 3e) discussed in this paper can be implemented by neurons or groups of neurons whose tuning curves match the distribution of observed orientations. In terms of *integration,* Gold and Shadlen (2001) and Ma, Beck, Latham, and Pouget (2006) describe how neurons with such tuning curves can be used for probability and likelihood calculations. Whether the brain calculates posterior probability or log posterior odds (see 1) does not affect the fit of theory to data, but further empirical investigation could shed light on neural implementation. In terms of *decision,* the notion that observers make decisions by a threshold mechanism upon sensory dimensions is a fundamental tenet of signal detection theory (for example Ress & Heeger, 2003).

*modus operandi* of humans. To determine this empirically, we compared predicted and human performance in a distracter heterogeneity visual search manipulation. Human visual search performance exceeded the maximum performance predicted by the SDT account. We take this as evidence that humans do not utilize cues according to the MAX rule. Instead, performance was better accounted for by the Bayesian optimal observer, in which humans evaluate the probability that each display item is the target given the feature data available: the present or absent response is determined by a threshold upon this posterior probability dimension. The contribution of this paper is not so much the formulation of the optimal observer itself (Palmer et al., 2000; Shimozaki et al., 2003), but the empirical data that support it as a good model of the processes underlying visual search.

*N*.

$\sigma_T = \sigma_D = 1$) and $\mu_T = \mu_D$. The resulting log likelihood (first term in Equation A1) is zero, leaving just the log prior odds (second term in Equation A1), which is a constant $\log[1/(N - 1)]$ for a given set size. This can be compared to the flat posterior for the same situation shown in Figure 3b.

$\mu_T = d'$, $\mu_D = 0$ and simplify. This results in a log likelihood of $d'(u - d'/2)$ (also see Wickens, 2002, p. 161), which is also a linear function of the internal percept, $u$. The log prior odds is just a constant for a given set size, so does not affect the linearity. Because the log posterior odds is a linear function of $u$, the display item with the highest internal sensory percept $u$ will also have the highest log posterior odds. In summary, when targets and distracters are Gaussians of equal variance, the MAX observer is equal to the optimal observer.
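The linear form quoted above can be checked directly from the two unit-variance Gaussian densities. The short derivation below is a worked reconstruction under the stated conditions ($\sigma_T = \sigma_D = 1$, $\mu_T = d'$, $\mu_D = 0$); it does not reproduce Equation A1 itself.

```latex
\begin{aligned}
\ln \frac{P(u \mid T)}{P(u \mid D)}
  &= -\frac{(u - d')^2}{2} + \frac{u^2}{2} \\
  &= d'u - \frac{d'^2}{2} \\
  &= d'\left(u - \frac{d'}{2}\right)
\end{aligned}
```

The normalizing constants cancel because both densities share the same variance, which is why no term in $u^2$ survives.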

$\sigma_T \neq \sigma_D$), the same simplification procedure of Equation A1 results in quadratic functions of $u$, regardless of whether the target and distracter means are equal or not. Because of this, the maximum value along the stimulus dimension $u$ will not necessarily equate to the maximum value along the log posterior odds. Thus MAX and MAP observers result in different predictions in the case $\sigma_T \neq \sigma_D$.
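A small numerical sketch can make the divergence concrete. It evaluates the log likelihood ratio of two Gaussians in closed form and compares which display item each rule would pick. Because the log prior odds is a constant for a given set size, ranking items by this ratio is the same as ranking them by log posterior odds. The function names, the example percepts, and the parameter values are hypothetical.

```python
import math

def log_likelihood_ratio(u, mu_t, sd_t, mu_d, sd_d):
    """ln[P(u | T) / P(u | D)] for Gaussian target and distracter distributions."""
    return (math.log(sd_d / sd_t)
            - (u - mu_t) ** 2 / (2 * sd_t ** 2)
            + (u - mu_d) ** 2 / (2 * sd_d ** 2))

def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

percepts = [-1.0, 12.0, 25.0]  # hypothetical percepts from one display

# Unequal variances, equal means: the ratio is quadratic in u.
unequal = [log_likelihood_ratio(u, 0.0, 5.0, 0.0, 15.0) for u in percepts]
max_choice = argmax(percepts)  # MAX rule picks the largest percept
map_choice = argmax(unequal)   # MAP rule picks the largest log posterior odds

# Equal variances, target mean of 2: the ratio is linear in u.
equal = [log_likelihood_ratio(u, 2.0, 1.0, 0.0, 1.0) for u in percepts]
equal_choice = argmax(equal)
```

In the unequal variance case the two rules select different items for these percepts, whereas in the equal variance case they select the same one.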