Article  |   November 2011
Covert visual search: Prior beliefs are optimally combined with sensory evidence
Journal of Vision November 2011, Vol.11, 25. doi:10.1167/11.13.25

      Benjamin Vincent; Covert visual search: Prior beliefs are optimally combined with sensory evidence. Journal of Vision 2011;11(13):25. doi: 10.1167/11.13.25.

      © 2017 Association for Research in Vision and Ophthalmology.

Abstract

Has evolution optimized visual selective attention to make the best possible use of all information available? If so, then Bayesian optimal performance in a localization task is achieved by optimally weighting the visual evidence with one's prior spatial expectations. In 2 psychophysical experiments, participants conducted covert target localization where both visual cues and prior expectations were available. The amount of information conveyed by the visual evidence was held constant, while the degree of belief was manipulated via peripheral cuing (Experiment 1) and spatial probabilities (Experiment 2). A number of findings result: (1) People appear to optimally combine slightly biased prior beliefs with sensory evidence. (2) These biases are directly comparable to those descriptively accounted for by the Prospect Theory. (3) Probabilistic information about a target's upcoming location is integrated identically, irrespective of whether endogenous or exogenous cuing is used. (4) In localization tasks, spatial attention can be understood and quantitatively modeled as a set of prior expectations over space that modulate incoming noisy sensory evidence.

Introduction
When we attempt to localize a target in space, we rely on our current sensory observations, but we may also be guided by our prior beliefs about where a target may occur. However, do we rely on using either our observations or our prior beliefs depending on which is most reliable, or do we combine the two in order to make the best possible guess of a target's location? In this work, I test the hypothesis that visual selective attention involves making near-optimal inferences about the state of the world. If this is true, then Bayes' Theorem tells us that our visual observations (likelihood) should be combined with our (prior) spatial beliefs about where a target may occur. Human performance in a target localization task is compared to that of an optimal observer that describes the best possible performance. Optimal performance in this task is calculated by using Bayes' Theorem (Bayes, 1763; Bishop, 2006; Doya, Pouget, & Rao, 2007; Glimcher, 2004; Kruschke, 2011) to describe the target localization task (Cameron, Tai, Eckstein, & Carrasco, 2004; Eckstein, Abbey, Pham, & Shimozaki, 2004; Eckstein, Peterson, Pham, & Droll, 2009). 
Before surveying pertinent literature, the experimental approach is briefly outlined. Observers conduct a 4-spatial-alternative forced-choice localization (see Figure 1). Localization decisions are based on uncertain briefly presented visual information, calibrated such that 50% performance can be achieved from the visual observations alone. To test whether localization performance can be improved by combining those observations with one's prior spatial beliefs, two different methods are used to influence those beliefs. 
Figure 1
 
Structure of the experiments. (a) The causal structure of cue and target location in Experiment 1 shows that a cue is equally likely to occur in 1 of 4 locations. A target then has a probability of ε of being at the cued location and a probability of (1 − ε) / 3 of being in any other location. (c) In Experiment 2, within an experimental block, a target appears in a chosen location with probability ε. Trial structures are shown in (b) and (d).
In Experiment 1, peripheral precues provide exogenous information about a target's upcoming location. Cue validities of 0%, 10%, 25%, 50%, 75%, and 100% are used, each providing a corresponding degree of belief about where the target may subsequently appear. The 25% expectation condition is neutral in that it provides no meaningful information about the target's upcoming location. 
In Experiment 2, the same degrees of belief are investigated but are provided by endogenous means. Within an experimental block, observers are informed of the probability of a target appearing in a certain location (see Figure 1). Such a spatial probability manipulation provides a longer term set of spatial expectations. However, in terms of the information available for observers to make a localization, both experiments are functionally equivalent; will the performances be identical? If so, this would be a clear and perhaps novel demonstration that information is integrated identically across both exogenous and endogenous cuing methods. 
To test the hypothesis of optimal combination of prior belief and current sensory evidence, it is necessary to evaluate the best possible performance, not the fastest. To do this, the experimental task must have certain qualities: First, unspeeded responses are elicited, in conjunction with task instructions that emphasize the importance of accuracy. Care must be taken when drawing parallels from the results of “simple reaction time” studies where a speeded response is made upon the onset of an unambiguous stimulus. Second, short stimulus display durations (133 ms) deliver a controlled amount of information to observers and help ensure that the visual evidence is only partially informative of target location. 
Before describing the workings of a Bayesian optimal model as well as a heuristic account of the target localization task, previous work is reviewed to evaluate evidence for optimal use of either visual observations or spatial expectations. 
The role of visual observations
There is strong support for the notion that visual observations are utilized in an optimal manner to allow the best possible detection or localization performance. The signal detection theoretic approach (Green & Swets, 1966) applied to visual search can account for many visual search phenomena with short display durations (see review by Verghese, 2001). Simply put, the approach hypothesizes that people make optimal use of uncertain sensory observations. This notion now has strong support from a wide range of studies manipulating the nature of the visual information available to observers in various detection or localization tasks. For example, the set size effect is due to each additional display item contributing uncertainty to the decision process (Cameron et al., 2004; Eckstein, Thomas, Palmer, & Shimozaki, 2000). Conjunction searches can also be understood in terms of optimal inference based on uncertain visual observations along multiple visual feature dimensions (Eckstein, 1998; Eckstein et al., 2000). Search asymmetry effects can be understood when display items have unequal levels of internal uncertainty associated with them (Dosher, Han, & Lu, 2004; Vincent, 2011). Distracter heterogeneity effects can also be explained (Dosher, Han, & Lu, 2010) and show that decisions of target presence are based on an optimal use of uncertain visual observations (Ma, Navalpakkam, Beck, Berg, & Pouget, 2011; Vincent, Baddeley, Troscianko, & Gilchrist, 2009). In summary, there is strong support for the notion that visual stimuli are near-optimally assessed by observers for the degree of evidence that a target is either present or absent (in detection tasks), or in one location or another (in AFC localization tasks). 
The role of expectations
Peripheral cuing
In early work, peripheral cues that indicated the upcoming location of a single above-threshold display item were used to manipulate observers' expectations (Posner, Nissen, & Ogden, 1977; Posner, Snyder, & Davison, 1980). When a precue correctly indicated the upcoming target location 80% of the time, simple reaction time advantages were seen at the cued location and disadvantages at the uncued location. Posner et al. considered but rejected the notion that these costs and benefits were the result of changes at a decision level. More specifically, it could be that knowledge of the cue validity could shift the decision threshold (also known as response criterion or bias) so as to decrease the amount of visual evidence required to indicate “target present” at the cued location or to increase the visual evidence required to indicate “target present” at uncued locations. Instead, they favored an account whereby cognitive decisions fed back through an attentional spotlight that influenced sensory evidence. 
The mechanism by which this could work is not fully understood, but the net effect could be increased d′ sensitivity. This sensitivity account was supported by Bashinski and Bacharach (1980) in a cued detection experiment. A variety of studies do seem to suggest that peripheral cues can influence visual sensitivity; however, this is contentious. Smith (2000) pointed out that cue effects upon sensitivity are only found in studies where display items are backward masked: This includes the early Bashinski and Bacharach study as well as more recent work. It is not the primary aim of this work to address the debate between decision threshold and sensitivity accounts: There may well be scope for both to be correct as they need not be mutually exclusive. The approach taken in this work is that while changes in sensitivity do certainly seem to occur in some situations, the majority of cuing effects can be attributed to decision threshold changes (Eckstein, Shimozaki, & Abbey, 2002; Eckstein, Pham, & Shimozaki, 2004; Gould, Wolfgang, & Smith, 2007; Müller, 1994; Müller & Findlay, 1987; Shimozaki, Eckstein, & Abbey, 2003), and sensitivity changes are not necessarily the mechanism underlying focal selective attention (Solomon, 2004). 
Spatial probability
Expectations can also be derived from our past experience of where targets are. Given that objects are not uniformly distributed in space, having the ability to learn and utilize where objects are more likely to occur in space would be of clear benefit to any searcher. There is abundant evidence that we can learn the low- and high-level statistical properties of our visual environment (Brady & Oliva, 2008; Fiser & Aslin, 2001, 2002; Rosenthal, Fusi, & Hochstein, 2001; Turk-Browne, Jungé, & Scholl, 2005) so one would assume it is a trivial matter to learn that a target is more likely to occur in one location than another (Druker & Anderson, 2010; Geng & Behrmann, 2005). 
Early results using speeded responses to unambiguous sensory information, in fact, did not show any simple reaction time benefit for knowing the probable location of a display item within a block (Posner et al., 1980). Their interpretation was that people could not assign attention to a high probability location in the absence of a precue. 
Results from unspeeded responses made to ambiguous stimuli show a different result, however. While there were some concerns that apparent learning of spatial probabilities could be due to trial-to-trial repetition effects (Hillstrom, 2000; Walthew & Gilchrist, 2006), Druker and Anderson (2010) showed that this is not the case. By presenting targets drawn from a continuous distribution across space (unlike previous studies that examined a low number of discrete item locations), there was a very low chance of a location repeat. It appears that internal expectations about an upcoming target's location can be generated and utilized by endogenous knowledge of where targets are most likely to appear. 
Ability to optimally allocate expectations
Experiments 1 and 2 are designed so that in comparable expectation conditions, observers have equivalent levels of information about where a target may be located. One prediction would be that, if we can utilize information in an optimal manner, and responses are unspeeded, then observers should be able to achieve identical asymptotic performance levels in corresponding expectation conditions in both experiments. However, given the known differences between exogenous and endogenous covert orienting (Jonides, 1981; Müller & Rabbitt, 1989; Posner, 1978), one might predict that there could be differences between the experiments. 
The most pertinent aspect here is the time course; Experiment 1 requires rapid establishment of expectations on every trial, but in Experiment 2, spatial expectations are held constant over each experimental block. Examination of simple reaction time benefits shows that peripheral cues provide a transient benefit reaching peak at ∼100 ms post-cue onset before decaying at about 200 ms (Müller & Findlay, 1988; Shepherd & Müller, 1989). However, care should be taken when using these findings to make predictions of performance or choice reaction times in unspeeded response paradigms. Similar analysis of the time course in such studies evaluating performance shows similar but less clear results. In one study involving cued discrimination, performance rose rapidly as the CSOA increased to 100 ms but then decreased to a stable level (Nakayama & Mackeben, 1989). The authors interpreted this as evidence for a fast but transient component and a slow but sustained component of covert orienting. In AFC localization tasks, Cheal and Lyon (1991) also find support for two separate mechanisms: The effects of a peripheral cue took ∼100 ms to reach maximal performance, but central cues required ∼300 ms. Importantly, however, the asymptotic performance with central and peripheral cues was approximately the same, indicating that a central endogenous and a peripheral exogenous cue can result in the same levels of performance as long as ∼300 ms is provided to incorporate the information that the cue provides. Another important result was that performance did not in fact decay for longer CSOAs, which is consistent with the idea that once the information from the cue has taken effect, its effects do not decay, at least not over CSOAs up to 500 ms. This evidence would predict that in this unspeeded response paradigm, information provided by exogenous cues (Experiment 1) and endogenous spatial probability (Experiment 2) may well result in identical levels of performance. 
However, what can be predicted of the choice reaction times required to achieve potentially near-optimal performance? Here, insight can be gained from the anticorrelated 0% and 10% conditions. In these conditions, it could be that endogenous expectations can be held constant, but exogenous expectations are (unhelpfully) drawn toward the manipulated location (e.g., Jonides, 1981; Müller & Rabbitt, 1989; Remington, Johnston, & Yantis, 1992). Two outcomes may be predicted: 
  1. The sensitivity account: If the sensitivity (d′) account is correct, then in the 0% condition, the cue will increase the visual sensitivity at the cued location, which will, in fact, be a distracter location. In this case, the visual evidence proceeding to the decision stage will have been altered; therefore, performance will be different compared to the spatial probability experiment, but response times might be identical.
  2. Decision threshold account: Alternatively, according to the decision threshold account, expectations may well be suboptimal at the time of the brief stimulus display, but the visual evidence will be unaffected. Instead, the visual evidence will be available, but choice reaction times might be longer (for cued 10% and 0% conditions) while the internal expectation weightings are changed back to optimal levels. Performance between experiments should be identical.
Quantitative models
Model 1: Heuristic observer
A heuristic solution to the 4-SAFC task that was found to produce respectable accounts of the data in a pilot study works as follows. Rather than combining prior beliefs and visual observations, people may simply use one or the other based on which is more reliable. For example, if we know the reliability of our visual observations and that we can achieve a certain level of localization performance using observations alone, prior beliefs may be ignored. However, if we know that our visual observations are somewhat unreliable, we may decide to rely solely upon prior beliefs in situations where they are strong: such as in the 75% or 100% valid conditions. The heuristic observer uses either prior beliefs or visual observations in each condition, depending on which will result in highest localization performance (see 2 for model details). 
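The switching rule described above can be illustrated with a deliberately simplified sketch. It assumes that visual observations alone yield a fixed proportion correct (`p_visual`, set here to the 50% calibration level) and that a prior-only observer always chooses the most probable location. The function name and parameters are hypothetical, and the sketch omits the d′ and lapse rate parameters of the fitted model.

```python
def heuristic_accuracy(epsilon, p_visual=0.5, n_locations=4):
    """Predicted proportion correct for a simplified heuristic observer.

    epsilon  : probability that the target is at the manipulated location
    p_visual : assumed proportion correct from visual observations alone
    """
    # Probability of the target being at any one non-manipulated location.
    p_other = (1.0 - epsilon) / (n_locations - 1)
    # A prior-only observer picks the most probable location.
    p_prior_only = max(epsilon, p_other)
    # The heuristic observer uses whichever single source performs better.
    return max(p_visual, p_prior_only)


for eps in (0.0, 0.10, 0.25, 0.50, 0.75, 1.0):
    print(f"epsilon={eps:.2f}  predicted accuracy={heuristic_accuracy(eps):.3f}")
```

Under these assumptions, the prediction stays at the visual-only level until the prior alone outperforms it, which here happens only in the 75% and 100% conditions.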
Model 2: Bayesian optimal observer
A Bayesian optimal observer model was constructed to produce predictions in the 4-SAFC tasks (see 1). This observer uses prior spatial beliefs along with the visual evidence (likelihood; see Figure 2). This likelihood is dependent on the observer's generative model of the process that gave rise to the visual stimuli. In other words, the optimal observer has an internal model of the task structure (Figure 1) that generates each possible display alternative and backward infers the state of the world (the location of the target). This means that the visual evidence does not just evaluate how likely it is that a target is present in each location but looks at the entire stimulus display to examine the likelihood of a target being in location i and that distracters occupy all other locations. The end result after prior expectations (attentional weightings) and visual evidence are combined is a set of internal probabilities of where the target is, one for each potential target location. This is called the posterior distribution of target location given the data available. A simple decision rule could use this information and select the location with the highest posterior probability of containing the target (see 1). 
Figure 2
 
A schematic of the Bayesian optimal observer for localization, where higher activations are represented by lighter shades of gray. Noisy observations of 4 stimulus (target or distracter) orientations are made. A target similarity map represents the likelihood of each observation being a target and the converse with a distracter similarity map. A final likelihood is computed, representing how consistent the data is with a target being in each of the locations. For it to be likely that a target is in a given location, the observation at that location must look like a target and observations at all other locations must look like distracters. These likelihoods are multiplied by the prior to result in the posterior probability (best guess) of a target being in each location given the data available. Prior expectations of where a target will occur can be manipulated in different ways.
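The computation summarized in Figure 2 can be sketched as follows. This is a minimal illustration, not the model code used in this work: it assumes independent Gaussian noise of standard deviation `sigma` on each orientation observation, target and distracter mean orientations of +k and −k, and a maximum a posteriori decision rule. All function and parameter names are hypothetical.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def posterior_over_locations(obs, prior, k=1.0, sigma=1.0):
    """Posterior probability that the target is at each location.

    obs   : noisy orientation observations, one per location
    prior : prior probability of the target being at each location
    """
    log_post = []
    for i, p in enumerate(prior):
        if p == 0.0:
            log_post.append(-math.inf)  # location ruled out by the prior
            continue
        # Likelihood of the whole display: target at i, distracters elsewhere.
        log_lik = sum(
            log_normal_pdf(x, k if j == i else -k, sigma)
            for j, x in enumerate(obs)
        )
        log_post.append(math.log(p) + log_lik)
    # Normalize in a numerically stable way.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def map_decision(posterior):
    """Select the location with the highest posterior probability."""
    return max(range(len(posterior)), key=posterior.__getitem__)


obs = [0.3, -0.2, 0.1, -0.5]
neutral = posterior_over_locations(obs, [0.25, 0.25, 0.25, 0.25])
anticorrelated = posterior_over_locations(obs, [0.0, 1 / 3, 1 / 3, 1 / 3])
print(map_decision(neutral), map_decision(anticorrelated))
```

With a neutral prior the decision follows the most target-like observation, whereas a 0% prior at the first location forces the choice onto the best remaining candidate.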
Biased expectations?
It is important to note that the optimal observer examined assumes that there is no uncertainty about what the spatial expectations are: if observers are told that a cue has 75% validity, then it is assumed that they believe this and assign exactly this level of uncertainty to the cued location. This assumption may well not be true but is made in this work as a reasonable starting point. Martins (2005, 2006) describes results that suggest that when observers have uncertainty in their degree of belief, this may result in apparent expectation biases. For example, the fact that observers are told about the long-run frequency of cue validity or spatial probability provides them with information, but they were not instructed about the sample size. Adaptive probability theory (Martins, 2005) proposes that this results in uncertainty over what the prior probabilities are, in turn resulting in apparent probability biases. For example, observers may underweight high expectation levels but overweight low expectation levels. This pattern of probability bias is ubiquitous in high-level economic decision making and was descriptively accounted for in Prospect Theory (Kahneman & Tversky, 1979). While Prospect Theory and other approaches (Prelec, 1998) descriptively account for such biases, Martins potentially explains why such biases should occur. In this work, I did not assess a Bayesian observer with uncertain prior beliefs: Instead, I evaluated a third model where biased expectations are combined optimally with visual evidence. 
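One way to picture the bias pattern described above is with the one-parameter probability weighting function of Prelec (1998), w(p) = exp(−(−ln p)^α). The sketch below is illustrative only: it is not necessarily the form of bias used in the third model, the value α = 0.65 is an arbitrary assumption, and the renormalization over the 4 locations is likewise an assumption made so that the weighted values can serve as a prior.

```python
import math


def prelec_weight(p, alpha=0.65):
    """Prelec (1998) weighting: w(p) = exp(-(-ln p)**alpha).

    alpha = 1 leaves probabilities unchanged; alpha < 1 overweights low
    probabilities and underweights high ones.
    """
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** alpha))


def biased_prior(epsilon, alpha=0.65, n_locations=4):
    """Biased prior over locations, with the manipulated location first.

    Renormalizing the weighted probabilities is an assumption of this sketch.
    """
    raw = [epsilon] + [(1.0 - epsilon) / (n_locations - 1)] * (n_locations - 1)
    weighted = [prelec_weight(p, alpha) for p in raw]
    total = sum(weighted)
    return [w / total for w in weighted]


for eps in (0.10, 0.25, 0.75):
    print(eps, [round(p, 3) for p in biased_prior(eps)])
```

A biased prior of this kind could then replace the stated probabilities in an otherwise optimal combination with the visual evidence.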
Predictions
Based on results outlined above where human performance in detection and localization tasks is well accounted for by optimal observer models, it is predicted that performance in this task will also be close to that of an optimal observer. Lowest performance should be seen in the 25% expectation conditions as observers only have information available from the briefly displayed stimulus and no meaningful expectations. It is predicted that performance should increase as the amount of information about a target's location increases: This includes conditions that are correlated with target location (>25% conditions) but also those anticorrelated with the target (0% and 10% conditions). 
In this unspeeded response paradigm, it is predicted that performance should be highly similar in the peripheral cuing condition and spatial probability conditions. However, based on previous findings that peripheral cues often reflexively draw attention to them (Jonides, 1981; Müller & Rabbitt, 1989; Remington et al., 1992), it is predicted that choice reaction times in the anticorrelated cue conditions will be longer with spatial cues as compared to spatial probability-based expectations. This is because the information is available for higher levels of performance to be obtained (observers know that targets are not at the cued location) but that it may take additional time for the cue effects upon expectation to be overcome by endogenous knowledge of the task structure. 
A final prediction is that accounting for possible biases in observers' spatial expectations may result in a better explanation of localization performance. 
Experiments
In Experiment 1, participants indicated the location of a target based on a briefly presented stimulus. In addition to this visual evidence, their degree of belief was manipulated via block-wise cue validity. A cue–stimulus onset asynchrony (CSOA) of 133.33 ms was used, but it is important to note that unlike in experimental paradigms measuring simple reaction times, performance is independent of CSOA (Cheal & Lyon, 1991; Gottlob, Cheal, & Lyon, 1999). This result was replicated in a pilot experiment; therefore, the performance obtained in Experiment 1 is not specific to a particular CSOA. 
Theoretically, if the information provided by the expectations and observations is held constant, performance should be the same, regardless of how those expectations arose. Thus, Experiment 2 provides subjects with expectations by the alternative approach of spatial probability manipulation. The experiment is largely the same, but instead of cues, in each experimental block a location is selected and the target will be at that location a certain proportion of the time. 
Methods
Participants
In Experiment 1, 8 graduates or postgraduates were paid £5 to take part. Some had multiple practice sessions and were instructed to respond as accurately as possible, not as fast as possible. Two participants were excluded from analysis on the basis of producing an incoherent psychometric function in the initial calibration procedure (outlined below) and near-chance levels of performance. In Experiment 2, the same 6 participants included in Experiment 1 were used and were paid £5. All were naive to the aims of the experiment. 
Apparatus
Images were displayed on a CRT monitor with a resolution of 1920 × 1440 and a refresh rate of 60 Hz, thus a screen refresh interval of 16.67 ms. The room was kept at constant low illumination levels. Participants used a chin rest, resulting in a stable viewing distance of 55 cm. Participants responded with a Cedrus response box. 
Stimuli
Stimuli were presented using Matlab (Mathworks) and the Psychophysics Toolbox (Brainard, 1997). Targets and distracters were high-contrast Gabors with spatial frequency of 4 cpd, and the Gaussian envelope had a standard deviation of 0.22°. Target (μ_T) and distracter (μ_D) orientations were always k° clockwise and anticlockwise from vertical, respectively. This allowed the reasonable assumption to be made that there is equal internal uncertainty associated with targets and distracters. The target–distracter difference 2k° determines the performance of the subjects, and k was chosen for each subject to achieve approximately 50% performance in a 4-SAFC pretest using a psychometric procedure (method of constant stimuli). All display items were presented at constant retinal eccentricity of 11.2° from screen center to minimize retinal eccentricity-based variation in detection probability. Where cues were used, they were Gaussian blobs with a standard deviation of 0.22°, the same size as the target and distracter Gabors. 
Experiment 1 procedure
A 4-SAFC procedure was used (see Figure 1). The target was present on every trial and was in 1 of 4 locations at random, such that there was a 1/4 chance of the target being in each location; there was a distracter item in the remaining 3 locations. The participant had to indicate the location of the target with a manual button press. The actual target location t and the participants' manual response r were recorded. A small fixation blob was continuously present at screen center to avoid problems maintaining accurate vergence on the screen. Cues appeared for 50 ms (all durations presented as multiples of the screen refresh interval), followed by an interstimulus interval of 83.33 ms. This results in a CSOA of 133.33 ms. This was chosen because in previous studies it took ∼100 ms for performance in peripheral cuing in AFC tasks to asymptote (unpublished pilot experiments; Cheal & Lyon, 1991; Gottlob et al., 1999). 
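Because all durations are multiples of the screen refresh interval, the timing above can be checked with simple arithmetic. The sketch below assumes the 60 Hz refresh rate given in the Apparatus section; the frame counts are inferred from the stated durations rather than reported in the text, and the function names are hypothetical.

```python
REFRESH_HZ = 60  # refresh rate stated in the Apparatus section


def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Duration in milliseconds of a whole number of screen refreshes."""
    return n_frames * 1000.0 / refresh_hz


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Nearest whole number of screen refreshes for a given duration."""
    return round(duration_ms * refresh_hz / 1000.0)


cue_frames = ms_to_frames(50.0)    # cue duration
isi_frames = ms_to_frames(83.33)   # interstimulus interval
csoa_ms = frames_to_ms(cue_frames + isi_frames)

print(cue_frames, isi_frames, round(csoa_ms, 2))
```

On this reading, the cue and interstimulus interval together span a whole number of refreshes whose total duration matches the stated CSOA.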
Stimuli for each trial were presented for 133.33 ms before being removed, thus controlling for eye movements during stimulus display. The crucial thing is that the visual stimuli were insufficient to reduce all uncertainty of target location: This has been achieved by backward masking in previous studies (Gottlob et al., 1999), but here it was achieved via short display durations and high similarity between target and distracter Gabor orientation. The focus here is not how participants learn cue validity over time (see Droll, Abbey, & Eckstein, 2009); instead, the question is how well they can utilize a known cue validity. Participants were informed of the cue validity at the start of every block, and a small blob provided feedback of the true target location to help subjects assess cue validity. Instructions were of the nature, “The cue has a [75%] chance of indicating the target location on each trial.” 
The following procedure was used on each trial (see Figure 1): A cue location was selected uniformly at random, and the target occurred at this location with probability ε (the cue validity). If the target was not at the cued location, it had a 1/3 chance of being in each of the remaining non-cued locations. Cue validity was manipulated across experimental blocks: 0% (target never in location of cue), 10%, 25% (uninformative, neutral condition), 50%, 75%, and 100% (target always at cue location). Each block consisted of 50 trials, and each block type was run twice, resulting in 100 trials per condition per observer. Condition order was randomized to minimize the impact of learning or fatigue effects. 
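The trial and block structure just described can be sketched as a small generator. This is an illustration under stated assumptions, not the original Matlab experiment code: the function names, the fixed seed, and the representation of locations as integers 0 to 3 are all hypothetical.

```python
import random


def sample_trial(epsilon, rng, n_locations=4):
    """Draw one (cue, target) pair for a given cue validity epsilon."""
    cue = rng.randrange(n_locations)  # cue location chosen uniformly
    if rng.random() < epsilon:
        target = cue                  # valid trial: target at the cued location
    else:
        # Invalid trial: target equally likely at each non-cued location.
        others = [i for i in range(n_locations) if i != cue]
        target = rng.choice(others)
    return cue, target


def build_session(validities=(0.0, 0.10, 0.25, 0.50, 0.75, 1.0),
                  blocks_per_validity=2, trials_per_block=50, seed=0):
    """Return a list of (validity, trials) blocks in randomized order."""
    rng = random.Random(seed)
    order = [v for v in validities for _ in range(blocks_per_validity)]
    rng.shuffle(order)  # randomize condition order
    return [(v, [sample_trial(v, rng) for _ in range(trials_per_block)])
            for v in order]


session = build_session()
print(len(session), "blocks;", sum(len(t) for _, t in session), "trials")
```

Because the cue location is uniform, the target is equally likely to appear at each of the 4 locations overall, whatever the cue validity.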
Experiment 2 procedure
A 4-SAFC task was run as before, but no cuing was used (see Figures 1c and 1d). Following a 1000-ms pretrial interval of central fixation, a stimulus display was presented for 133.33 ms. The participants then made a manual response, and brief feedback of the true target location was given. In each block (order randomized), the spatial probability was manipulated: that is, the probability of the target appearing at a location m, which was chosen at random in each block. If the target did not appear in location m, then it appeared in one of the 3 other locations with uniform random probability. As the aim of this study was not to examine the temporal dynamics of learning spatial probability but to examine how we combine known spatial expectations with observations, subjects were told the probability of the target appearing at the manipulated location at the start of each block. Task instructions were of the nature, “The target has a [75%] chance of appearing in the [top left] on each trial.” 
Analysis
For each participant, for each condition with T = 100 trials, two measures were evaluated. The first is the number of correct responses, and the second is the number of times the participant responded to the manipulated location. Given that the measures straightforwardly translate into proportions, the binomial distribution was used both to evaluate the likelihood for each condition and to determine 95% confidence intervals of participant performance. The binomial distribution was also used in order to find the maximum likelihood model parameters (see 4). Parameters were optimized to fit the proportion of correct responses. 
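As an illustration of treating these counts as binomial, the sketch below evaluates a binomial log-likelihood and a 95% interval for a proportion. The Wilson score interval is used here only as one common, closed-form choice; it is an assumption of this sketch and may differ from the interval construction actually used in this work. Function names are hypothetical.

```python
import math


def binom_loglik(k, n, p):
    """Log-likelihood of k successes in n trials with success probability p."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.log(math.comb(n, k))
            + k * math.log(p)
            + (n - k) * math.log(1.0 - p))


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = k / n
    denom = 1.0 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


k, n = 49, 100  # hypothetical counts: 49 correct responses out of T = 100 trials
low, high = wilson_interval(k, n)
print(f"ML estimate {k / n:.2f}, approx. 95% interval [{low:.3f}, {high:.3f}]")
```

The same log-likelihood function could be summed across conditions to compare model predictions, with each model supplying its predicted proportion as `p`.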
It is common to take into account the fact that participants occasionally guess. This could be caused by blinking during the short display duration, for example. Lapse rates (λ) are taken into account in both models examined, and this was done in the Monte Carlo stage (see 4). 
All analyses were first conducted on the data from individual participants: The data were found to exhibit little interobserver variability. Because the visual evidence was calibrated before the main experiment to result in 50% performance in the 25% condition, data could be collapsed across observers. 
Results
Measure 1: Percent correct localizations
Performance over the various degrees of belief, for both experiments, is shown in Figure 3 (left). Participants exhibited a skewed U-shaped performance function, with lowest performance being seen in the neutral 25% condition: 49.0% (45.7%–52.4%; maximum likelihood and 95% confidence intervals). In all other conditions, the cue provides additional information of upcoming target location and was associated with higher performance. As cue validity increased, providing more information about the upcoming target location, performance increased to the point where a 100% valid cue resulted in approximately 100% performance. 
Figure 3
 
Results for (top row) Experiment 1 and (bottom row) Experiment 2. Performance and probability of responding to the manipulated location are shown. Error bars denote 95% confidence intervals, determined by binomial distributions. Model fits are shown for the Bayesian optimal observer (black lines) and the heuristic observer (gray lines) using maximum likelihood d′ and lapse rate parameters. Reaction time advantages, relative to the neutral 25% condition, are shown; error bars denote 95% confidence intervals of the mean, calculated by a bootstrap method.
Localization performance also improved relative to the 25% baseline when the manipulated location was anticorrelated with the target location (the 0% and 10% conditions). This was more prominent in the 0% condition, as seen by the non-overlap of the 95% confidence intervals. 
Measure 2: Probability of responding to the manipulated location
Figure 3 (middle column) shows how often participants responded to the manipulated location. When the cue validity was 0%, participants almost never responded at the cued location, but as cue validity increased, the probability of responding to the cued location steadily increased to the point where they nearly always did so in the 100% valid condition. The relationship is not quite linear, however. 
Model fits
Maximum likelihood predictions of the Bayesian optimal observer are shown in Figure 3 (black lines) as are the predictions of the heuristic observer (gray lines). Visual inspection shows that for both experimental measures, the optimal observer provides a better account of the data. 
Model fits to the performance measure were evaluated quantitatively using AIC values (Akaike, 1974). In Experiment 1, the AIC values for Bayesian and heuristic observers were 62.8 and 82.1, respectively. The important metric is the difference of 19.3 that, using the scale of Burnham and Anderson (2002, 2004), translates into virtually zero support for the heuristic observer and extremely strong support for the Bayesian observer. Quantitative analysis of the fits to the proportion of responses to the cued location showed a difference in AIC values of 123.2. According to the scale of Burnham and Anderson (2002), this leaves no ambiguity that the Bayesian optimal observer was the better fit. In Experiment 2, the AIC values for Bayesian and heuristic models fit to performance were 51.9 and 60.0, respectively, showing that the Bayesian model provides the better fit. The difference was 8.1, again showing very strong support for the Bayesian observer, according to the scale of Burnham and Anderson (2002). For the proportion of responses to the manipulated location, the AIC values for Bayesian and heuristic models were 51.5 and 175.9, respectively. The difference was 124.4, again showing exceptionally strong support for the Bayesian model. 
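For readers who want to reproduce this kind of comparison, here is a minimal sketch of the arithmetic, assuming the standard definition AIC = 2k − 2 ln L (k parameters, maximized likelihood L). The function names are hypothetical, and the relative-likelihood step exp(−Δ/2) is a common companion calculation rather than something reported in the text.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def aic_difference(aic_a, aic_b):
    """AIC of model b minus AIC of model a (positive values favour model a)."""
    return aic_b - aic_a

def relative_likelihood(delta_aic):
    """Relative likelihood exp(-delta/2) of the higher-AIC model versus the lower-AIC one."""
    return math.exp(-delta_aic / 2.0)

# AIC values reported above for Experiment 1 (performance measure):
# Bayesian observer 62.8, heuristic observer 82.1.
delta = aic_difference(62.8, 82.1)
print(round(delta, 1))             # 19.3
print(relative_likelihood(delta))  # relative support for the heuristic observer
```

Only differences in AIC are interpretable, which is why the text reports the difference between the two models rather than dwelling on the raw values.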
The maximum likelihood parameters for Experiment 1 were: d′ = 0.89 and λ = 0.12% for the Bayesian optimal observer and d′ = 1.07 and λ = 0.10% for the heuristic model. For Experiment 2, the maximum likelihood parameters were d′ = 0.93 and λ = 0% for the Bayesian optimal observer and d′ = 1.11 and λ = 0% for the heuristic model. 
Choice reaction times
Figure 3 (right) shows the mean choice reaction time advantage provided by informative cues, with the 25% condition used as a neutral baseline. These RT benefits are the mean of pooled RTs, normalized by each subject's mean RT in the neutral condition. Average reaction time for the 25% condition was 1070 ms in Experiment 1 and 1024 ms in Experiment 2. 
In both experiments, reaction time benefits are seen for higher expectation levels, although this was ambiguous in the 50% condition. The large reaction time advantages in the 100% conditions presumably reflect the fact that there is zero uncertainty in the target location given an observation of the cue. 
There are interesting differences in reaction times for the low expectation levels where the manipulated and target locations are anticorrelated. In Experiment 2, it seems that endogenously driven expectations both allow for that information to be used (see higher performance compared to the neutral condition) and for quicker decisions to be made. In Experiment 1, however, low expectation levels give rise to reaction time disadvantages. This RT disadvantage is clearly consistent with the notion of reflexive covert orienting to peripheral precues even when they are anticorrelated with target location. 
Deviation from optimality?
While the Bayesian observer offered the superior account, the fit to the data was not perfect. Participants responded to the cued location less often than the optimal model predicts when the cue validity is high. Conversely, when the cue validity is low, participants respond to the cued location more often than the optimal model predicts. These differences are explored in depth in the next section. 
Biased expectations but optimal combination
In the Introduction section, it was pointed out that one possible issue could be that the degree of effective belief used by participants may not precisely correspond to the empirical level defined in the experiment. In other words, people could be biased in their degree of belief. This leads to two possibilities: that we optimally combine observations with biased expectations or that we suboptimally combine observations with unbiased expectations. 
It is difficult to test the latter without knowing how to model suboptimal combination. Instead, I evaluated whether the data from Experiments 1 and 2 could be accounted for better by optimal combination of biased expectations and observations. A function (Prelec, 1998), similar to that used in the Prospect Theory, was used to relate actual expectation to effective (biased) expectation: $\varepsilon' = \exp(-(-\log \varepsilon)^{\beta})$. Here, β controls the shape of the bias function. Importantly, when β = 1, there is no bias present, that is, ε′ = ε. 
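The weighting function is simple to evaluate numerically. The following sketch, with the hypothetical function name `prelec_weight` and assuming β > 0, shows how it overweights low expectations and underweights high ones when β < 1. The value β = 0.78 is used purely as an example input.

```python
import math

def prelec_weight(eps, beta):
    """One-parameter Prelec weighting: eps' = exp(-(-ln eps)**beta).

    eps: objective probability in [0, 1]; beta: shape parameter (beta = 1 means no bias).
    """
    if eps <= 0.0:
        return 0.0   # limit as eps -> 0 (for beta > 0)
    if eps >= 1.0:
        return 1.0   # -ln(1) = 0, so the weight is exactly 1
    return math.exp(-((-math.log(eps)) ** beta))

# Effective expectation for a few objective expectation levels.
for eps in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(eps, round(prelec_weight(eps, 0.78), 3))
```

For any β, the function leaves ε = 1/e unchanged, so expectations below roughly 0.37 are inflated and those above it are deflated when β < 1.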
Predictions of this new biased expectation observer (see Appendix C) were calculated for a range of different bias parameter values to see which level of bias accounted for the data best. This was compared to the predictions where the bias parameter is equal to 1, which corresponds exactly to the unbiased optimal Bayesian observer. This biased expectation observer has 3 free parameters: the lapse rate, d′, and the bias parameter β. 
Figure 4a shows that for the grouped data, the maximum likelihood bias parameter is β = 0.78 for both experiments. This bias function is plotted in Figure 5 (right column). Does this amount of bias actually account for data better than when there is no bias? To answer this, the likelihood is plotted as a ratio, relative to the likelihood of the unbiased optimal observer (β = 1). For both experiments, the likelihood ratio is extremely high: 193.77 and 188.7 for Experiments 1 and 2, respectively. This shows that the data are much more likely to have been generated by a Bayesian observer with a degree of bias in their prior beliefs. 
Figure 4
 
Analysis of biased expectations, showing the goodness of fit (likelihood) of different levels of bias relative to the unbiased optimal observer. The optimal observer corresponds to β = 1 and is indicated by gray crosshairs. Individual participant bias analysis (middle, right) shows that maximum likelihood bias parameters range from ∼0.6 to 1.
Figure 5
 
Comparison of Bayesian optimal observer (black lines) and biased expectation observer (gray lines) for (top) Experiment 1 and (bottom) Experiment 2. Left and middle panels show performance and proportion of responses at the manipulated location. Right panels show the probability weighting function relating actual expectation ε to effective expectation ε′. Expectation was overweighted at low expectation values and underweighted at high expectation levels.
Models with more parameters tend to fit data better, but is the increased fit sufficiently large to trade off model simplicity by introducing another parameter? To address this point, AIC measures were used to control for model complexity. The AIC value for the Bayesian optimal model (where β = 1 and number of model parameters = 2) was calculated as 60.0. The AIC value for the biased expectation observer (with 3 model parameters) was calculated as 51.5. The best (lowest) corresponds to the biased optimal model, and the magnitude of the difference is 8.5, which provides “considerable” support for the biased observer according to the scale set out by Burnham and Anderson (2002). 
This bias analysis is shown for each individual participant (Figures 4b and 4c). The picture that emerges is that different degrees of bias may be displayed by different participants. Some participants are clearly unbiased in both experiments; some have different levels of bias, and others display consistent bias. It is not clear why this variability exists: It could potentially relate to differences in levels of effort or focus, in experience with psychophysical experiments, or in the extent that they effectively believe the prior probabilities provided to them. The likelihood ratio reaches ∼10 for some participants, providing clear support for the bias providing a better account of the data. 
The influence of biased expectations in the localization tasks can be seen in the model predictions in Figure 5, which shows the predictions of the optimal observer with no bias (β = 1; black lines) and of the observer with the best fitting bias value (gray lines). The biased expectation observer is clearly a much better fit to the performance data, being nearly perfect in fact. Bearing in mind that the biased model predictions are fitted to the performance data only, the fact that the proportion of responses to the manipulated location is also accounted for is strong support for the biased expectation observer. There is still a slight discrepancy between the grouped data for both measures and the biased expectation predictions. This discrepancy could potentially be reduced by exploring slightly different kinds of expectation bias functions, but the analysis is unambiguous in its support of mechanisms that optimally combine visual evidence with slightly biased expectations. Could this simple model actually represent the processes underlying participants' performance in these tasks? 
General discussion
Objectives
This study attempted to manipulate spatial expectations in a 4-SAFC localization task with the aim of seeing whether humans can optimally integrate prior expectations and visual evidence (likelihood), as Bayes' equation would predict. To this end, a Bayesian optimal observer model was examined, whereby attention acts as a set of expectation weightings over space. To do this, the 4-SAFC task was used with unspeeded responses stressing the importance of accurate responding, recording percent correct localizations, proportion of responses to the manipulated location, and choice reaction times. 
Findings and implications
Optimal combination but biased expectations
Participants displayed a range of levels of bias: All biases were in the direction of overweighting the role of expectation when it was low and underweighting expectation when it was high. It is intriguing that this is the same type of bias that is seen in high-level cognitive and economic decision making as accounted for by the Prospect Theory (Kahneman & Tversky, 1979). One cannot be sure if the internal expectation was actually biased or if there is some other process influencing participants' responses. Either way, the case for biases in effective expectation had clear qualitative and quantitative support, even when controlling for the added model complexity. 
Where do such biases come from? While describing such probability biases has been the topic of much debate (e.g., Prelec, 1998), Adaptive Probability Theory (Martins, 2005, 2006) provides a potential explanation for why such biases occur (but also see Gonzalez & Wu, 1999). Martins' account suggests that these biases in expectations are not normatively optimal for laboratory settings (where precise expectations can be defined) but are the result of a sensible heuristic that allows near-optimal behavior in real-world situations where there is often uncertainty over what the appropriate degree of belief is. The Bayesian optimal model tested in this work was assumed to know the precise degree of belief, as provided by the experimenter, and to accurately internalize and use this. Figure 3 shows that this is a good approximation to the actual performance of people in these tasks; however, the analysis of bias clearly showed that expectation biases were present. These biases may reflect, as per the suggestion of Martins (2005, 2006), that participants had uncertainty over their expectations. 
The individual analysis of expectation biases (see Figure 4) shows that there is some variation in the extent of the bias shown. One participant showed minimal to zero bias in both experiments. It is interesting to speculate whether increased experience with this task or perhaps more extensive performance feedback could allow participants to be less biased in their expectations. 
Exogenous and endogenous cuing
Behavioral results from simple RT studies (e.g., Jonides, 1981) highlight a distinction between exogenous and endogenous methods of cuing. This may have led to reasonable predictions that (a) different cuing methods would have resulted in different performance levels and (b) different neural mechanisms underlie exogenous and endogenous orienting. 
However, performance in the present study was independent of the cuing method used, and this was consistent with a Bayesian observer that treats information as information, regardless of its source. This is perhaps the first demonstration of optimal utilization of information across both endogenous and exogenous methods of cuing. 
These predictions and results also align with what is known about neural mechanisms: Imaging studies of healthy humans show that covert exogenous and endogenous orienting is mediated by the same neural structures in a so-called frontoparietal network (Peelen, Heslenfeld, & Theeuwes, 2004; Rosen et al., 1999). Perhaps the aim of this network is to extract and combine all relevant information, resulting in the best estimate of a target's location. 
Exogenous cues temporarily influence expectations
There were subtle differences in choice reaction times, however, between exogenous and endogenous methods of cuing. How can these be understood in the present Bayesian framework? In the endogenous (spatial probability) experiment, it is straightforward that choice reaction time advantages are seen in all conditions relative to neutral. In dynamic decision-making models, evidence accumulates for a target being in location 1 to N. The spatial prior expectations delivered by an endogenous cue could be viewed as increasing the initial evidence compared to a situation of no prior expectations. This increased initial evidence would then lead to reaching a threshold level faster than in the zero spatial expectation (neutral) condition. 
However, in the cuing manipulation, reaction time disadvantages were observed in the 0% and 10% conditions. The most straightforward explanation of this is that attention (i.e., expectation) is reflexively drawn toward the cued location. In these conditions, this would act as a “false start” as the cued location indicates a location where the target has only a 10% or 0% chance of being. As the visual evidence is seemingly unaffected, then observers can easily comply with the task instructions for maximal performance, simply by delaying their response, which presumably allows for the cue-induced false start in expectations to be overridden by endogenous, more appropriate expectation weightings. In cuing conditions above 25%, then presumably the reflexive influence upon expectation acts with, rather than against, the appropriate set of prior expectation weightings. 
Spatial attention = spatial expectation?
The results are supportive of the Bayesian optimal weighting account (Eckstein, Pham et al., 2004; Eckstein et al., 2002; Müller & Findlay, 1987; Shaw, 1982; Shimozaki et al., 2003) and are slightly inconsistent with a sensory-level account. If the sensory information passing to the decision stage was altered by peripheral cues, then one would not have expected the performance to be identical to endogenous cuing. Thus, any cue-induced changes in visual sensitivity must have been minor or non-existent. However, this work does not directly refute (or necessarily exclude) a role for sensory-level explanations contributing to the effects. 
If one is persuaded by the weighting account, then it is interesting to speculate that attention (in such a spatial localization task) is precisely definable as a set of prior expectations over space. An additional advantage of having such a clear definition is that precise predictions can be derived from a model based on probability theory with clear assumptions and few parameters. An alternative interpretation, along the lines of Anderson (2011), would be that weighting visual evidence by spatial expectations is the causal mechanism, giving rise to the attentional effect of localization performance changes. 
Conclusion
The hypothesis that observers optimally combine their prior beliefs about the location of an upcoming target with their uncertain visual observations was tested. While holding the amount of visual information constant, spatial prior beliefs were manipulated using either exogenous peripheral cues or endogenous task instruction. The behavioral data and modeling resulted in 4 main findings: First, people do seem to optimally combine their prior beliefs with their current sensory evidence; however, sometimes one's prior beliefs are biased. Second, these biases mirror those probability biases descriptively accounted for by the Prospect Theory (Kahneman & Tversky, 1979). These biases do represent deviations from purely normative optimal behavior in this specific search task but could possibly be due to people having uncertainty about their prior beliefs (Martins, 2005, 2006). Third, spatial expectations provided by exogenous or endogenous means can be processed identically, in terms of information utilization. However, counterpredictive peripheral cues increase choice RTs due to an apparently unavoidable capture effect. Fourth, the changes in target localization performance observed can be accounted for by people simply weighting the visual evidence with their prior spatial expectations. 
Appendix A
Bayesian optimal observer
Optimal observers for AFC localization tasks have been described previously by Cameron et al. (2004), Eckstein, Abbey et al. (2004), and Eckstein et al. (2009). 
On any given trial, the Bayesian optimal observer calculates a decision variable $d_i$ for each location, which corresponds to the posterior probability of target presence. The observer responds that the target is in the location with the highest value of $d_i$. If there are N locations $L = \{L_1, \ldots, L_N\}$ with corresponding noisy observations x, then the decision variable can be written as follows: 
$$d_i = P(L_i = 1 \mid x_1, \ldots, x_N) \propto P(x_1, \ldots, x_N \mid L_i = 1) \cdot P(L_i = 1). \tag{A1}$$
The term $P(L_i = 1)$ is the prior probability of the target appearing in location i. If the manipulated location m has a probability ε of containing the target, then the prior for the manipulated location is $P(L_{i=m} = 1) = \varepsilon$, and for all other locations, the prior is $P(L_{i \neq m} = 1) = (1 - \varepsilon)/(N - 1)$. 
The term $P(x_1, \ldots, x_N \mid L_i = 1)$ defines the likelihood of observing all noisy observations given target presence at location i. This is simply the likelihood of the observation at location i being a target and of the observations at all other locations being distracters: 
$$P(x_1, \ldots, x_N \mid L_i = 1) = P(x_i \mid L_i = 1) \prod_{j \neq i} P(x_j \mid L_j = 0). \tag{A2}$$
The term $P(x_i \mid L_i = 1)$ is the likelihood of making a noisy observation $x_i$, given that a target was present at location i. Therefore, this distribution is a noise model describing the variability of internal responses to targets, and it is described here as $P(x_i \mid L_i = 1) = \mathcal{N}(x_i; \mu_T, \sigma_T^2)$, where $\mu_T$ is the target orientation of +k° and $\sigma_T^2$ is the variance of internal responses. 
The term $P(x_j \mid L_j = 0)$ is the same as above but describes the likelihood of a particular noisy internal response at location j, given that a target was absent in location j. It is described as $P(x_j \mid L_j = 0) = \mathcal{N}(x_j; \mu_D, \sigma_D^2)$, where $\mu_D$ = −k°. For this work, because the orientations of targets and distracters were rotated by equal amounts k from vertical, it was assumed that $\sigma_T^2 = \sigma_D^2$. 
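Equations A1 and A2 can be sketched in a few lines of code. The version below is a minimal illustration under stated assumptions rather than the code used in the study: the function names, the example observations, and the choice to work in log space and normalize the posterior are all hypothetical.

```python
import math

def make_prior(eps, m, n_locations=4):
    """Prior over locations: eps at manipulated location m, the remainder spread uniformly."""
    other = (1.0 - eps) / (n_locations - 1)
    return [eps if i == m else other for i in range(n_locations)]

def log_normal_pdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def posterior(x, prior, k, sigma):
    """Posterior probability that each location holds the target (Equations A1 and A2).

    Targets are assumed to give responses ~ N(+k, sigma^2), distracters ~ N(-k, sigma^2).
    """
    n = len(x)
    log_post = []
    for i in range(n):
        if prior[i] == 0.0:
            log_post.append(float("-inf"))   # impossible location under the prior
            continue
        # Likelihood (A2): target at i, distracters at every other location.
        log_lik = sum(
            log_normal_pdf(x[j], k if j == i else -k, sigma) for j in range(n)
        )
        log_post.append(log_lik + math.log(prior[i]))
    # Normalise in a numerically stable way.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

x = [1.0, -1.0, -1.0, -1.0]   # hypothetical noisy observations
print(posterior(x, make_prior(0.25, m=2), k=1.0, sigma=1.0))   # flat prior
print(posterior(x, make_prior(0.99, m=2), k=1.0, sigma=1.0))   # strong prior at location 2
```

With a flat prior the response follows the largest observation (location 0 here), whereas a sufficiently strong prior at location 2 outweighs the same evidence.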
The Bayesian optimal observer has 2 free parameters θ = (d′, λ), where d′ is directly related to the degree of overlap in the distributions of internal responses to targets $\mathcal{N}(\mu_T, \sigma_T^2)$ and distracters $\mathcal{N}(\mu_D, \sigma_D^2)$. However, any non-zero lapse rate will affect this relation. 
It is important to note that this model assumes no uncertainty in what the prior expectation is. In reality, participants may well have uncertain expectations. 
Appendix B
Heuristic model
The heuristic model was assumed to have some knowledge about its d′ performance such that it can accurately decide at what expectation level it will switch between using observations and expectations. Below this threshold, observations are used and a decision variable is used, which is identical to the Bayesian observer (Equation A1) but without any prior expectation. The response is directed to the location with the highest value of $d_i$: 
$$d_i \propto P(x_1, \ldots, x_N \mid L_i = 1). \tag{B1}$$
Above this threshold expectation level, the heuristic observer always responds at the manipulated location m. 
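The switching rule can be sketched as follows. This is an illustration under assumptions, not the implementation used in the study: `threshold` is a hypothetical parameter standing in for the expectation level at which the observer switches, and the handling of the exact boundary case (observations are used when ε equals the threshold) is an arbitrary choice.

```python
def heuristic_response(x, m, eps, threshold):
    """Return the location chosen by a switching (heuristic) observer.

    x: noisy observations, one per location; m: index of the manipulated location;
    eps: stated probability that the target appears at m;
    threshold: expectation level at which the strategy switches (hypothetical parameter).
    """
    if eps > threshold:
        # Rely on the expectation alone: always choose the manipulated location.
        return m
    # Rely on the observations alone (Equation B1). With equal-variance normal
    # noise and the target mean above the distracter mean, the likelihood is
    # largest at the location with the largest observation.
    return max(range(len(x)), key=lambda i: x[i])

x = [0.2, 1.5, -0.3, 0.0]   # hypothetical noisy observations
print(heuristic_response(x, m=3, eps=0.25, threshold=0.5))   # uses observations
print(heuristic_response(x, m=3, eps=0.90, threshold=0.5))   # uses the expectation
```

Unlike the Bayesian observer, this rule never blends the two sources of information, so its predicted response probabilities change abruptly at the threshold.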
Appendix C
Biased expectation observer
The biased expectation observer differed from the Bayesian optimal observer in two ways. First, the internal priors no longer correspond to the real degree of belief ε. Instead, they are based on a modified weighting ε′ that is a function of the true level of expectation ε: $P(L_{i=m} = 1) = \varepsilon'$ and $P(L_{i \neq m} = 1) = (1 - \varepsilon')/(N - 1)$. In other words, the prior expectation at the manipulated location is equal to the biased expectation, and the remainder is uniformly distributed over the non-manipulated locations. The nature of the bias is described by $\varepsilon' = \exp(-(-\log \varepsilon)^{\beta})$ (see Prelec, 1998). Second, the model has three free parameters θ = (λ, d′, β). 
Appendix D
Monte Carlo trials
The predicted performance $p_c$ for a given expectation condition c = [1, 2, 3, 4] was determined by Monte Carlo simulation. On a given trial, 1 of the 4 locations m is manipulated with a degree of belief ε. The target location t is randomly sampled, with a probability of ε of being at location m and a probability of (1 − ε)/(N − 1) of being in any location other than m. 
A simulated set of noisy observations $x = (x_1, \ldots, x_N)$ is then produced by sampling from normal distributions: $x_{i=t} \sim \mathcal{N}(k, \sigma^2)$ for the actual target location and $x_{i \neq t} \sim \mathcal{N}(-k, \sigma^2)$ for all other locations. A total of 100,000 such simulated trials were used to evaluate the likelihood of any given parameter combination. 
Lapse rates (λ) were dealt with by taking a proportion of simulated trials equal to the lapse rate and choosing the response location on a uniform random basis. This means that on a proportion λ/N of trials, the model will indicate the correct location by chance. 
For a given set of parameter values θ, each model predicted a percent correct localization performance for each condition, p c . The model response location was determined by the location with the highest valued decision variable d i . If this corresponded to the actual target location, the response was correct. Therefore, model performance in a given expectation condition was the proportion of simulated trials where the decision variable for the actual target location was higher than all non-target locations. 
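The simulation procedure described above can be sketched as below. This is a simplified illustration, not the original code: it assumes unit-variance noise with target and distracter means of +d′/2 and −d′/2 (so that their separation equals d′), and the function name, default arguments, and seeding are hypothetical.

```python
import math
import random

def simulate_percent_correct(eps, d_prime, lapse, n_trials=100_000, n_locations=4, seed=0):
    """Monte Carlo estimate of the proportion correct for a Bayesian observer.

    Assumes sigma = 1 with target mean +d'/2 and distracter mean -d'/2;
    this mapping between d' and the noise model is an assumption.
    """
    rng = random.Random(seed)
    k = d_prime / 2.0
    other = (1.0 - eps) / (n_locations - 1)
    n_correct = 0
    for _ in range(n_trials):
        m = rng.randrange(n_locations)   # manipulated location for this trial
        prior = [eps if i == m else other for i in range(n_locations)]
        # Sample the true target location according to the prior.
        t = rng.choices(range(n_locations), weights=prior)[0]
        if rng.random() < lapse:
            response = rng.randrange(n_locations)   # lapse: uniform random guess
        else:
            best, response = float("-inf"), 0
            for i in range(n_locations):
                x_i = rng.gauss(k if i == t else -k, 1.0)
                # Log posterior up to a constant: log prior + 2*k*x_i (sigma = 1).
                log_prior = math.log(prior[i]) if prior[i] > 0.0 else float("-inf")
                d_i = log_prior + 2.0 * k * x_i
                if d_i > best:
                    best, response = d_i, i
        n_correct += response == t
    return n_correct / n_trials

# Example parameter values; fewer trials than in the text to keep the run short.
for eps in (0.0, 0.1, 0.25, 0.5, 0.9, 1.0):
    print(eps, simulate_percent_correct(eps, d_prime=0.89, lapse=0.0, n_trials=20_000))
```

The decision variable uses the fact that, with equal variances, the likelihood in Equation A2 depends on location i only through the factor exp(2k·x_i/σ²), so terms common to all locations can be dropped.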
Maximum likelihood model parameters were determined. The likelihood function was the product of binomial likelihoods over all expectation levels (calculated as the sum of log likelihoods). If $p_c$ is the model-predicted proportion correct in condition c and $k_c$ is the number of correct trials out of 100 in condition c, then the log likelihood is given by $\sum_{c=1}^{6} \log\left[\binom{100}{k_c} p_c^{k_c} (1 - p_c)^{(100 - k_c)}\right]$. Matlab's “fminsearch” function was used to minimize the negative log likelihood, and because each model had either 2 or 3 parameters, it was possible to fit all parameters simultaneously. This was done multiple times to confirm correct convergence on the maximum likelihood parameters. 
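The objective function described here can be written compactly. The sketch below is illustrative only: the function name, the clamping constant, and the example counts are hypothetical. Its negative could be handed to any derivative-free minimizer; a Nelder-Mead simplex routine would be the closest analogue of fminsearch.

```python
import math

def binomial_log_likelihood(p_model, k_correct, n_trials=100):
    """Sum over conditions of log[C(n, k) * p**k * (1 - p)**(n - k)].

    p_model: predicted proportion correct per condition;
    k_correct: observed number of correct trials per condition.
    """
    tiny = 1e-12   # keeps log() finite when a prediction is exactly 0 or 1
    total = 0.0
    for p, k in zip(p_model, k_correct):
        p = min(max(p, tiny), 1.0 - tiny)
        total += (
            math.log(math.comb(n_trials, k))
            + k * math.log(p)
            + (n_trials - k) * math.log(1.0 - p)
        )
    return total

# Hypothetical predictions and counts for three conditions.
observed = [49, 62, 97]
print(binomial_log_likelihood([0.49, 0.62, 0.97], observed))   # predictions match the data
print(binomial_log_likelihood([0.40, 0.70, 0.90], observed))   # poorer predictions, lower value
```

Because the Monte Carlo predictions are themselves noisy, restarting the minimizer from several initial values, as the text describes, is a sensible check on convergence.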
Acknowledgments
The author would like to thank two anonymous reviewers for their useful and constructive feedback. 
Commercial relationships: none. 
Corresponding author: Benjamin Vincent. 
Email: b.t.vincent@dundee.ac.uk. 
Address: School of Psychology, University of Dundee, Park Place, Dundee DD1 4HN, UK. 
References
Akaike H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723. [CrossRef]
Anderson B. (2011). There is no such thing as attention. Frontiers in Psychology, 2, 246. [CrossRef] [PubMed]
Bashinski H. S. Bacharach V. R. (1980). Enhancement of perceptual sensitivity as the result of selectively attending to spatial locations. Perception & Psychophysics, 28, 241–248. [CrossRef] [PubMed]
Bayes T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, 370–418. [CrossRef]
Bishop C. (2006). Pattern recognition and machine learning. NY: Springer.
Brady T. F. Oliva A. (2008). Statistical learning using real-world scenes: Extracting categorical regularities without conscious intent. Psychological Science, 19, 678–685. [CrossRef] [PubMed]
Brainard D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436. [CrossRef] [PubMed]
Burnham K. P. Anderson D. R. (2002). Model selection and multi-model inference: A practical information-theoretic approach. NY: Springer.
Burnham K. P. Anderson D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33, 261–304. [CrossRef]
Cameron E. L. Tai J. C. Eckstein M. P. Carrasco M. (2004). Signal detection theory applied to three visual search tasks—Identification, yes/no detection and localization. Spatial Vision, 17, 295–325. [CrossRef] [PubMed]
Cheal M. Lyon D. R. (1991). Central and peripheral precuing of forced-choice discrimination. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 43, 859–880. [CrossRef]
Dosher B. A. Han S. Lu Z. (2004). Parallel processing in visual search asymmetry. Journal of Experimental Psychology: Human Perception and Performance, 30, 3–27. [CrossRef] [PubMed]
Dosher B. A. Han S. Lu Z.-L. (2010). Information-limited parallel processing in difficult heterogeneous covert visual search. Journal of Experimental Psychology: Human Perception and Performance, 36, 1128–1144. [CrossRef] [PubMed]
Doya K. Ishii S. Pouget A. Rao R. (Eds.) (2007). Bayesian brain: Probabilistic approaches to neural coding. Cambridge, MA: MIT Press.
Droll J. A. Abbey C. K. Eckstein M. P. (2009). Learning cue validity through performance feedback. Journal of Vision, 9(2):18, 1–23, http://www.journalofvision.org/content/9/2/18, doi:10.1167/9.2.18. [PubMed] [Article] [CrossRef] [PubMed]
Druker M. Anderson B. (2010). Spatial probability aids visual stimulus discrimination. Frontiers in Human Neuroscience, 4, 1–10. [PubMed]
Eckstein M. (1998). The lower visual search efficiency for conjunctions is due to noise and not serial attentional processing. Psychological Science, 9, 111–118. [CrossRef]
Eckstein M. Abbey C. K. Pham B. T. Shimozaki S. S. (2004). Perceptual learning through optimization of attentional weighting: Human versus optimal Bayesian learner. Journal of Vision, 4(12):3, 1006–1019, http://www.journalofvision.org/content/4/12/3, doi:10.1167/4.12.3. [PubMed] [Article] [CrossRef]
Eckstein M. Pham B. T. Shimozaki S. S. (2004). The footprints of visual attention during search with 100% valid and 100% invalid cues. Vision Research, 44, 1193–1207. [CrossRef] [PubMed]
Eckstein M. Shimozaki S. S. Abbey C. K. (2002). The footprints of visual attention in the Posner cueing paradigm revealed by classification images. Journal of Vision, 2(1):3, 25–45, http://www.journalofvision.org/content/2/1/3, doi:10.1167/2.1.3. [PubMed] [Article] [CrossRef]
Eckstein M. Thomas J. P. Palmer J. Shimozaki S. S. (2000). A signal detection model predicts the effects of set size in visual search accuracy for feature, conjunction and disjunction display. Perception & Psychophysics, 62, 425–451. [CrossRef] [PubMed]
Eckstein M. P. Peterson M. F. Pham B. T. Droll J. A. (2009). Statistical decision theory to relate neurons to behavior in the study of covert visual attention. Vision Research, 49, 1097–1128. [CrossRef] [PubMed]
Fiser J. Aslin R. N. (2001). Unsupervised statistical learning of higher order temporal structure from visual scenes. Psychological Science, 12, 499–504. [CrossRef] [PubMed]
Fiser J. Aslin R. N. (2002). Statistical learning of higher-order temporal structure from visual shape sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 458–467. [CrossRef] [PubMed]
Geng J. J. Behrmann M. (2005). Spatial probability as an attentional cue in visual search. Perception & Psychophysics, 67, 1252–1268. [CrossRef] [PubMed]
Glimcher P. (2004). Decision, uncertainty, and the brain. Cambridge, MA: MIT Press.
Gonzalez R. Wu G. (1999). On the shape of the probability weighting function. Cognitive Psychology, 38, 129–166. [CrossRef] [PubMed]
Gottlob L. Cheal M. Lyon D. R. (1999). Time course of location-cuing effects with a probability manipulation. The Journal of General Psychology, 126, 261–270. [CrossRef]
Gould I. C. Wolfgang B. J. Smith P. L. (2007). Spatial uncertainty explains exogenous and endogenous attentional cuing effects in visual signal detection. Journal of Vision, 7(13):4, 1–17, http://www.journalofvision.org/content/7/13/4, doi:10.1167/7.13.4. [PubMed] [Article] [CrossRef] [PubMed]
Green D. M. Swets J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Publishing.
Hillstrom A. (2000). Repetition effects in visual search. Perception & Psychophysics, 62, 800–817. [CrossRef] [PubMed]
Jonides J. (1981). Voluntary versus automatic control over the mind's eye's movement. In Long J. Baddeley A. (Eds.), Attention and performance IX (pp. 187–202). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Kahneman D. Tversky A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291. [CrossRef]
Kruschke J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. Burlington, MA: Academic Press.
Ma W. J. Navalpakkam V. Beck J. M. Berg R. v. d. Pouget A. (2011). Behavior and neural basis of near-optimal visual search. Nature Neuroscience, 14, 783–790. [CrossRef] [PubMed]
Martins A. C. R. (2005). Adaptive Probability Theory: Human Biases as an Adaptation. Cogprint preprint at http://cogprints.org/4377/.
Martins A. (2006). Probability biases as Bayesian inference. Judgment and Decision Making, 1, 108–117.
Müller H. J. (1994). Qualitative differences in response bias from spatial cueing. Canadian Journal of Experimental Psychology, 48, 218–241. [CrossRef] [PubMed]
Müller H. J. Findlay J. M. (1987). Sensitivity and criterion effects in the spatial cuing of visual attention. Perception & Psychophysics, 42, 383–399. [CrossRef] [PubMed]
Müller H. J. Findlay J. M. (1988). The effect of visual attention on peripheral discrimination thresholds in single and multiple element displays. Acta Psychologica, 69, 129–155. [CrossRef] [PubMed]
Müller H. J. Rabbitt P. M. (1989). Reflexive and voluntary orienting of visual attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315–330. [CrossRef] [PubMed]
Nakayama K. Mackeben M. (1989). Sustained and transient components of focal visual attention. Vision Research, 29, 1631–1647. [CrossRef] [PubMed]
Peelen M. V. Heslenfeld D. J. Theeuwes J. (2004). Endogenous and exogenous attention shifts are mediated by the same large-scale neural network. NeuroImage, 22, 822–830. [CrossRef] [PubMed]
Posner M. Nissen M. Ogden W. (1977). Attended and unattended processing modes: The role of set for spatial location. In Pick H. Saltzman I. (Eds.), Modes of perceiving and processing information (pp. 137–157). Hillsdale, NJ: Erlbaum.
Posner M. Snyder C. Davison B. (1980). Attention and the detection of signals. Journal of Experimental Psychology, 109, 160–174. [CrossRef] [PubMed]
Posner M. I. (1978). Chronometric explorations of mind. Hillsdale, NJ: Erlbaum.
Prelec D. (1998). The probability weighting function. Econometrica, 66, 497–527. [CrossRef]
Remington R. W. Johnston J. C. Yantis S. (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51, 279–290. [CrossRef] [PubMed]
Rosen A. Rao S. Caffarra P. Scaglioni A. Bobholz J. Woodley S. et al. (1999). Neural basis of endogenous and exogenous spatial orienting: A functional MRI study. Journal of Cognitive Neuroscience, 11, 135–152. [CrossRef] [PubMed]
Rosenthal O. Fusi S. Hochstein S. (2001). Forming classes by stimulus frequency: Behavior and theory. Proceedings of the National Academy of Sciences of the United States of America, 98, 4265–4270. [CrossRef] [PubMed]
Shaw M. (1982). Attending to multiple sources of information: I. The integration of information in decision making. Cognitive Psychology, 14, 353–409. [CrossRef]
Shepherd M. Müller H. J. (1989). Movement versus focusing of visual attention. Perception & Psychophysics, 46, 146–154. [CrossRef] [PubMed]
Shimozaki S. Eckstein M. Abbey C. (2003). Comparison of two weighted integration models for the cueing task: Linear and likelihood. Journal of Vision, 3(3):3, 209–229, http://www.journalofvision.org/content/3/3/3, doi:10.1167/3.3.3. [PubMed] [Article] [CrossRef]
Smith P. L. (2000). Attention and luminance detection: Effects of cues, masks, and pedestals. Journal of Experimental Psychology: Human Perception and Performance, 26, 1401–1420. [CrossRef] [PubMed]
Solomon J. A. (2004). The effect of spatial cues on visual sensitivity. Vision Research, 44, 1209–1216. [CrossRef] [PubMed]
Turk-Browne N. B. Jungé J. Scholl B. J. (2005). The automaticity of visual statistical learning. Journal of Experimental Psychology: General, 134, 552–564. [CrossRef] [PubMed]
Verghese P. (2001). Visual search and attention: A signal detection theory approach. Neuron, 31, 523–535. [CrossRef] [PubMed]
Vincent B. T. (2011). Search asymmetries: Parallel processing of uncertain sensory information. Vision Research.
Vincent B. T. Baddeley R. J. Troscianko T. Gilchrist I. D. (2009). Optimal feature integration in visual search. Journal of Vision, 9(5):15, 1–11, http://www.journalofvision.org/content/9/5/15, doi:10.1167/9.5.15. [PubMed] [Article] [CrossRef]
Walthew C. Gilchrist I. D. (2006). Target location probability effects in visual search: An effect of sequential dependencies. Journal of Experimental Psychology: Human Perception and Performance, 32, 1294–1301. [CrossRef] [PubMed]
Figure 1
 
Structure of the experiments. (a) The causal structure of cue and target location in Experiment 1: a cue is equally likely to occur in 1 of 4 locations, and the target then has a probability of ε of being at the cued location and a probability of (1 − ε) / 3 of being at each of the other locations. (c) In Experiment 2, within an experimental block, a target appears in a chosen location with probability ε. Trial structures are shown in (b) and (d).
Figure 2
 
A schematic of the Bayesian optimal observer for localization, where higher activations are represented by lighter shades of gray. Noisy observations of 4 stimulus (target or distracter) orientations are made. A target similarity map represents the likelihood of each observation being a target and the converse with a distracter similarity map. A final likelihood is computed, representing how consistent the data is with a target being in each of the locations. For it to be likely that a target is in a given location, the observation at that location must look like a target and observations at all other locations must look like distracters. These likelihoods are multiplied by the prior to result in the posterior probability (best guess) of a target being in each location given the data available. Prior expectations of where a target will occur can be manipulated in different ways.
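The computation in this schematic can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: it assumes Gaussian observation noise, and the values of `target_mean`, `distracter_mean`, and `noise_sd`, as well as the function names `cue_prior` and `posterior_over_locations`, are hypothetical choices made for the example. The prior follows the cuing structure of Figure 1, with probability ε at the manipulated location and (1 − ε) / 3 at each of the others.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution (assumed noise model)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def cue_prior(cued_index, epsilon, n_locations=4):
    """Prior over locations: epsilon at the cued location, the rest shared equally."""
    other = (1.0 - epsilon) / (n_locations - 1)
    return [epsilon if i == cued_index else other for i in range(n_locations)]


def posterior_over_locations(observations, prior,
                             target_mean=1.0, distracter_mean=0.0, noise_sd=1.0):
    """Posterior probability that the target is at each location."""
    # Target and distracter similarity maps: how target-like or
    # distracter-like each noisy observation is.
    target_map = [gaussian_pdf(x, target_mean, noise_sd) for x in observations]
    distracter_map = [gaussian_pdf(x, distracter_mean, noise_sd) for x in observations]

    # Likelihood of "target at j": observation j looks like a target AND
    # every other observation looks like a distracter.
    likelihood = []
    for j in range(len(observations)):
        value = target_map[j]
        for i in range(len(observations)):
            if i != j:
                value *= distracter_map[i]
        likelihood.append(value)

    # Multiply by the prior and normalise to obtain the posterior.
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical trial: location 0 is cued with epsilon = 0.7.
observations = [0.4, 0.9, -0.2, 0.1]
prior = cue_prior(cued_index=0, epsilon=0.7)
posterior = posterior_over_locations(observations, prior)
best_guess = max(range(len(posterior)), key=posterior.__getitem__)
```

In this made-up trial the most target-like observation is at location 1, yet the strong prior at location 0 shifts the best guess there. With a neutral prior (ε = 0.25) the same observations would favor location 1.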
Figure 3
 
Results for (top row) Experiment 1 and (bottom row) Experiment 2. Performance and probability of responding to the manipulated location are shown. Error bars denote 95% confidence intervals, determined by binomial distributions. Model fits are shown for the Bayesian optimal observer (black lines) and the heuristic observer (gray lines) using maximum likelihood d′ and lapse rate parameters. Reaction time advantages, relative to the neutral 25% condition, are shown; error bars denote 95% confidence intervals of the mean, calculated by a bootstrap method.
Figure 4
 
Analysis of biased expectations, showing the goodness of fit (likelihood) of different levels of bias relative to the unbiased optimal observer. The optimal observer corresponds to β = 1 and is indicated by gray crosshairs. Individual participant bias analysis (middle, right) shows that maximum likelihood bias parameters range from ∼0.6 to 1.
Figure 5
 
Comparison of Bayesian optimal observer (black lines) and biased expectation observer (gray lines) for (top) Experiment 1 and (bottom) Experiment 2. Left and middle panels show performance and proportion of responses at the manipulated location. Right panels show the probability weighting function relating actual expectation ε to effective expectation ε′. Expectation was overweighted at low expectation values and underweighted at high expectation levels.