While it is undoubtedly the case that high-level factors such as scene context (Henderson, 2003; Torralba, 2003) and task demands (Buswell, 1935; Land & Hayhoe, 2001; Yarbus, 1967) play an important role in natural visually guided behaviors, it is also well known that simple visual features can affect the allocation of visual attention. For example, in visual search, search efficiency depends upon the feature dimensions used (Wolfe, 1998) and on the heterogeneity of the distracters (Duncan & Humphreys, 1989, 1992; Pashler, 1987). These phenomena require an explanation: in what way are basic visual cues used to guide visual attention?
Many approaches and models have been applied to this problem. While not universally accepted (Allport, 1989), approaches such as Feature Integration Theory (Treisman & Gelade, 1980) and Guided Search (Wolfe, 1994) are strongly influenced by the notions of finite processing capacity, internal selection processes, and serial deployment of attention. A key experimental phenomenon supporting this notion is the set size effect: for features not classed as ‘basic’ (Wolfe & Horowitz, 2004), search takes longer as the number of display items increases. Despite various advantages, studies that use long presentation times and rely primarily on reaction time as the dependent variable have the disadvantage that eye movements can be made during the search. Unless eye trackers are used, one can neither control for nor eliminate their influence on reaction times. For example, it is known for many features, including oriented lines, that detectability declines dramatically with retinal eccentricity (Carrasco & Frieder, 1997), and so Zelinsky and Sheinberg (1997) argued that inferences about internal search processes are confounded with eye movements unless these are specifically accounted for. Controlling for the retinal location of stimuli, combined with the notion of detectability (e.g. Verghese & Nakayama, 1993), underlies an alternative approach.
The detection theoretic approach was applied to visual search by Palmer, Ames, and Lindsey (1993), whose mathematical terminology we partially adopt here. It uses short stimulus presentation times to control for eye movements and records performance measures as the dependent variable (see Verghese, 2001, for a review). The approach has three main steps: 1) a representation of the stimuli; 2) a combination rule that pools the visual information; 3) a decision rule that determines the behavioral response.
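To make these three steps concrete, the following sketch simulates a standard maximum-rule observer in a yes/no search task. This is our illustration rather than code from any of the papers cited here; the unit-variance Gaussian noise, the value of d′, the criterion, and all parameter values are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sdt_max_observer(n_items, d_prime, criterion, target_present, n_trials=10000):
    """Simulate a yes/no maximum-rule observer.

    Step 1 (representation): each display item evokes a noisy internal
    response; distracters ~ N(0, 1) and the target (if present) ~ N(d', 1).
    Step 2 (combination): the responses are pooled by taking their maximum.
    Step 3 (decision): respond 'present' if the maximum exceeds the criterion.
    Returns the proportion of 'present' responses.
    """
    responses = rng.normal(0.0, 1.0, size=(n_trials, n_items))
    if target_present:
        responses[:, 0] += d_prime  # one item carries the signal
    pooled = responses.max(axis=1)  # combination rule: the maximum
    return float((pooled > criterion).mean())

# Hit and false alarm rates for an 8-item display (illustrative parameters)
hit_rate = sdt_max_observer(8, d_prime=2.0, criterion=1.5, target_present=True)
fa_rate = sdt_max_observer(8, d_prime=2.0, criterion=1.5, target_present=False)
print(f"hits = {hit_rate:.3f}, false alarms = {fa_rate:.3f}")
```

Because the three steps are separable, each can be modified independently; this is what permits the comparison with the alternative decision rule introduced below.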
This signal detection theory account, which we describe below, has provided close matches to human visual search performance, shedding light on a variety of empirical phenomena. These include set size effects (Cameron, Tai, Eckstein, & Carrasco, 2004; Eckstein, 1998; Eckstein, Thomas, Palmer, & Shimozaki, 2000; Palmer, 1994; Palmer et al., 1993) and conjunction searches, where the target is defined by a combination of ‘basic’ features present in the distracters (Eckstein, 1998; Eckstein et al., 2000). Predictions for target–distracter similarity manipulations, effects of multiple targets, and external noise have also been calculated (Palmer, Verghese, & Pavel, 2000). While the detection theoretic approach as applied to visual search thus far stops at this decision stage, it is still a model of attentional allocation, and there is no a priori reason why it cannot be extended to longer display durations. Search performance in such detection tasks has been related to search time (Palmer, 1998), but further work in this area is needed (see Najemnik & Geisler, 2005).
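To sketch why this account predicts set size effects without any capacity limit (a standard signal detection derivation, not tied to the specific parameterizations of the papers above): suppose the $n$ item responses on a target-absent trial are independent unit-variance Gaussian variables and the observer uses the maximum rule with criterion $c$. The probability that no response exceeds $c$ is $\Phi(c)^{n}$, so

\[ P(\text{false alarm}) = 1 - \Phi(c)^{n}, \]

which increases with $n$. Holding the false alarm rate constant therefore forces the criterion upward as items are added, which in turn lowers the hit rate, so accuracy declines with set size purely through the accumulation of noise.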
Many detection theoretic applications to visual search have examined performance on a 2-TAFC (two-temporal-alternative forced choice) task, in which subjects report which of two successive stimulus displays contains a target. Many of these also provide at least appendices on how the approach applies to yes/no detection: reporting whether a target is present or absent in a single stimulus presentation. While the TAFC task has the advantage of requiring only a percent correct measure, it can inadvertently involve memory mechanisms if the delay between successive stimulus presentations is long. For example, Wilken and Ma (2004) apply SDT to a 2-TAFC design with the specific aim of studying visual short-term memory. Here, we avoid this possibility by examining visual search performance with the yes/no task.
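The sense in which percent correct suffices for the forced-choice task can be seen in the single-item case (a textbook signal detection result, stated here only for illustration): with unit-variance Gaussian representations in the two intervals,

\[ P(\text{correct}) = \Phi\!\left(\frac{d'}{\sqrt{2}}\right), \]

so percent correct maps directly onto sensitivity with no assumption about the observer's criterion. In yes/no detection, by contrast, hit and false alarm rates must be measured separately, since performance depends on both sensitivity and criterion placement.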
Below, we describe the signal detection theory account and show that, while it has successfully accounted for many phenomena, its assumptions limit its robustness, sometimes resulting in suboptimal predicted performance levels. We therefore develop a Bayes-optimal observer that maintains optimal performance in situations where the detection theoretic model does not. The major difference is that the Bayes-optimal observer makes decisions based upon the maximum a posteriori probability (hence the name MAP-observer) rather than upon the maximum observation along a perceptual dimension. Our results support the notion that human observers are near Bayes-optimal, making decisions along a probability axis rather than a perceptual one.
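As a preview of this contrast, the following sketch simulates both decision rules on the same yes/no search displays. It is a minimal illustration of ours, assuming unit-variance Gaussian noise, a known d′, a target equally likely at each location when present, and equal prior probabilities of target presence; none of these choices are claims about the experiments reported below.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_displays(n_items, d_prime, target_present, n_trials):
    """Noisy item responses: distracters ~ N(0, 1); on target-present
    trials one location (here the first) carries the signal d'."""
    x = rng.normal(0.0, 1.0, size=(n_trials, n_items))
    if target_present:
        x[:, 0] += d_prime
    return x

def max_rule(x, criterion):
    """Classic SDT rule: 'present' if the largest response along the
    perceptual axis exceeds a fixed criterion."""
    return x.max(axis=1) > criterion

def map_rule(x, d_prime, prior_present=0.5):
    """MAP rule: 'present' if the posterior favors target presence.
    With Gaussian noise and a target equally likely at each location,
    the likelihood ratio is the mean of exp(d' * x_i - d'^2 / 2)."""
    lr = np.exp(d_prime * x - 0.5 * d_prime**2).mean(axis=1)
    return lr * prior_present / (1.0 - prior_present) > 1.0

n_items, d_prime, n_trials = 8, 2.0, 20000
present = simulate_displays(n_items, d_prime, True, n_trials)
absent = simulate_displays(n_items, d_prime, False, n_trials)

for name, rule in [("max", lambda x: max_rule(x, criterion=2.0)),
                   ("MAP", lambda x: map_rule(x, d_prime))]:
    pc = 0.5 * (rule(present).mean() + 1.0 - rule(absent).mean())
    print(f"{name}-rule proportion correct: {pc:.3f}")
```

Note that the max rule's performance depends on where its criterion is placed, whereas the MAP rule with equal priors involves no free criterion: deciding on the probability axis rather than the perceptual axis is, in this sense, the more robust strategy.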