A clustering model for item selection in visual search
William H. McIlhagga
Bradford School of Optometry and Vision Science, University of Bradford, Bradford, UK
[email protected]
Journal of Vision, July 2013, Vol. 13(3):20. doi:https://doi.org/10.1167/13.3.20
Abstract
In visual search experiments, the subject looks for a target item in a display containing different distractor items. The reaction time (RT) to find the target is measured as a function of the number of distractors (set size). RT is either constant or increases linearly with set size. Here we suggest a two-stage model for search in which items are first selected and then recognized. The selection process is modeled by (a) grouping items into a hierarchical cluster tree, in which each cluster node contains a list of all the features of items in the cluster, called the object file, and (b) recursively searching the tree by comparing target features to the cluster object file to quickly determine whether the cluster could contain the target. This model is able to account for both constant and linear RT versus set size functions. In addition, it provides a simple and accurate account of conjunction searches (e.g., looking for a red N among red Os and green Ns), in particular the variation in search rate as the distractor ratio is varied.

Introduction
In a visual search experiment an observer must decide, as quickly as possible, whether a target item is present in a display, which also contains many distractor items. For example, the observer may have to find a letter X in a display that includes many randomly scattered letter Os. The target is usually present in half the displays. Performance in visual search is measured by the average reaction time (RT) to decide whether the target is present or absent, as a function of the number of items in the display (the set size). 
Two patterns of RT are observed (Neisser, 1967; Treisman & Gelade, 1980). In some search experiments, the average RT for target-present displays is constant, regardless of the set size. The average RT for target-absent displays is also constant, and about the same as that for target-present displays. Targets in these searches typically have some unique feature, such as a color, which is lacking in the distractors. Examples of constant-time searches include looking for a red target among green distractors, or looking for a Q when all the distractors are Os. 
In other search experiments, the average RT for target-present displays increases linearly with the set size. The average RT for target-absent displays also increases linearly with set size, and the slope of the target-absent RT line is roughly twice that of the target-present RT line. Targets in these searches are usually distinguishable from distractors by a conjunction of features, or because they lack a feature that the distractors possess. Examples are looking for a letter F when the distractors are Es, or looking for a letter O when the distractors are Qs. 
When the roles of target and distractor are reversed, the search may change from constant-time to linear-time. For example, a search for a letter Q when the distractors are Os is a constant-time search, but searching for an O when the distractors are Qs is a linear-time search. This is called search asymmetry (Treisman & Souther, 1985), and it shows that search depends on the features in the target, and not just on the difference between target and distractor. 
Many models have been proposed to account for visual search reaction times, among them feature integration theory (FIT) (Treisman & Gelade, 1980; Treisman & Gormican, 1988), theory of visual attention (TVA) (Bundesen, 1990), guided search (Wolfe, 2007; Wolfe, Cave, & Franzel, 1989), signal detection theory (SDT) (Eckstein, Thomas, Palmer, & Shimozaki, 2000; Palmer, Ames, & Lindsey, 1993; Verghese, 2001) and others (Itti & Koch, 2001; Tsotsos, 1990). Here we briefly review the first major theory (FIT) and what might be the most elaborate and accurate (guided search). 
FIT assumes that there are two mechanisms or pathways for visual search, one of which is responsible for constant-time searches, and the other for linear-time searches. Constant-time searches are due to a fast (“preattentive” or “parallel”) mechanism, which examines the features in an image and identifies uniquely occurring ones. These identified unique features lead attention directly to the target item. Linear-time searches are due to a slower (“attentive” or “serial”) mechanism, which examines items one by one until either the target is found or all the items have been examined. If there are N items in the display, the serial mechanism will have to examine N/2 items on average to find the target if it is present, but all N items must be examined to decide the target is absent. This neatly accounts for the relationship between the target-present and target-absent RT slopes for linear-time searches. These two mechanisms presumably operate simultaneously. If the fast mechanism finds the target it pre-empts the slower attentive process; otherwise, search time is due to the slower process. FIT has evolved over a number of papers to account for a range of other search phenomena. 
Guided search (Wolfe, 2007) claims that visual search is carried out in two stages: Items are first selected, and then those items are examined by a recognition process to see if they are the target or not. The selection stage is guided by the features in the search display. Items are selected one at a time on the basis of their “saliency,” with high saliency items more likely to be selected. Saliency is encoded in a 2D saliency map, which is computed rapidly and in parallel across the visual field. The selection of items is, however, essentially a serial process. Once an item is selected it is passed to a recognition process, which decides if the selected item is the target. The distinction between constant-time and linear-time searches is explained by the selection process. Unique features cause the saliency of an item to increase, so it tends to be selected first, and the number of distractors is thus irrelevant to the search. However, if no item is particularly salient, the selection process picks items essentially at random, and so search ends up linear in the set size. The precise RT distributions obtained in visual search are explained by properties of the recognition process. Unusually for visual search models, guided search assumes that items are selected with replacement (Horowitz & Wolfe, 1998): The probability of selecting an item is unaffected by the number of times the item has been selected before. 
Selection via clustering
In this paper I propose a different model for the selection stage of visual search. I assume that the visual search mechanism is a process of selection followed by recognition, much like guided search, but the selection process is based on a clustering of items in the display. The clustering begins with individual features, which are then collected into groups of features corresponding to object parts, which are then merged into groups representing objects, which are then merged into groups that are clusters of objects, which are finally merged into a single group representing the contents of the entire visual field. At every level of this hierarchy, “object files” (Wolfe & Bennett, 1997) are attached to the groups. These summarize all the features that are present in the group. These hierarchical groups and object files support searching in a way that closely mimics human visual search performance. There is no concept of salience in this selection procedure. In the Methods section below I describe the item clustering algorithm, and how the resultant hierarchical cluster tree permits efficient search for arbitrary items. In the Results section, I show how the model can explain the main findings about the average RT in visual search tasks. An earlier version of this model was presented at the European Conference on Visual Perception (ECVP) (McIlhagga, 2005). 
Methods
The model for visual search proposed here consists, like guided search, of two stages: Selection and Recognition. The Selection process finds candidate items in the display and passes them to the Recognition stage, which then decides whether the selected item is the target. The Recognition stage is a limited capacity parallel process, consisting of a few “inspection slots” where the selected items are parked for recognition. Whenever the Recognition stage has an empty slot, it requests an item from the Selection stage. The Selection stage then finds an item in the display and supplies it to Recognition, where it occupies the previously empty slot while it is being checked to see if it is the target. This process continues until the target is recognized, or the search is abandoned. 
The Selection and Recognition processes contribute to reaction time in different ways. The Selection process may be selective enough that the first item selected is the target, or it may be so unselective that all items are equally likely to be selected. These two extremes correspond to constant-time and linear-time search. It is assumed that the selection of an item is fast, and takes negligible time in comparison to the recognition of the item, so that most of the reaction time is due to the slower but more precise Recognition process. The Recognition stage is also responsible for serial correlations in reaction times (McIlhagga, 2008; Thornton & Gilden, 2005). 
The selection process
Perceptual organization and clustering
A simple and not entirely inaccurate model of V1 is to think of it as a retinotopic map of features. Each location in the retinal image is analyzed by a bank of local feature detectors, yielding a list of semisymbolic features, such as color blobs, edges (having orientation and blur characteristics), motion, depth, and so on, at different locations in the visual field (Marr, 1982). This representation is rich but unstructured—there is no explicit information about how the features relate to one another. Structure is provided by grouping these features. Some groups will represent objects or object parts, while others are simply sets of items that share some similarity with each other, or perhaps are merely close to one another. This process of grouping, or perceptual organization, follows a set of rules, some of which were noted by the Gestalt psychologists: proximity, collinearity, common fate, and similarity (Köhler, 1988). 
The complex process of perceptual grouping will be approximated here using cluster analysis (Everitt, Landau, Leese, & Stahl, 2011). In cluster analysis, data points are grouped together according to a measure of how similar the data points are. There are many different clustering algorithms. One simple algorithm is single-link, or nearest neighbor, clustering (Jardine & Sibson, 1972; Sibson, 1973). Single-link clustering begins with a set of data points or items i1, i2, … in, and a distance measure between them d(ia, ib). Single-link clusters are created as follows: Begin with a set of initial clusters C1, C2, … Cn where each cluster is just a single item: Ca = {ia}. The distance between two clusters d(Ca, Cb) is defined as the minimum distance over all pairs of items where one is drawn from the first cluster and one from the second. That is,

d(Ca, Cb) = min { d(ia, ib) : ia ∈ Ca, ib ∈ Cb }.
At each cycle of the algorithm, the two nearest clusters are merged into a single cluster. The two clusters that are merged together are subclusters of the merged cluster. This process of cluster merging continues until only one cluster remains. 
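To make the merging procedure concrete, here is a minimal Python sketch of single-link clustering over 2D points using the Euclidean distance. The nested-dict representation of clusters is an illustrative assumption, not a structure taken from the paper.

```python
import numpy as np

def single_link_cluster(points):
    """Single-link (nearest neighbor) clustering of 2D points.

    Returns the root cluster as a nested dict with 'items' (item indices)
    and 'children' (the two subclusters merged to form this cluster).
    """
    points = np.asarray(points, dtype=float)
    # Start with one singleton cluster per item.
    clusters = [{"items": [i], "children": []} for i in range(len(points))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance,
        # i.e., the minimum distance over all cross-cluster item pairs.
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a]["items"]
                        for j in clusters[b]["items"])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        merged = {"items": clusters[a]["items"] + clusters[b]["items"],
                  "children": [clusters[a], clusters[b]]}
        # Remove index b first so index a remains valid.
        clusters.pop(b)
        clusters.pop(a)
        clusters.append(merged)
    return clusters[0]
```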
For example, take the small set of items shown in Figure 1a. These items are all identical dots, and the distance measure here is just the Euclidean distance between the items. The algorithm starts with clusters that contain single dots. The initial clusters containing items 3 and 5 are closest, so they are the first to merge. Then the clusters containing item 1 and the {3, 5} cluster are closest, so they are merged next. This cluster {1, 3, 5} is then merged with the cluster containing item 4 to give a new cluster {1, 3, 4, 5}. Finally, this cluster is merged with the cluster containing item 2, and the clustering algorithm stops. 
Figure 1
(a) A set of five randomly placed points. (b) The dendrogram produced by single-link clustering of the points, using the Euclidean distance. The numbers at the base of the dendrogram refer to the points in (a). Each upside-down U shape is a cluster, so the clusters found here are the original items, and the groups {3, 5}, {1, 3, 5}, {1, 3, 5, 4}, and all the items {1, 3, 5, 4, 2}. The thick line shows the search path that would need to be taken from the top of the cluster tree to find the point numbered 3.
The clusters can be displayed as a dendrogram or tree diagram (albeit more like tree roots than branches), which shows how the clusters are nested within each other. The tree diagram for the above example is shown in Figure 1b. In the tree diagram, the merger of two clusters into a larger cluster is depicted by joining them with an inverted U. The height of the U is proportional to the distance between the clusters. 
Single-link cluster analysis is an imperfect model of perceptual grouping, and for general-purpose clustering it has a serious weakness (so-called “chaining”), but it has two properties that are useful: It forms hierarchical clusters, which can be modified to support efficient searching, and it has a simple bottom-up algorithm for constructing clusters, which could conceivably be implemented by a feed-forward neural net. It also yields good results in predicting how human observers group stars into constellations (Dry, Navarro, Preiss, & Lee, 2009) and has been previously suggested as a model of Gestalt grouping (Zahn, 1971). 
Single-link clustering requires a distance measure to be defined between the items. For perceptual grouping, the items are features, and the distance measure says how similar the features are. Given that the main aim of perceptual grouping is probably to find and segment objects, it is reasonable to assume that the “distance” between two features is related to the probability that the features come from the same object, so that the clusters produced are likely to be objects or object parts. We could define the distance measure between items to be

d(ia, ib) = −log P(ia and ib belong to the same object).
This is zero if it is 100% certain the items belong to the same object, and increases as the probability decreases. This distance measure will depend on the physical proximity of the items, their collinearity (if they have an orientation), the similarity of color, and so on. We would generally expect this distance measure to reflect the principles of Gestalt grouping. However, for most of this paper, the only distance measure used is the Euclidean distance between items. This is a reasonable simplification because in visual search experiments, the items are usually seen as separated objects, and not parts of a whole. 
Search and selection
The cluster tree created by single-link clustering supports search using a simple depth-first algorithm. For example, if we wanted to locate item 3 in the tree shown in Figure 1b, we would begin at the top, then at each branch of the tree, successively select the subcluster that contains item 3 until we arrive at the item. This search path from the top to the bottom of the tree is shown by the thick lines in Figure 1b. The problem here is knowing which subcluster to select at each stage. This decision can be aided by augmenting each cluster in the tree with some information about the items within it. For example, in the cluster tree shown in Figure 1b, each cluster could be augmented by a list of all the numbers that lie inside the cluster. Then the decision about which cluster to select would be easy: Simply take the cluster whose list of numbers includes 3, and proceed until you arrive at the 3. 
How should perceptual clusters be augmented to support search? Since the items in our perceptual clusters all possess features (color, orientation, contrast, etc.), every cluster could be augmented by an unstructured list of all the features possessed by all the items in the clusters. This list will be called the object file (Wolfe & Bennett, 1997), even when the cluster does not in fact represent an object. The object file can be used to speed up search. For example, if the target is red, and the object file for a cluster does not contain the feature red, we know that the target cannot be anywhere in the entire cluster, no matter how many items the cluster contains, so we can avoid searching that cluster. 
The object files of the initial clusters (which are single features) are simply the features themselves. When two clusters Ca and Cb are merged, the object file of the merged cluster is the union of the object files of Ca and Cb. An example of the object files produced when a set of lines of various angles is clustered is shown in Figure 2. Here the features are simply the line orientations. If we were searching for a vertical line (item number 3), we would start at the top of the tree and successively select whichever subcluster had a vertical line in its object file, until we arrive at the vertical line item. 
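Continuing the sketch above, object files can be attached to the tree bottom-up by taking unions of feature sets; representing an object file as a Python set of feature labels is again an assumption made for illustration.

```python
def attach_object_files(cluster, item_features):
    """Attach an object file (a set of features) to every cluster node.

    item_features: one feature set per item, e.g. {"red", "vertical"}.
    A single item's object file is its own feature set; a merged cluster's
    object file is the union of its subclusters' object files.
    """
    if not cluster["children"]:
        cluster["object_file"] = set(item_features[cluster["items"][0]])
    else:
        for child in cluster["children"]:
            attach_object_files(child, item_features)
        cluster["object_file"] = set().union(
            *(child["object_file"] for child in cluster["children"]))
    return cluster
```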
Figure 2
(a) A set of five green lines of different orientations scattered at random inside a square. The numbering of the lines in red identifies where they are in the dendrogram. (b) The dendrogram formed by clustering the lines using a Euclidean distance measure. Clusters are marked by small black squares. The object file for each cluster is shown next to the cluster. It is just a list of the orientations of the lines in the cluster, since they have no other feature.
Selecting a more complex target is not much more difficult. For the purposes of selection, a target is specified by its object file—that is, a list of the target features. Starting at the top of the cluster tree, we select any subcluster whose object file contains the target object file. Then we search the selected subcluster using exactly the same procedure. We continue selecting subclusters until either (1) the cluster we are examining has no subclusters (i.e., is indivisible from the point of view of search, because it represents a single object) or (2) none of the subcluster object files contain the target object file. In the first case, the search returns the cluster. In the second case, we continue by selecting clusters whose object files overlap as much as possible with the target object file. 
The search process outlined above can be carried out by a simple recursive algorithm. To search a cluster C for a target T: 
1. If C's object file does not contain any of T's features, the search fails and we stop.
2. If C is an indivisible cluster, it may be the target, so we pass it to the Recognition stage and stop.
3. Otherwise, find the subcluster S of C that contains the greatest number of target features, resolving any ties randomly.
4. Search the subcluster S for target T using the same algorithm.
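A minimal Python sketch of this recursion, built on the nested-dict clusters and set-valued object files introduced above (all names are illustrative):

```python
import random

def search(cluster, target):
    """Recursive selection: return a candidate cluster, or None on failure.

    `target` is the target's object file, represented as a set of features.
    The four steps mirror the outline algorithm in the text.
    """
    # Step 1: fail if the cluster's object file shares no feature with the target.
    if not (cluster["object_file"] & target):
        return None
    # Step 2: an indivisible cluster is passed to the Recognition stage.
    if not cluster["children"]:
        return cluster
    # Step 3: pick the subcluster whose object file overlaps the target
    # object file the most, resolving ties randomly.
    overlaps = [len(c["object_file"] & target) for c in cluster["children"]]
    best = max(overlaps)
    choice = random.choice(
        [c for c, o in zip(cluster["children"], overlaps) if o == best])
    # Step 4: recurse into the chosen subcluster.
    return search(choice, target)
```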
For example, assume we are looking for a “V” target in Figure 2a. (Note, however, that there is no “V” item in this display). This target has an object file consisting of two lines at 45° and 135°. If we search the cluster in Figure 2 for this target, then starting from the top, we would select the left subcluster, then the left subcluster again. At this point, the cluster {1, 3, 5} contains lines of the same orientation as the target, but none of its subclusters do. We could select either the cluster {1} or the cluster {3, 5} for step 3, since both overlap the target object file equally. If we select the cluster {1}, we finish the selection by returning cluster {1}. If we select {3, 5}, we must then select the subcluster {5} in step 3 and return it. 
It might be obvious to us, in the above example, that none of the clusters is the target we are looking for, but the selection process is easily fooled. In fact it is important that the selection process not exercise too much judgment. First, perceptual clustering needs to be a very fast mechanism, so it cannot also be entirely accurate. The clusters are a guide to where things are, but no more. It is quite possible that two items that ought to be clustered together are not, or that two items that should not be clustered together are (such as with accidental alignment). So selection has to return possible targets, rather than decide which items actually are targets. That more complex decision is left to the Recognition stage. 
Recognition stage
Because of the limitations of Selection, any selected cluster needs to be examined closely to decide if it is the target or not. This is the task of the Recognition process. The Recognition process adopted here is described in McIlhagga (2008), and is very similar to the recognition process used in guided search. 
Recognition is assumed to be a limited capacity parallel process. Limited capacity processes can be modeled either as a fixed but infinitely divisible resource, which can be allocated as needed to different tasks, or as a fixed number of specific resources, each of which has the same capacity. This model takes the latter course: The Recognition stage comprises a set of three or four “slots.” Each slot contains a single item that is provided by the Selection stage on request. Once an item is parked in the slot, it is examined by a recognizer, which is drawn at random from an active pool of recognizers. The mean recognition time for the i-th recognizer is Ri, and the mean recognition time for the population of recognizers has an exponential distribution. At random times, the slowest recognizer is discarded from the active pool and returned to a dormant population. At other random times, a recognizer is selected from the dormant population and added to the active pool. This is shown in Figure 3, reproduced from McIlhagga (2008). This process of recognizer turnover accounts for the specific 1/f form of serial correlations between reaction times in visual search. 
Figure 3
Items in the search display are categorized in a number of parallel processing slots by categorizers that are drawn from an active pool. Each processing slot marries up one search item with one categorizer, and the slot clears when the categorizer finishes. After each search, some categorizers in the active pool are returned to the dormant population, and others selected from it (reproduced with permission from McIlhagga, 2008).
The outline of the Recognition stage given here is simply for completeness, and it is not really used in any of the simulations given later. Instead, to keep things focused on the Selection process, I have assumed that the average RT in any search is a linear function of the average number of selections required for the Selection process to eventually select the target or to run out of things to select. Provided the number of items is much larger than the number of recognition slots, this approximation will be reasonable. 
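For completeness only, a toy Python sketch of the recognizer-pool idea follows. It collapses the slot bookkeeping and models just the pool and its turnover; the pool sizes, turnover probability, and the assumption that an individual recognition time is exponential around the recognizer's mean Ri are illustrative choices, not values from McIlhagga (2008).

```python
import random

class RecognitionStage:
    """Toy sketch of the recognizer-pool scheme described in McIlhagga (2008)."""

    def __init__(self, pool_size=20, n_active=5, scale=50.0, turnover=0.1):
        # Mean recognition times R_i are exponentially distributed
        # across the population of recognizers (assumed scale in ms).
        recognizers = [random.expovariate(1.0 / scale) for _ in range(pool_size)]
        self.active = recognizers[:n_active]
        self.dormant = recognizers[n_active:]
        self.turnover = turnover

    def recognize(self):
        """Check one parked item; return a simulated recognition time."""
        r_mean = random.choice(self.active)
        # At random times, retire the slowest active recognizer and
        # wake a randomly chosen dormant one.
        if random.random() < self.turnover and self.dormant:
            slowest = self.active.index(max(self.active))
            self.dormant.append(self.active.pop(slowest))
            self.active.append(self.dormant.pop(random.randrange(len(self.dormant))))
        # Assume the actual time is exponential around the recognizer's mean.
        return random.expovariate(1.0 / r_mean)
```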
Terminating search
Search proceeds by successively selecting and recognizing items. If the target item is present in the display, the search will terminate when the target is recognized. When the display has N items, the target will be found after an average of N/2 selections. But what if the target is absent? If selected items were somehow removed from the search so that they cannot be selected again, the search could simply terminate when no unselected items remain. When the display has N items, search will stop when all N of them have been removed from consideration. This is called serial self-terminating search (SSTS) and it neatly explains why the slope for target absent searches is twice that of the slope for target present searches. 
While simple and attractive, SSTS has some problems as an explanation of visual search termination. If, for example, a display of N items has two targets, and only one of them has to be found, then the first target will be found after an average of roughly N/3 selections. However, when the target is absent, SSTS suggests that the absence of both targets can only be discovered after N selections have been made. In this case, we would expect the target absent slope to be three times the target present slope. This is not so: It is still only about twice the target present slope, so SSTS cannot be the correct explanation for search termination with multiple targets (Ward & McClelland, 1989). 
One alternative to SSTS is memory-less search (Horowitz & Wolfe, 1998). This search retains no record of the selection of past items, and so the probability of selecting an item is unchanged throughout the search process (although different items may nonetheless have different probabilities of selection). If all items are equally likely to be selected, the target will be found after an average of N selections, rather than N/2, in a display of N items. The main issue with memory-less search is that there is no obvious reason why the target absent search rate should be twice the target present search rate. The simplest way this could occur is if search arbitrarily terminated after 2N selections. However, this would make the error rate very high. For example, in a display with 12 items, one of which is the target, and assuming all items are equally likely to be selected, the probability of missing the target in 2N selections is [1 − (1/12)]^24, or around 12%. In fact, miss rates with 12 items are typically much less than that. 
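The miss-rate figure can be checked directly:

```python
# Probability of never selecting the target in 2N memory-less selections,
# with N = 12 equally likely items: (1 - 1/12)**(2*12).
N = 12
p_miss = (1 - 1 / N) ** (2 * N)
print(f"{p_miss:.3f}")  # prints 0.124, i.e., around 12%
```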
Whether search has memory remains unclear (Shore & Klein, 2001). However, memory-less search greatly complicates the Selection process developed here—to the extent that it sometimes does not work—so we assume here that once an item is selected it is removed from future searches. This can be achieved by setting its object file to empty. When this happens, the object files of all the containing clusters must also be updated to reflect the fact that the item has been removed from the search. A further speedup can be obtained if we remove entire clusters from search (Duncan, 1995; Humphreys, Quinlan, & Riddoch, 1989). Suppose that during a search, a cluster object file contains the target, but none of the subcluster object files do. This happens in Figure 2 with the cluster {1, 3, 5}. Here, the cluster object file contains the target, since there are two diagonal lines at 45° and 135°, but neither subcluster {1} nor {3, 5} has an object file that contains the target. In this case, we can remove the entire cluster {1, 3, 5} from further search, even though we have not explored all of its subclusters. This cluster removal process means that a search, when the target is absent, can conceivably end before every item has been examined. 
The cluster and item removal process requires a change to the search procedure. The changes needed to implement item and cluster removal appear in steps 2 and 5 of the following outline algorithm, which is otherwise the same as the previously given algorithm. To search a cluster C for a target T: 
1. If C's object file does not contain any of T's features, the search fails and we stop.
2. If C is an indivisible cluster, it is a candidate target, so we pass it to the Recognition stage and stop. Also, set C's object file to the empty set so it is ignored in future selections.
3. Otherwise, find the subcluster S of C that contains the greatest number of target features, resolving any ties randomly.
4. Search the subcluster S for target T using the same algorithm.
5. After searching the subcluster, update C's object file to be the union of the subcluster object files. However, if none of the subcluster object files contain the target features, set C's object file to the empty set.
By setting a cluster object file to the empty set, we ensure that it is no longer selected in a search, since an empty object file will cause failure of search in step 1, and will not be chosen in step 3 if there is any alternative. Further, an empty object file will no longer contribute to object files in any containing clusters, via the update occurring in step 5. 
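A Python sketch of the modified search, extending the earlier sketch with the removal steps (the step numbers refer to the outline above; the set-containment test in step 5 is the "object file contains the target" test from the text):

```python
import random

def search_with_removal(cluster, target):
    """Selection with item and cluster removal; returns a candidate or None."""
    # Step 1: fail if the object file shares no feature with the target.
    if not (cluster["object_file"] & target):
        return None
    # Step 2: pass an indivisible cluster to Recognition and empty its
    # object file so it is ignored in future selections.
    if not cluster["children"]:
        cluster["object_file"] = set()
        return cluster
    # Step 3: choose the subcluster with the greatest target overlap,
    # resolving ties randomly.
    overlaps = [len(c["object_file"] & target) for c in cluster["children"]]
    best = max(overlaps)
    choice = random.choice(
        [c for c, o in zip(cluster["children"], overlaps) if o == best])
    # Step 4: recurse into the chosen subcluster.
    result = search_with_removal(choice, target)
    # Step 5: rebuild this cluster's object file from its subclusters, but
    # empty it entirely if no subcluster still contains all target features.
    children_files = [c["object_file"] for c in cluster["children"]]
    if any(target <= f for f in children_files):
        cluster["object_file"] = set().union(*children_files)
    else:
        cluster["object_file"] = set()
    return result
```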
The simulated results described in the next section were generated using the following procedure: 
1. A visual search stimulus was randomly generated. The i-th item in the stimulus consisted of a position (xi, yi) on a unit square, and a vector of n features f = {zi1, zi2, … zin}, where zik was 1 if the i-th item possessed the k-th feature, and zero otherwise.
2. The items were clustered on the basis of proximity. The cluster object files were represented as feature vectors f = {z1, z2, … zn} where zi was 1 if any item in the cluster possessed feature i, and zero if none did.
3. The cluster was searched using the algorithm described immediately above. When a feature vector was set to the empty set (steps 2, 5), this was achieved by setting all the elements of the cluster's feature vector to zero.
4. The number of selections needed to find the target or run out of clusters was recorded. This was averaged over at least 1,000 repetitions with the same set size and feature vectors to yield a measure of search speed.
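Assembled together, the sketches above yield a simulation driver along these lines (it reuses single_link_cluster, attach_object_files, and search_with_removal from the earlier sketches; the example feature sets are illustrative):

```python
import random

def mean_selections(n_items, item_features, target, n_reps=1000):
    """Average number of selections to find the target or exhaust the display.

    item_features: list of feature sets, one per item; by convention here,
    item 0 is the target when it is present.
    """
    total = 0
    for _ in range(n_reps):
        # Random positions on the unit square, clustered by proximity.
        points = [(random.random(), random.random()) for _ in range(n_items)]
        root = attach_object_files(single_link_cluster(points), item_features)
        selections = 0
        while root["object_file"]:
            candidate = search_with_removal(root, target)
            if candidate is None:          # nothing selectable remains
                break
            selections += 1
            if candidate["items"][0] == 0:  # the target was selected
                break
        total += selections
    return total / n_reps

# Example: O among Q search with 12 items (the O target lacks the line feature).
features = [{"circle"}] + [{"circle", "line"}] * 11
print(mean_selections(12, features, target={"circle"}))
```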
Results
Search asymmetry
Searching for the absence of a feature is harder than searching for its presence. For example, the search for a Q when the distractors are Os is faster than the search for an O when the distractors are Qs. In fact, the search for a Q is a constant-time search, whereas the search for the O is a linear-time search. This is called search asymmetry (Treisman & Souther, 1985). 
How does the cluster search model explain this result? In the case of looking for a Q among a set of Os, or vice versa, assume we have only two features, an “O” shape and an oblique line. A Q is a combination of these features. A cluster tree created from a stimulus containing one Q and many Os is shown in Figure 4. The feature distance between items, used to form the clusters, is simply the Euclidean distance. At every stage of the search from the topmost cluster to the bottom, there is only one subcluster which has an object file containing both the O and the oblique line feature. The other subcluster lacks the line feature and so is never selected. Therefore, when the target is present, it is always selected first. 
Figure 4
Search for a Q (or lollipop shape) when distractors are Os. (a) The stimulus consists of seven items: one Q and several Os. The Q has two features: a circle and a line. The Os have only one feature: a circle. (b) The dendrogram formed from the stimulus. The object file of clusters containing the Q will always have a line in them, whereas object files for clusters that do not contain the target will lack the line feature. The only possible search path is shown by the thick lines.
When the target is absent, the object file of the topmost cluster does not contain an oblique line. At that point it is clear that there is no target in the display, and search can terminate immediately without selecting anything. Thus, whether or not the target is present, there is at most a single selection performed on this cluster tree, after which the target is either found or selection fails, regardless of the number of distractors. 
Now consider looking for an O when the distractors are Qs. A cluster tree formed from a display containing one O target and many Q distractors is shown in Figure 5. Here, the object file for the O target is a single O, which is contained in the object file for almost all the clusters. Until one gets down to the lowest levels of the cluster tree, the object files of the clusters do not provide any information to guide selection, so the search algorithm degenerates to randomly selecting clusters for examination. When the target is present in a display of N items, we will have to make on average N/2 selections to accidentally stumble upon the cluster containing the target. When the target is absent, search will terminate when all items have been selected, which happens after N selections. 
Figure 5
Search for an O among Q distractors. (a) The stimulus. The items here are in exactly the same place as in Figure 4a, but Os have been swapped for Qs and vice versa. (b) The dendrogram created from the stimulus. All the upper level clusters have the same object file, containing a circle and a line. Since they all contain the circle, which is the target feature, there is nothing to guide our selection of subclusters, and so they are searched at random.
The Selection model presented here explains these two kinds of search with the same mechanism. This is not the same as saying that the two kinds of search, constant and linear time, are really just extremes of a continuum (Wolfe, 1998): this Selection process behaves in qualitatively different ways depending on whether the target has a unique feature or not. Finally, Figure 6 shows simulations giving the average number of selections for target present and target absent searches for both Q among O and O among Q searches. The pattern of results shown here replicates the pattern of results shown by humans in search asymmetry experiments. 
Figure 6
Simulated results for search asymmetry. This graph plots the mean number of selections needed to find the target (solid line), or to conclude the target does not exist (dashed line). The RT will be proportional to the number of selections. Circle symbols are search for an O when the distractors are Qs; square symbols are search for a Q when the distractors are Os. Each point is the average of 1,000 replications.
Conjunction search
There are two kinds of searches commonly called conjunction searches. In one kind, all items have the same set of features, but they are arranged differently. An example would be looking for an upside-down L when the distractors are all right-side-up Ls. In this kind of search, it is the exact configuration of features that determines whether the item is a target or a distractor. Another kind of conjunction search is when the target has two specific features, but distractors have only one (Treisman & Sato, 1990). An example of this second kind of conjunction search is looking for a target that is a red vertical line when the distractors are green vertical lines or red horizontal lines (Poisson & Wilkinson, 1992). Here I will only consider the second kind of conjunction search, because the model does not say anything particularly interesting about configuration searches: They are simply linear-time searches because the cluster object files are uninformative about the shape of the items within. 
Conjunction searches tend to be faster than you would expect from a serial search model, and the speed of search depends on the relative proportion of distractors, getting faster as one kind of distractor gets more rare (Poisson & Wilkinson, 1992). This pattern of results is readily explained by the clustering model. Consider a conjunction search for a red vertical line when the distractors are red horizontal lines or green vertical lines. A cluster tree created by such a display is shown in Figure 7. The distance measure used here is again simply Euclidean distance, so there are no color or orientation grouping effects. However, there can be clumps of nearby items which are of similar color (such as a set of nearby green vertical lines) that end up clustered together. Clearly, when searching for a red vertical line, any clump of purely green items or purely horizontal items will be avoided if possible. 
Figure 7
An example of a conjunction search. (a) The target is the red vertical line (item 1) and the distractors are red horizontal lines and green vertical lines. (b) The dendrogram of this search display. The features in the object files are red color, green color, vertical, and horizontal. Some clusters (like {6, 7, 8}) will never be selected because they lack the red feature. Other clusters (like {3, 4, 5}) will be selected because they have the required features, even though none of the subclusters do. Any item returned from the cluster {3, 4, 5} will, however, be rejected in the recognition stage.
Poisson and Wilkinson (1992) looked at conjunction searches using a fixed set size of 25 items. The target was usually a red vertical line. Distractors were either red horizontal or green vertical lines. The distractor ratio was varied from 2:23 red:green to 23:2 red:green. They found that the slowest target-present searches occurred when the distractor ratio was balanced, and search was faster for unbalanced distractor ratios. Figure 8a shows target present and absent RT data from Poisson and Wilkinson (1992), together with simulated RTs from the model. The results of the simulation accord quite well with the data for target-present searches. The inverted-U shape of the model RTs occurs because when the search display has few red items, the red feature of the target guides selections through the cluster tree; when the search display has few vertical items, it is the vertical feature which acts as the guide. Note that the “guidance” here is automatic and there is no need for the observer to choose which feature to use. Instead, the cluster model automatically exploits the least common feature of the target to guide the search, without even knowing that it is the least common feature (Figure 9). 
Figure 8
Comparison of model simulations to the conjunction search RT data shown in Figure 2c of Poisson and Wilkinson (1992). Filled circles are average RTs to search for a red vertical line target among red horizontal lines and green vertical lines, when the target is present. Filled squares are mean RTs when the target is absent. The number of red horizontal distractors varies along the x axis; the number of green vertical distractors is 25 minus the number of red distractors. (a) The solid line is the average number of selections made by the model when the target is present, scaled to match human data [model RT = 513 + 75 × (number of selections)]. The dashed line is the average number of selections made by the model when the target was absent, scaled using the same values as target-present. Here the model always selected clusters with object files containing both features “red” and “vertical.” (b) If the model selects clusters with object files containing only the feature “red” on 40% of occasions, then the model is able to account for some of the bias in the target absent RTs (dashed line shows the model prediction for target absent searches). The model data were scaled to match human target present RTs [model RT = 506 + 56 × (number of selections)].
Figure 9
Cluster trees when the distractor ratio is unbalanced. (a) Search for a red vertical line when there are only two red distractors. The thick lines show the only two possible selection paths through the tree. If the selection process finds itself in cluster {3, 4, 5}, it will pick one of the items in the cluster which has the greatest overlap with the target object file. The target has a 50% chance of being detected on the first selection. (b) Search for a red vertical line when there are only two vertical distractors. The thick lines show the only two possible selection paths through the tree. If the selection process finds itself in cluster {6, 7, 8} it will again return one of the nontarget items within that cluster. As in (a), the target has a 50% chance of being detected on the first selection.
The model predictions for target present search match human performance quite well, but the results for target absent searches do not match human performance, as it seems that human observers had a bias of some kind. So long as the model always selects clusters whose object files contain the features “red” and “vertical,” it will never produce a bias like that shown by humans. However, if sometimes the selection process assumes that the target is merely red, rather than red and vertical, this will introduce a bias in reaction time, since when there are more red distractors, they will be selected more often than is optimal. If we assume that the selection process looks for just a red item (rather than a red and vertical item) on 40% of occasions, then the model produces a bias more like that shown by humans (Figure 8b). 
Finally, some accounts of conjunction search suggest that the search mechanism groups items on the basis of one feature, say color, and then searches among the groups for items with the other feature (Bacon & Egeth, 1997; Grossberg, Mingolla, & Ross, 1994; Kaptein, Theeuwes, & van der Heijden, 1995). This was not assumed here—items were clustered purely on the basis of proximity. Indeed, if we were to assume a strong grouping on the basis of color, we would expect the conjunction target to pop out. This is because, within any group of red items, the target has a unique feature (vertical) that the other red items lack. Any color or feature-based grouping that dominates over proximity-based grouping is likely to generate extremely fast reaction times in conjunction searches. This suggests that color (or any other feature) does not, in fact, dominate the grouping of items in conjunction searches. 
Triple conjunctions
A triple conjunction search is where the target is defined by the conjunction of three features. There are two kinds of triple conjunction searches: where the distractors have any two of the target features, or where the distractors have only one of the target features. The second kind of conjunction search is extremely rapid (Wolfe et al., 1989), and the target is found in constant time regardless of the number of distractors. 
The cluster search model produces results which are broadly in line with this finding (Figure 10). Three cases were simulated: a double conjunction search (similar to the previous section, where the target has an object file containing two features {A, B}, while distractors have {A} or {B}), a hard triple conjunction search (where the target has three features {A, B, C}, and the distractors have either {A, B}, {A, C}, or {B, C}), and an easy triple conjunction search (where the target again has features {A, B, C}, but the distractors have only {A} or {B} or {C}). The simulated results for target present searches are shown in Figure 10a, and for target absent searches in Figure 10b. The simulations were scaled to fit the target present searches using the equation mean RT = 505 + 52 × (number of selections). This scaling was also used for the target absent searches. The reason the easy triple conjunction search turns out to be so easy is simply that there are hardly any clusters with object files equal to {A, B, C} that do not also contain the target. 
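Expressed as inputs to the hypothetical driver sketched in the Methods section, the three conditions might be set up as follows (the feature labels A, B, C are placeholders, and item 0 is the target):

```python
def distractors(n, kinds):
    """Build n distractor object files, cycling through the given feature sets."""
    return [set(kinds[i % len(kinds)]) for i in range(n)]

# Double conjunction: target {A, B}; distractors have {A} or {B}.
double = [{"A", "B"}] + distractors(11, [{"A"}, {"B"}])
# Hard triple conjunction: target {A, B, C}; distractors share two features.
hard_triple = [{"A", "B", "C"}] + distractors(11, [{"A", "B"}, {"A", "C"}, {"B", "C"}])
# Easy triple conjunction: target {A, B, C}; distractors share one feature.
easy_triple = [{"A", "B", "C"}] + distractors(11, [{"A"}, {"B"}, {"C"}])

# e.g. mean_selections(12, hard_triple, target={"A", "B", "C"})
```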
Figure 10
Simulation of triple conjunction searches from Wolfe et al. (1989). (a) Model predictions (line) and human data (filled circles) for target present searches, scaled according to the equation mean RT = 505 + 52 × (number of selections). Note that the time per selection (52 ms) is similar to the time per selection in Figure 8b. The model broadly replicates the features of human search: Hard triple conjunction is easier than double conjunction; and easy triple conjunction has a constant mean RT. (b) Predictions (lines) and human data (squares) for target absent searches, using the same scaling as in (a).
Grouping
The spatial grouping of items in a search display can radically alter the speed of search (Bundesen & Pedersen, 1983; Kim & Cave, 1999; Olds, Cowan, & Jolicœur, 1999; Poisson & Wilkinson, 1992; Treisman, 1982). For example, the search for an O when the distractors are Qs is typically slow, but if all the Q distractors are grouped in one place and the O somewhere else, the search is much faster. Similarly, in the conjunction search discussed in the previous section, if all the same colored items are grouped together, the search becomes much easier (see Figure 11). 
Figure 11
Effects of spatial grouping on visual search. Search becomes much easier if the locations of items create homogeneous clusters. These homogeneous clusters have object files which are sufficient to identify the target unambiguously (the single O on the left) or reject entire clusters (the green group on the right).
The Selection model presented here easily explains both of these grouping-induced speedups. For the search display shown in Figure 11a, there are two top-level clusters, one containing the O and one containing all the Qs. The search procedure thus has a 50:50 chance of selecting the target, rather than the much lower chance that would obtain if the items were jumbled together. Explaining the speedup of search in a display like Figure 11b is similarly straightforward. The display breaks naturally into two clusters, but only one of the clusters has an object file that contains the target object color (in this case, the red cluster on the left). Within that cluster, there is always only one subcluster whose object file contains the vertical line (as in constant-time search), so the search is very easy. 
Discussion
This paper describes a model for item selection in visual search, which proposes that the visual system first clusters the items in the display, primarily on the basis of distance, and then searches those clusters for the target item. The search is accelerated by attaching “object files” to each cluster, which list the various features possessed by all the items in the cluster. Only those clusters whose object files contain the target will be searched. This model can account for search asymmetry and the search rate for conjunction search. It is better at accounting for target-present RTs than target-absent RTs. 
The idea that search might be influenced by clustering of items has been previously proposed by many others (Bundesen & Pedersen, 1983; Duncan, 1995; Duncan & Humphreys, 1989; Humphreys et al., 1989; Kim & Cave, 1999; Poisson & Wilkinson, 1992; Treisman, 1982). The main contribution of the model proposed here is defining exactly how the clustering of items affects search (namely via the object files attached to the clusters) and specifying a simple algorithm that does a good job of replicating human performance over a range of different search experiments. 
Similarly, the hierarchical nature of the clusters is not new. Cluster analysis is, of course, a standard statistical procedure. The search procedure is similar to some algorithms for searching high-dimensional feature spaces (Brin, 1995; Hjaltason & Samet, 2003) although these are over partitioning trees rather than clusters. One prominent search model that uses a hierarchical architecture is that proposed by Tsotsos (Tsotsos, 1990; Tsotsos et al., 1995). However, the search procedure in the current model relies on the object files that are attached to the clusters, and entire clusters are rejected or inhibited based on the object files. The Tsotsos model is instead aimed at efficiently finding the maximum of a saliency measure over the image. The advantage of the Tsotsos model compared to the model presented here is that it is specified as a neural net and can be used on raw image data. However, the Tsotsos model does not seem to have been used to predict RT versus set size data over a range of different experiments. In addition, the Tsotsos model does not use object files. FACADE (Grossberg et al., 1994) is another theory for visual search that proposes that items be organized into a hierarchy of groups. In FACADE, the items are partitioned into groups based on a commonality of features, and those groups are then selected (or ignored) and searched in a serial manner. The FACADE model appears to be a roundabout way of clustering items based on proximity and features. FACADE also lacks a concept of object files. 
One popular idea in search that this model does not use is salience. Salience is a mapping from feature space into the real line; items with more salience are more likely to be selected than items that have less salience. The cluster search model can generate good fits to RT data without referring to salience at all. This is not to say, however, that some things do not irresistibly grab our attention, but rather that this grabbing of attention might be a different phenomenon from efficient (i.e., constant-time) search. 
The cluster algorithm depends on comparing target features to the cluster object files. The object files treat features as if they were either present or absent. However, features are often present to some degree. For example, a red item may have features that are qualitatively different from a green item, but they are only quantitatively different from a more intense red item. If object files record the degree to which a feature is present, rather than merely presence or absence, a different procedure for comparing target and cluster object files is needed. This could be implemented as follows. Let f = {z1, z2, … zn} be an object file, where zi represents the intensity or amplitude of feature i. We combine two object files by taking the maximum of each feature intensity. That is, the object file for the merger of clusters Ca and Cb is {max(z1a, z1b), max(z2a, z2b), …, max(zna, znb)}, where zia and zib are the intensities of feature i in clusters Ca and Cb respectively. A target might be in a cluster if all of its feature intensities were less than or equal to the corresponding intensities in the cluster object file. 
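A small sketch of this graded scheme, representing object files as lists of feature intensities (an assumed representation):

```python
def merge_graded(file_a, file_b):
    """Combine two graded object files by taking the feature-wise maximum."""
    return [max(za, zb) for za, zb in zip(file_a, file_b)]

def may_contain(cluster_file, target_file):
    """The target might be in the cluster if every target feature intensity
    is no greater than the corresponding cluster intensity."""
    return all(zt <= zc for zt, zc in zip(target_file, cluster_file))
```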
This change would allow the model to deal with searches where the target is more or less intense along some feature dimension, e.g., redness. It suggests that a more intense target would pop out relative to less intense distractors, but not vice-versa. However, feature intensities would, inevitably, be represented with error. Thus the decision about whether a target might be in a cluster is a form of intensity discrimination task. The theory behind such tasks is signal detection theory, and this suggests that many of the ideas in SDT that have been applied to visual search (Eckstein et al., 2000; Palmer et al., 1993; Verghese, 2001; Vincent, 2011) could, in the future, be integrated into a cluster search procedure. 
To conclude, this model is a good but incomplete model of visual search. It is primarily about the process that might drive the selection of items in search; the other aspects of the model are sketchy or nonexistent. There are thus a number of things the model does not do well. First, it makes no errors. Given that humans make relatively few errors under most search conditions, this is a problem but not a fatal one. Weakening some of the assumptions underlying search termination might introduce a realistic level of miss errors, but there is no obvious mechanism for false alarms. Second, the distribution of reaction times is poorly specified. The model is good at predicting average reaction times, but the overall distribution contains important information about search processes (Wolfe, 1998). Third, although the clustering process relies on a measure of distance between items, that measure is incompletely specified. Fortunately, the Euclidean distance appears to be the dominant component of the true distance measure, so using the Euclidean distance yields realistic results. More interesting results could be obtained by implementing a distance measure based on, say, edge co-occurrence (Geisler, Perry, Super, & Gallogly, 2001), which could lead to a unification of search models with models of perceptual organization. Finally, the model, like many others, assumes that a set of features has been defined and extracted from the image before search begins. To some extent this is not a serious problem: whatever the features, the model still predicts search asymmetry and the effects of feature conjunction. However, the occurrence of search asymmetry or feature conjunction cannot be predicted from a raw image until the features are specified in enough detail to be extracted from it. This is a promising avenue for future research. 
Acknowledgments
Commercial relationships: none. 
Corresponding author: William H. McIlhagga. 
Address: Bradford School of Optometry and Vision Science, University of Bradford, Bradford, UK. 
References
Bacon W. F. Egeth H. E. (1997). Goal-directed guidance of attention: Evidence from conjunctive visual search. Journal of Experimental Psychology: Human Perception and Performance, 23 (4), 948–961.
Brin S. (1995). Near neighbor search in large metric spaces. Proceedings of the 21st VLDB Conference, Zurich, Switzerland. Retrieved December 30, 2012, from http://ilpubs.stanford.edu:8090/113/
Bundesen C. (1990). A theory of visual attention. Psychological Review, 97 (4), 523–547.
Bundesen C. Pedersen L. F. (1983). Color segregation and visual search. Perception & Psychophysics, 33 (5), 487–493.
Dry M. J. Navarro D. J. Preiss A. K. Lee M. D. (2009). The perceptual organization of point constellations. Retrieved December 31, 2012, from http://digital.library.adelaide.edu.au/dspace/handle/2440/58414.
Duncan J. (1995). Target and nontarget grouping in visual search. Perception & Psychophysics, 57 (1), 117–120.
Duncan J. Humphreys G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96 (3), 433–458.
Eckstein M. P. Thomas J. P. Palmer J. Shimozaki S. S. (2000). A signal detection model predicts the effects of set size on visual search accuracy for feature, conjunction, triple conjunction, and disjunction displays. Perception & Psychophysics, 62 (3), 425–451.
Everitt B. S. Landau S. Leese M. Stahl D. (2011). Cluster analysis (5th ed.). Hoboken, NJ: Wiley.
Geisler W. S. Perry J. S. Super B. J. Gallogly D. P. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41, 711–724.
Grossberg S. Mingolla E. Ross W. D. (1994). A neural theory of attentive visual search: Interactions of boundary, surface, spatial, and object representations. Psychological Review, 101 (3), 470–489.
Hjaltason G. R. Samet H. (2003). Index-driven similarity search in metric spaces (survey article). ACM Transactions on Database Systems, 28 (4), 517–580.
Horowitz T. S. Wolfe J. M. (1998). Visual search has no memory. Nature, 394 (6693), 575–577.
Humphreys G. W. Quinlan P. T. Riddoch M. J. (1989). Grouping processes in visual search: Effects with single- and combined-feature targets. Journal of Experimental Psychology: General, 118 (3), 258–279.
Itti L. Koch C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2 (3), 194–203.
Jardine N. Sibson R. (1972). Mathematical taxonomy. Hoboken, NJ: John Wiley & Sons.
Kaptein N. A. Theeuwes J. van der Heijden A. H. C. (1995). Search for a conjunctively defined target can be selectively limited to a color-defined subset of elements. Journal of Experimental Psychology: Human Perception and Performance, 21 (5), 1053–1069.
Kim M. S. Cave K. R. (1999). Grouping effects on spatial attention in visual search. The Journal of General Psychology, 126 (4), 326–352.
Köhler W. (1988). Gestalt psychology. In A history of psychology: Original sources and contemporary research (pp. 520–527). New York, NY: McGraw-Hill.
Marr D. (1982). Vision: A computational investigation into the human representation and processing of visual information. Cambridge, MA: MIT Press.
McIlhagga W. (2005). A fast heuristic algorithm for human visual search. Perception, 34 (Supplement), 60.
McIlhagga W. (2008). Serial correlations and 1/f power spectra in visual search reaction times. Journal of Vision, 8 (9): 5, 1–14, http://www.journalofvision.org/content/8/9/5, doi:10.1167/8.9.5.
Neisser U. (1967). Cognitive psychology. New York: Appleton-Century-Crofts.
Olds E. S. Cowan W. B. Jolicœur P. (1999). Spatial organization of distractors in visual search. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 53 (2), 150–159.
Palmer J. Ames C. T. Lindsey D. T. (1993). Measuring the effect of attention on simple visual search. Journal of Experimental Psychology: Human Perception and Performance, 19 (1), 108–130.
Poisson M. E. Wilkinson F. (1992). Distractor ratio and grouping processes in visual conjunction search. Perception, 21 (1), 21–38.
Shore D. I. Klein R. M. (2001). On the manifestations of memory in visual search. Spatial Vision, 14 (1), 59–75.
Sibson R. (1973). SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16 (1), 30–34.
Thornton T. L. Gilden D. L. (2005). Provenance of correlations in psychological data. Psychonomic Bulletin & Review, 12 (3), 409–441.
Treisman A. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8 (2), 194–214.
Treisman A. Gelade G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12 (1), 97–136.
Treisman A. Gormican S. (1988). Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95 (1), 15–48.
Treisman A. Sato S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human Perception and Performance, 16 (3), 459–478.
Treisman A. Souther J. (1985). Search asymmetry: A diagnostic for preattentive processing of separable features. Journal of Experimental Psychology: General, 114 (3), 285–310.
Tsotsos J. K. (1990). Analyzing vision at the complexity level. Behavioral and Brain Sciences, 13 (3), 423–445.
Tsotsos J. K. Culhane S. M. Kei Wai W. Y. Lai Y. Davis N. (1995). Modeling visual attention via selective tuning. Artificial Intelligence, 78 (1–2), 507–545.
Verghese P. (2001). Visual search and attention: A signal detection theory approach. Neuron, 31 (4), 523–535.
Vincent B. T. (2011). Search asymmetries: Parallel processing of uncertain sensory information. Vision Research, 51 (15), 1741–1750.
Ward R. McClelland J. L. (1989). Conjunctive search for one and two identical targets. Journal of Experimental Psychology: Human Perception and Performance, 15 (4), 664–672.
Wolfe J. M. (1998). What can 1 million trials tell us about visual search? Psychological Science, 9 (1), 33–39.
Wolfe J. M. (2007). Guided Search 4.0: Current progress with a model of visual search. In Gray W. (Ed.), Integrated models of cognitive systems (pp. 99–119). New York: Oxford.
Wolfe J. M. Bennett S. C. (1997). Preattentive object files: Shapeless bundles of basic features. Vision Research, 37 (1), 25–43.
Wolfe J. M. Cave K. R. Franzel S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15 (3), 419–433.
Zahn C. T. (1971). Graph-theoretical methods for detecting and describing Gestalt clusters. IEEE Transactions on Computers, C-20 (1), 68–86.
Figure 1. (a) A set of five randomly placed points. (b) The dendrogram produced by single-link clustering of the points, using the Euclidean distance. The numbers at the base of the dendrogram refer to the points in (a). Each upside-down U shape is a cluster, so the clusters found here are the original items, and the groups {3, 5}, {1, 3, 5}, {1, 3, 5, 4}, and all the items {1, 3, 5, 4, 2}. The thick line shows the search path that would need to be taken from the top of the cluster tree to find the point numbered 3.
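For readers who wish to reproduce a dendrogram like this one, single-link clustering with Euclidean distance is available in standard libraries. A minimal sketch, assuming numpy and scipy are installed: 

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    points = np.random.rand(5, 2)            # five randomly placed points
    tree = linkage(points, method='single')  # single-link; Euclidean by default
    # Each row of `tree` records one merge: the indices of the two clusters
    # joined and the distance at which they merged, i.e., the dendrogram.
    print(tree)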
Figure 2. (a) A set of five green lines of different orientations scattered at random inside a square. The numbering of the lines in red identifies where they are in the dendrogram. (b) The dendrogram formed by clustering the lines using a Euclidean distance measure. Clusters are marked by small black squares. The object file for each cluster is shown next to the cluster. It is just a list of the orientations of the lines in the cluster, since they have no other feature.
Figure 3. Items in the search display are categorized in a number of parallel processing slots by categorizers that are drawn from an active pool. Each processing slot marries up one search item with one categorizer, and the slot clears when the categorizer finishes. After each search, some categorizers in the active pool are returned to the dormant population, and others selected from it (reproduced with permission from McIlhagga, 2008).
Figure 4. Search for a Q (or lollipop shape) when distractors are Os. (a) The stimulus consists of seven items: one Q and several Os. The Q has two features: a circle and a line. The Os have only one feature: a circle. (b) The dendrogram formed from the stimulus. The object files of clusters containing the Q will always have a line in them, whereas the object files of clusters that do not contain the target will lack the line feature. The only possible search path is shown by the thick lines.
Figure 5. Search for an O among Q distractors. (a) The stimulus. The items here are in exactly the same place as in Figure 4a, but Os have been swapped for Qs and vice versa. (b) The dendrogram created from the stimulus. All the upper level clusters have the same object file, containing a circle and a line. Since they all contain the circle, which is the target feature, there is nothing to guide our selection of subclusters, and so they are searched at random.
Figure 6. Simulated results for search asymmetry. This graph plots the mean number of selections needed to find the target (solid line), or to conclude the target does not exist (dashed line). The RT will be proportional to the number of selections. Circle symbols are search for an O when the distractors are Qs; square symbols are search for a Q when the distractors are Os. Each point is the average of 1,000 replications.
Figure 7. An example of a conjunction search. (a) The target is the red vertical line (item 1) and the distractors are red horizontal lines and green vertical lines. (b) The dendrogram of this search display. The features in the object files are red color, green color, vertical, and horizontal. Some clusters (like {6, 7, 8}) will never be selected because they lack the red feature. Other clusters (like {3, 4, 5}) will be selected because they have the required features, even though none of the subclusters do. Any item returned from the cluster {3, 4, 5} will, however, be rejected in the recognition stage.
Figure 8. Comparison of model simulations to the conjunction search RT data shown in Figure 2c of Poisson and Wilkinson (1992). Filled circles are average RTs to search for a red vertical line target among red horizontal lines and green vertical lines, when the target is present. Filled squares are mean RTs when the target is absent. The number of red horizontal distractors varies along the x axis; the number of green vertical distractors is 25 minus the number of red distractors. (a) The solid line is the average number of selections made by the model when the target is present, scaled to match the human data [model RT = 513 + 75 × (number of selections)]. The dashed line is the average number of selections made by the model when the target was absent, scaled using the same values as target-present. Here the model always selected clusters with object files containing both the features “red” and “vertical.” (b) If the model instead selects clusters with object files containing only the feature “red” on 40% of occasions, it can account for some of the bias in the target-absent RTs (the dashed line shows the model prediction for target-absent searches). The model data were scaled to match the human target-present RTs [model RT = 506 + 56 × (number of selections)].
Figure 9. Cluster trees when the distractor ratio is unbalanced. (a) Search for a red vertical line when there are only two red distractors. The thick lines show the only two possible selection paths through the tree. If the selection process finds itself in cluster {3, 4, 5}, it will pick whichever item in that cluster has the greatest overlap with the target object file. The target has a 50% chance of being detected on the first selection. (b) Search for a red vertical line when there are only two vertical distractors. The thick lines show the only two possible selection paths through the tree. If the selection process finds itself in cluster {6, 7, 8}, it will again return one of the nontarget items within that cluster. As in (a), the target has a 50% chance of being detected on the first selection.
Figure 10. Simulation of the triple conjunction searches from Wolfe et al. (1989). (a) Model predictions (line) and human data (filled circles) for target-present searches, scaled according to the equation mean RT = 505 + 52 × (number of selections). Note that the time per selection (52 ms) is similar to the time per selection in Figure 8b. The model broadly replicates the features of human search: hard triple conjunction is easier than double conjunction, and easy triple conjunction has a constant mean RT. (b) Predictions (lines) and human data (squares) for target-absent searches, using the same scaling as in (a).
Figure 11. Effects of spatial grouping on visual search. Search becomes much easier if the locations of items create homogeneous clusters. These homogeneous clusters have object files which are sufficient to identify the target unambiguously (the single O on the left) or to reject entire clusters (the green group on the right).