The effect of distractor homogeneity and target–distractor similarity on visual search was previously explored under two models designed for computer vision. We extend these models here to account for internal noise and to evaluate their ability to predict human performance. In four experiments, observers searched for a horizontal target among distractors of different orientation (orientation search; Experiments 1 and 2) or a gray target among distractors of different color (color search; Experiments 3 and 4). Distractor homogeneity and target–distractor similarity were systematically manipulated. We then tested our models' ability to predict the search performance of human observers. Our models' predictions were closer to human performance than those of other prominent quantitative models.

*k* out of the *n* stimuli present in the display. If the target was one of the *k* selected stimuli in a target-present display, a correct decision is made; otherwise, a guess yields 50% success (Bergen & Julesz, 1983). This model is consistent with serial search models in which the items are searched in random order (e.g., Treisman & Gelade, 1980). Another example is the family of models based on signal-detection theory (SDT) (e.g., Eckstein, 1998; Eckstein, Thomas, Palmer, & Shimozaki, 2000; Green & Swets, 1966; Palmer, Ames, & Lindsey, 1993; Santhi & Reeves, 2004). The SDT models assume that the stimuli are observed with stochastic noise. According to this view, a false detection may occur when one of the distractors in a noisy observation is mistakenly perceived as a target (i.e., as belonging to the target distribution), and a miss may occur when the target is mistakenly perceived as a distractor. Hence, the chances of such detection errors increase with the number of search items and with target–distractor similarity.
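The SDT account can be made concrete with a small simulation. The following is a minimal sketch, not the implementation used in this study: a "max-rule" 2IFC observer that adds Gaussian noise to every item's feature value and chooses the interval whose most target-like observation wins. All names and parameters are of our choosing.

```python
import random

def sdt_2ifc_accuracy(target, distractor, n_items, sigma, n_sim=4000):
    """Toy SDT max-rule observer for a 2IFC search task.

    Each item's feature value is observed with Gaussian internal noise.
    An interval's "targetness" is the maximum similarity (negative
    distance) of any observation to the target's feature value; the
    observer chooses the interval with the higher targetness.
    """
    correct = 0
    for _ in range(n_sim):
        # Target-present interval: one target plus n_items - 1 distractors.
        present = [random.gauss(target, sigma)]
        present += [random.gauss(distractor, sigma) for _ in range(n_items - 1)]
        # Target-absent interval: n_items distractors.
        absent = [random.gauss(distractor, sigma) for _ in range(n_items)]
        score_present = max(-abs(x - target) for x in present)
        score_absent = max(-abs(x - target) for x in absent)
        correct += score_present > score_absent
    return correct / n_sim
```

As the text describes, simulated accuracy drops as target–distractor similarity grows (move `distractor` toward `target`) or as `n_items` increases, because a noisy distractor observation is then more likely to look like the target.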

*saliency* measure) and implemented it within the *best-normal* model (Rosenholtz, 2001a). Given a one-dimensional feature space relevant to the search task (e.g., orientation) and the points in that space describing the various search items, the standard deviation associated with the distractor set is determined. Then, the saliency measure is the number of standard deviations between the target point and the mean of the points representing the distractors.^{1}

The saliency measure suggests that a search task is more difficult as the distance between the mean distractor value and the target value decreases (i.e., target–distractor similarity increases) and as the variance of the distractors increases (i.e., distractor heterogeneity increases). As in SDT models, it is assumed that the initial internal response of the visual system to the visual display is noisy. Whereas the saliency measure is a qualitative abstract mathematical expression of search task difficulty, the best-normal model is quantitative and was designed to predict accuracy in two-interval forced-choice (2IFC) experiments. The best-normal model is a variation of SDT models. While SDT models assume the observer keeps a record of the exact distribution of the distractors, the best-normal model suggests that during visual search the observer uses a simpler, approximated representation of the distractor distribution: the true distribution is represented only by its mean and variance, that is, by the normal distribution that best fits the distractors' true distribution. Note that whereas both the best-normal and the classical SDT models predict that search should become harder as target–distractor similarity increases, only the best-normal model can account for the increase in search difficulty that comes with an increase in distractor heterogeneity.^{2}
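As a concrete sketch of the definition above (our naming and edge-case handling, not Rosenholtz's code), the saliency measure can be computed directly:

```python
import statistics

def saliency(target, distractors):
    """Number of distractor standard deviations separating the target's
    feature value from the mean distractor feature value."""
    mu = statistics.mean(distractors)
    sd = statistics.pstdev(distractors)  # SD of the distractor set
    if sd == 0:
        # Homogeneous distractors: saliency is unbounded unless the
        # target equals the distractor value (handling assumed here).
        return float("inf") if target != mu else 0.0
    return abs(target - mu) / sd
```

For a 0° target among 15°, 25°, and 35° distractors, this gives |0 − 25| / 8.16 ≈ 3.1; the value shrinks as the distractor mean approaches the target or as distractor variance grows.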

*cover* difficulty measure and the FLNN algorithm for visual search in the context of automated computerized systems for object recognition and detection. These models offer a novel approach to accounting for the effects of distractor heterogeneity and target–distractor similarity on the difficulty of visual search tasks. The goal of our current study was to evaluate the relevance of these models to human search performance and to test whether they improve the ability to predict human performance in comparison to other prominent visual search models.

*cover* is a measure that allows us to qualitatively predict the relative difficulty of different search tasks. As it was originally developed for computer vision, it was assumed that there is no difference between the displayed items and the observed input (i.e., there is no internal noise). Consider a visual search task in which the stimuli (a single target and several distractors) differ by a single feature (e.g., color or orientation). In this case, the display items may be represented as points in a one-dimensional feature space (i.e., on a line).^{3}

The *cover* is calculated as follows: First, the smallest difference between the target's feature value and a distractor's feature value is measured and denoted *d*_{T}. Then, the cover measure is the number of segments of length *d*_{T} required to cover all the points representing the distractors in the feature space. For example, let us calculate the cover for an orientation search task in which the target is a short horizontal line (0°) and the distractors are several lines, each oriented at 15°, 25°, or 35°. Here *d*_{T} is 15°, and 2 segments suffice to cover the distractors' orientations (i.e., the 15° and 25° points can be covered by a common segment of length 15°, and another segment is required to cover the 35° point). Therefore, the cover measure is equal to 2. Note that for visual search tasks with homogeneous distractors (and noiseless input) the cover is always 1. Intuitively, we can say that the distractors are divided into groups of elements with similar features, and the resulting number of groups reflects the difficulty of the search. The variability allowed within such a group is determined by the target–distractor similarity (the length *d*_{T}). Thus, the cover grows as the distractors' heterogeneity increases and as they become more similar to the target.
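The worked example above can be reproduced with a short greedy sweep. This is a sketch of the noiseless cover with names of our choosing:

```python
def cover(target, distractors):
    """Cover measure for noiseless input: the number of segments of
    length d_T (the smallest target-distractor distance) needed to
    cover all distractor feature values, computed by a greedy
    left-to-right sweep over the sorted values."""
    d_t = min(abs(target - d) for d in distractors)
    segments = 0
    edge = None  # right end of the current covering segment
    for value in sorted(distractors):
        if edge is None or value > edge:
            segments += 1  # open a new segment starting at this value
            edge = value + d_t
    return segments
```

`cover(0, [15, 25, 35])` returns 2, matching the example in the text, and any homogeneous display gives a cover of 1.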

*σ*. In this study, we calculated the cover measure on such noisy input (for details, see 1). The effect of this internal noise on the cover measure depends on the noise level: As the noise level grows, more distractor groups are generated. Stimuli that belonged to one group under the original cover measure can now belong to separate groups, as their noisy representations may differ. Therefore, the cover measure, which is associated with the number of groups, grows as the internal noise level grows. As such, the cover measure reflects the increase in search difficulty that comes with an increase in internal noise. This relationship between search difficulty and internal noise level can also account for the set-size effect: For a given internal noise level, an increase in set size results in a decrease in *d*_{T} and an increase in distractor variability, which in turn leads to a larger cover measure. Similarly, if we compare two homogeneous displays with similar set size but different target–distractor feature distance, we will get a larger cover for the case in which the target–distractor distance is smaller. Hence, unlike the original calculations of the cover measure (Avraham & Lindenbaum, 2005, 2006), in which the cover depended only on distractor heterogeneity and the distance between the target and the closest distractor, the cover measure calculated on noisy input depends also on the level of internal noise. As such, it may differ for observers with different levels of internal noise.

*k* out of the *n* items. However, while in the temporal-serial model the *k* items are selected randomly, in the FLNN model they depend on the feature values of the stimuli. In particular, the FLNN selects the *k* items that are most dissimilar in terms of their feature values, and the typical outcome is that these *k* items include representatives of the various groups of items that are formed due to feature similarity. Thus, the search according to the FLNN model is more akin to a search through similarity-based groups than to a search through single elements. Moreover, as with the cover measure, the groups relevant for the FLNN search are not necessarily homogeneous, and the degree of within-group heterogeneity depends on the distance, in feature space, between the target and the distractors. Finally, here too we assume that the observations are noisy. Hence, to characterize accuracy in 2IFC experiments, the FLNN model requires two parameters for each individual observer: *σ* (the level of internal noise) and *k* (for details, see 1).

*unidirectional orientation,* the second *bidirectional orientation,* the third *unidirectional color,* and the fourth *bidirectional color*.

*unidirectional* experiments (Experiments 1 and 3) was to obtain corroborating evidence for the hypothesis that tasks with heterogeneous distractors may be harder than homogeneous ones, even when the distractors in the heterogeneous case are less similar to the target than in the homogeneous case (i.e., are less “confusable” with the target). To that end, all the distractor feature values in the unidirectional experiments lie on one side of the target's feature value. That is, the feature values of all the distractors are either larger than the target's feature value (Experiment 1, Figure 2a) or smaller (Experiment 3, Figure 2c). The smallest target–distractor distance (*d*_{T}) in conditions 1 and 2 of the unidirectional experiments was the same. However, whereas in condition 1 all the distractors were at distance *d*_{T} from the target (i.e., a homogeneous display), in condition 2 half of the distractors were at distance *d*_{T} from the target and the other half were at a greater distance (i.e., a heterogeneous display). If a search through a heterogeneous display can be harder than a search through a homogeneous display, even though some of the heterogeneous distractors are less similar to the target than the distractors in the homogeneous case, condition 2 should be harder than condition 1.

*d*_{T} (see Figures 2a and 2c). The specific values were set so that the target–distractor distance was smaller than the distance between the distractors in condition 2 and larger in condition 4. If the relative rather than the absolute values of these distances affect the efficiency of the search, there should be a considerable performance difference between conditions 1 and 2 but a smaller difference between conditions 3 and 4.

*bidirectional* experiments (Experiments 2 and 4), the distractors' feature values were on both sides of the target's feature value (and were always arranged symmetrically): In each display, half of the distractors' feature values were larger than the target's feature value and half were smaller (Figures 2b and 2d). They were designed this way because the cover measure and the FLNN model can predict performance differences between different symmetric conditions, while the saliency measure cannot predict performance differences for cases in which the distractors' feature values are symmetric around the target's feature value; in such cases, the saliency measure is 0 for all conditions. The bidirectional experiments tested whether human observers also experience such differences in difficulty and whether these differences follow our models' predictions. In particular, our models predict that the search should get harder as the target–distractor distance decreases relative to the distance between the distractors themselves (condition 1 vs. conditions 2 and 3 vs. conditions 4 and 5) because more segments of length *d*_{T} are required to cover all the points representing the distractors in the feature space. Additionally, the FLNN model suggests that the search is harder when the feature value that is initially examined is the one most similar to the target. Since the FLNN model chooses this initial feature value randomly, it predicts that the search should be harder when there are more distractors that are similar to the target than distractors that differ from it (condition 2 vs. 3 and condition 4 vs. 5; Figures 2b and 2d).

*χ*^{2}/*df*) and the chi-square test (*χ*^{2} test) (see 2 for details). We chose these tests because they allow us to compare models with different numbers of parameters (Taylor, 1982). The implementations of the cover measure and the FLNN model are described in detail in 1. The implementation of the saliency measure followed its description in Rosenholtz (1999). The implementations of the SDT model, the best-normal model, and the RCref model followed the descriptions of these models in Rosenholtz (2001a), and the implementation of the temporal-serial model followed the description of the model in Eckstein (1998). For the modeling of performance in the orientation search experiments (Experiments 1 and 2), we considered the orientation distribution to be wrapped (Rosenholtz, 2001a): a line segment of orientation *α*° can be considered both as having orientation *α*° and as having orientation (180 − *α*)°. For the modeling of the color search experiments (Experiments 3 and 4), we considered the distributions to be non-wrapped.

- the saliency, cover, FLNN, and best-normal models, but not the temporal-serial or the SDT-based models, can account for the effects of heterogeneous displays like those used in the unidirectional experiments;
- the cover, FLNN, and SDT-based models, but not the saliency, best-normal, or temporal-serial models, can predict performance differences between the different symmetric conditions of the bidirectional experiments;
- both the FLNN and the temporal-serial models assume that capacity is limited: When display duration is limited, only *k* items out of the total number of items are considered. However, because the *k* items in the temporal-serial model are selected randomly, performance differences should emerge only when the number of items differs, whereas in the FLNN model the *k* items are chosen based on their feature values. Specifically, the FLNN selects the *k* items that are most dissimilar.
- Lastly, the final decision of the various models is based on different information. The SDT model uses the exact feature distribution of all the items in the display; the RCref model uses a distribution of the relative feature values (i.e., the differences between the various items) rather than the absolute feature values and the differences between the display items and a reference target; the best-normal model uses the normal distribution that best fits the distractors' true distribution; and the FLNN model uses the distribution of the *k* chosen items that typically represent the different similarity-based groups present in the display.

*z* test, *p* < 0.05), demonstrating that a search through a heterogeneous display can be harder than a search through a homogeneous display even when half of the distractors in the heterogeneous case are less similar to the target than those in the homogeneous case (see Figure 3a). Moreover, in contrast to the observed difference between conditions 1 and 2, performance in conditions 3 and 4 did not differ significantly for any of the 5 participants. This finding is consistent with the hypothesis that search efficiency depends on the relative rather than the absolute values of target–distractor and distractor–distractor feature-space distances. In addition, for 2 participants (A.P. and V.S.), condition 1 was significantly harder than conditions 3 and 4 (*z* test, *p* < 0.05).

| Experiment | Participant | Order: SDT | Order: Cover | Order: Saliency | Cover *r* | Cover *σ* | Saliency *r* | Saliency *σ* |
|---|---|---|---|---|---|---|---|---|
| Experiment 1 | A.P. | − | + | + | 0.999* | 2.6 | 0.812 | 10.1 |
| | Y.B. | − | + | + | 0.998* | 1.6 | 0.538 | 8.5 |
| | D.A. | − | + | + | 0.997* | 1.8 | 0.572 | 9.8 |
| | V.S. | − | + | + | 0.999* | 1.8 | 0.570 | 10.6 |
| | A.P.Z. | − | + | ∼ | 1* | 0 | 0.510 | 9.7 |
| Experiment 2 | A.D. | ∼ | + | + | 0.926* | 17.8 | – | – |
| | A.A. | ∼ | − | − | 0.903* | 20.0 | – | – |
| | M.D. | + | − | − | 0.962* | 19.9 | – | – |
| | L.F. | ∼ | − | − | 0.873 | 16.3 | – | – |
| Experiment 3 | D.A. | − | + | ∼ | 0.862 | 11.0 | 0.900 | 8.0 |
| | S.M. | − | + | ∼ | 0.883 | 11.0 | 0.945 | 7.4 |
| | E.D. | ∼ | + | + | 0.993* | 4.3 | 0.996* | 8.7 |
| | G.S. | − | + | ∼ | 0.997* | 2.8 | 0.880 | 8.6 |
| Experiment 4 | R.A. | − | ∼ | − | 0.967* | 15.1 | – | – |
| | O.R. | ∼ | + | + | 0.956* | 13.2 | – | – |
| | R.I. | ∼ | + | + | 0.979* | 16.7 | – | – |
| | A.O. | − | + | − | 0.951* | 19.9 | – | – |
| R-1 | R.E.R. | − | + | + | 0.910 | 1.5 | 0.894 | 8.0 |
| | B.L.B. | − | + | ∼ | 0.918 | 8.6 | 0.903 | 10.2 |
| R-2 | J.A.K. | ∼ | + | ∼ | 0.994* | 14.6 | 0.998* | 13.0 |
| | J.O.E. | ∼ | + | ∼ | 0.986* | 13.0 | 0.982* | 11.9 |

*r*) between the predictions of each model and each participant's accuracy (Papoulis & Pillai, 2002). We report the correlation coefficient *r*, its significance,^{4} and the resulting noise parameter (*σ*) for each participant on the right side of Table 1. As can be seen in the table, the correlation coefficients of the cover measure were significant for all five participants. This suggests that the cover measure can quantitatively predict the performance of all five participants. In contrast, none of the correlation coefficients of the saliency measure were significant. Thus, although the saliency measure can predict the correct order of difficulty for most participants, its correlation coefficients are lower than those of the cover measure, and none reached statistical significance. This indicates that there is no good linear transformation from the saliency measure predictions to the participants' accuracy, while there are good linear transformations from the cover measure predictions. It is possible, of course, that a good non-linear transformation from the saliency measure to the observed data exists, and it would be interesting to examine whether such a transformation yields a better fit for the saliency measure's predictions. We leave this to future research.
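The linear-fit test described here amounts to computing a Pearson correlation between each model's predictions and the measured accuracies. A minimal sketch (our implementation, not the paper's):

```python
def pearson_r(predictions, accuracies):
    """Pearson correlation coefficient between model predictions and
    observed accuracies. Undefined (division by zero) when either
    vector is constant -- the situation the text notes for the
    saliency measure in symmetric displays."""
    n = len(predictions)
    mx = sum(predictions) / n
    my = sum(accuracies) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(predictions, accuracies))
    var_x = sum((x - mx) ** 2 for x in predictions)
    var_y = sum((y - my) ** 2 for y in accuracies)
    return cov / (var_x * var_y) ** 0.5
```

A perfectly linear relation gives *r* = ±1, and a constant prediction vector makes the denominator zero, which is exactly why *r* cannot be computed for the saliency measure when it predicts 0 for every condition.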

*χ*^{2}/*df*) and the chi-square test (*χ*^{2} test) (for details, see 2) to compare their predictive abilities (Taylor, 1982). The reduced chi-square measure allows us to compare models that use different numbers of parameters by assigning a better fitting grade to a model that predicts the same results with fewer parameters. The lower the value of the reduced chi-square, the more accurate the prediction. The chi-square test determines whether the probability of obtaining a *χ*^{2} value larger than the one measured is higher than 0.05, given that the data are supported by the model and taking into account the degrees of freedom (*df*). If it is not, the model is rejected.^{5}

In Table 2, we report, for each combination of model and participant, the values of the model's parameters that gave the best fit (i.e., resulted in the lowest *χ*^{2} value), the reduced chi-square measure, and whether the model was rejected on the basis of the chi-square test. As can be seen in Table 2, the reduced chi-square score of the FLNN model is the lowest for 4 out of the 5 participants, and it is rejected by only 1 out of the 5 chi-square tests. The best-normal model is rejected by 3 out of 5 chi-square tests, the RCref by 4 out of 5 tests, and the SDT and temporal-serial models by all 5 tests. Thus, the predictions of the FLNN model are clearly the closest to human search performance in this experiment.
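As a rough sketch of the goodness-of-fit computation (the exact procedure is given in 2; the binomial-variance weighting used here is our assumption, not necessarily the paper's):

```python
def reduced_chi_square(observed_acc, predicted_acc, n_trials, n_params):
    """Reduced chi-square (chi^2 / df) for accuracy data across
    conditions, weighting each squared deviation by the binomial
    variance of the predicted proportion. df is the number of
    conditions minus the number of fitted parameters."""
    chi2 = 0.0
    for p_obs, p_pred, n in zip(observed_acc, predicted_acc, n_trials):
        variance = p_pred * (1.0 - p_pred) / n  # binomial variance
        chi2 += (p_obs - p_pred) ** 2 / variance
    df = len(observed_acc) - n_params
    return chi2 / df
```

A model with more free parameters has a smaller *df* and therefore needs a proportionally better raw fit to reach the same reduced chi-square, which is why the measure can compare models with different numbers of parameters.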

| Experiment | Participant | FLNN *χ*^{2}/*df* | FLNN *σ* | FLNN *k* | SDT *χ*^{2}/*df* | SDT *σ* | Best-normal *χ*^{2}/*df* | Best-normal *σ* | Temporal-serial *χ*^{2}/*df* | Temporal-serial *k* | RCref *χ*^{2}/*df* | RCref *σ* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Experiment 1 | A.P. | 7.477 | 3.2 | 1.8 | 27.971 | 6.7 | 20.142 | 5.8 | 16.302 | 26.4 | 1.535* | 3.6 |
| | Y.B. | 2.395* | 2.7 | 2.6 | 19.491 | 3.6 | 2.940 | 1.9 | 18.785 | 35.4 | 11.270 | 0.4 |
| | D.A. | 1.954* | 2.8 | 2.7 | 12.610 | 3.8 | 3.313 | 1.4 | 12.044 | 35.1 | 5.734 | 0.4 |
| | V.S. | 1.376* | 2.7 | 2.4 | 23.859 | 4.1 | 1.510* | 3.6 | 27.891 | 35.4 | 28.350 | 0.5 |
| | A.P.Z. | 1.800* | 2.4 | 2.6 | 16.786 | 3.4 | 1.954* | 1.6 | 14.962 | 35.3 | 14.888 | 0.5 |
| Experiment 2 | A.D. | 1.553* | 7.2 | 6.5 | 1.135* | 8.0 | 0.150* | 2.7 | 0.115* | 8.5 | 0.280* | 2.4 |
| | A.A. | 1.272* | 5.9 | 10.0 | 1.007* | 6.0 | 0.987* | 0.6 | 1.5888* | 14.4 | 5.781 | 1.3 |
| | M.D. | 3.491 | 8.2 | 6.6 | 2.510 | 9.6 | 3.606 | 6.5 | 2.820 | 5.8 | 2.480 | 3.8 |
| | L.F. | 1.213* | 6.1 | 14.0 | 1.368* | 6.2 | 3.497 | 0.8 | 4.015 | 13.9 | 1.860* | 1.4 |
| Experiment 3 | D.A. | 32.912 | 3.3 | 7.0 | 22.128 | 4.6 | 37.029 | 4.1 | 6.980 | 13.6 | 4.750 | 2.9 |
| | S.M. | 36.668 | 2.9 | 1.8 | 26.838 | 5.5 | 31.85 | 9.0 | 14.14 | 12.7 | 7.786 | 3.6 |
| | E.D. | 0.649* | 3.5 | 1.5 | 2.853 | 15.2 | 2.274* | 14.8 | 14.092 | 6.8 | 0.074* | 10.4 |
| | G.S. | 0.633* | 3.1 | 1.6 | 7.236 | 13.9 | 6.477 | 13.6 | 12.294 | 8.0 | 2.646 | 7.4 |
| Experiment 4 | R.A. | 3.665 | 5.6 | 5.0 | 3.028 | 5.5 | 1.805* | 8.5 | 2.189* | 6.1 | 2.510 | 2.2 |
| | O.R. | 1.067* | 6.0 | 5.0 | 0.252* | 8.9 | 0.456* | 16.0 | 0.263* | 4.9 | 1.494* | 3.7 |
| | R.I. | 1.069* | 5.6 | 5.4 | 0.782* | 7.0 | 0.155* | 6.3 | 0.075* | 6.7 | 2.919 | 2.0 |
| | A.O. | 1.759* | 5.5 | 4.6 | 3.131 | 7.8 | 1.372* | 9.8 | 2.327* | 5.8 | 2.848 | 2.3 |
| R-1 | R.E.R. | 1.565* | 6.1 | 2.7 | 15.964 | 8.3 | 2.326* | 7.5 | 9.364 | 32.4 | 2.687 | 2.7 |
| | B.L.B. | 5.725 | 0.8 | 1.6 | 15.025 | 14.9 | 9.099 | 13.9 | 12.666 | 18.7 | 8.483 | 9.7 |
| R-2 | J.A.K. | 0.467* | 9.8 | 2.4 | 2.276* | 14.6 | 1.182* | 14.3 | 0.471* | 5.9 | 2.390* | 3.6 |
| | J.O.E. | 0.326* | 9.1 | 2.5 | 2.741 | 12.9 | 1.686* | 12.6 | 0.326* | 6.4 | 1.875* | 3.0 |

*z* test, *p* < 0.05) were different for the different participants: For A.A., condition 3 was significantly harder than conditions 4 and 5; for M.D., condition 1 was significantly harder than the other conditions; and for L.F., condition 2 was significantly easier than conditions 1, 3, and 5, while condition 4 was significantly easier than conditions 1 and 5.

*r* and tested its significance for the cover and the saliency measures (Table 1). The predictive success of the SDT model depends on the similarity of the distractors to the target; the model could predict the order of difficulty for one participant. The cover measure cannot predict that the display in which the distance between the distractors is smallest (condition 1) would be the hardest to search through, and it could predict the order of difficulty only for the one participant for whom condition 1 was not the hardest. Still, it provided significant correlation coefficients for three of the four participants. The saliency measure predicts a saliency of 0 for all conditions because the target value and the mean of the distractors' values are equal. It is therefore not possible to calculate the *r* correlation coefficient for this model, as *r* is not defined for constant vectors. In other words, no linear transformation can map the constant values of the saliency measure onto the participants' measured accuracies. Like the cover measure, it could predict the difficulty order of only one participant.

*u*′, *v*′, and cd/m^{2} values with a Tektronix J18 LumaColor™ II Photometer. We used the *L***u*′*v*′ color space because it was designed so that distances in the color space are linear with differences in color perception (C.I.E. 1978; for equations see, e.g., Travis, 1991). Figure 6a depicts the exact distances in *u*′*v*′ space, with *L** constant, for this experiment. We also kept *v*′ more-or-less constant and changed only *u*′. Hence, the distance in feature space between the search items (i.e., their feature value) was computed from the *u*′-value differences. Additionally, we chose a black background because its perceptual distance from all stimuli is approximately equal (Figure 6c). This exempted us from taking it into account when considering the different models. The color of the target disk was always gray (0 feature value; Table 3), and it was present in the first or second interval equally often. The color of the distractor disks presented in each of the four conditions of this experiment was as follows (see also Figure 2c): In condition 1, all the distractors had the same greenish color (−10 feature value). In condition 2, half of the distractors had one greenish color (−10 feature value) and the other half had another greenish color (−30 feature value). In condition 3, all the distractors had the same greenish color (−30 feature value). Finally, in condition 4, half of the distractors had one greenish color (−30 feature value) and the other half had another greenish color (−50 feature value).
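For reference, the CIE 1976 u′, v′ chromaticity coordinates are standard functions of the tristimulus values XYZ. These equations come from the CIE definition of the uniform chromaticity scale, not from the measurements reported in Table 3:

```python
def xyz_to_u_v_prime(X, Y, Z):
    """CIE 1976 UCS chromaticity coordinates:
    u' = 4X / (X + 15Y + 3Z),  v' = 9Y / (X + 15Y + 3Z)."""
    denom = X + 15.0 * Y + 3.0 * Z
    return 4.0 * X / denom, 9.0 * Y / denom
```

For an equal-energy stimulus (X = Y = Z), this gives u′ = 4/19 ≈ 0.211 and v′ = 9/19 ≈ 0.474.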

| Description | *u*′ | *v*′ | cd/m^{2} = Y | R | G | B |
|---|---|---|---|---|---|---|
| Black | 0 | 0 | 0 | 0 | 0 | 0 |
| −50 | 0.138 | 0.422 | 22.1 | 0 | 185 | 158 |
| −35 | 0.151 | 0.425 | 23.0 | 49 | 180 | 159 |
| −30 | 0.1595 | 0.427 | 23.6 | 73 | 178 | 159 |
| −25 | 0.162 | 0.426 | 23.25 | 81 | 177 | 159 |
| −15 | 0.173 | 0.4265 | 24.1 | 115 | 172 | 160 |
| −10 | 0.179 | 0.427 | 24.35 | 130 | 168 | 160 |
| 0 (target) | 0.189 | 0.427 | 24.6 | 160 | 160 | 160 |
| 15 | 0.2055 | 0.424 | 24.25 | 204 | 143 | 161 |
| 25 | 0.211 | 0.421 | 23.85 | 218 | 135 | 162 |
| 35 | 0.223 | 0.4165 | 23.1 | 240 | 121 | 163 |

*z* test, *p* < 0.05) and also significantly easier than condition 1. For 2 participants (E.D. and G.S.), condition 4 was also significantly easier than conditions 1 and 2. Unlike in Experiment 1, there were no consistent differences between conditions 1 and 2: For 2 participants (D.A. and S.M.), condition 1 was significantly harder than condition 2, and for one participant (G.S.), condition 2 was significantly harder than condition 1. Finally, for 2 participants (D.A. and S.M.), condition 4 was significantly harder than condition 3. Hence, the pattern of results found in this experiment differs from that of Experiment 1, where orientation was the search-relevant feature. This may suggest that searches based on different features are limited by different factors. Alternatively, the true distances between the various feature values that make up the search display might differ from those assumed here, because the color space we used might not match the perceptual color space well enough (e.g., Fairchild, 1998). We proceed to examine how well the models can predict these results.

*z* test, *p* < 0.05) were found between some conditions for 2 out of the 4 participants (see Figure 3d). Specifically, for R.A., conditions 2 and 5 were significantly harder than conditions 1, 3, and 4. For A.O., conditions 4 and 5 were significantly harder than conditions 1, 2, and 3. Thus, as in Experiment 2, differences in accuracy were found even with displays that include distractors whose feature values lie symmetrically on both sides of the target's feature value. This finding does not agree with Bauer, Jolicoeur, and Cowan (1996), who suggest that all color search tasks should be equally hard whenever the target is not linearly separable from the distractors.

*r* correlation coefficient for this model when the display is bidirectional. The cover measure was able to predict the order of difficulty for three out of the four participants and passed the correlation coefficient test for all participants. The SDT model cannot predict the order of difficulty for any of the four participants, as it considers condition 1 to be the hardest, followed by 3, 5, 2, and finally 4. The predictive abilities of the FLNN, SDT, best-normal, RCref, and temporal-serial models can be evaluated from Table 2 and Figure 8. The best-normal and temporal-serial models passed all four chi-square tests, the FLNN model passed three out of four, the SDT model passed two out of four, and the RCref model passed only one.^{6}

The FLNN passed the chi-square test for one participant in Experiment R-1 and for both participants in Experiment R-2. Additionally, it provided the lowest reduced chi-square measure for both experiments. The best-normal model also passed the chi-square test for one participant in Experiment R-1 and for both participants in Experiment R-2. The RCref and the temporal-serial models passed the chi-square test for both participants in Experiment R-2 but for neither in Experiment R-1. Finally, the SDT model passed the chi-square test for only one participant in Experiment R-2.

| Experiment | No. of participants | Order (Cov) | Order (Sal) | Sig. *r* (Cov) | Sig. *r* (Sal) | Lowest *χ*^{2}/*df* (FLNN) | Lowest (SDT) | Lowest (B-N) | Lowest (T-S) | Lowest (RCref) | Passed *χ*^{2} test (FLNN) | Passed (SDT) | Passed (B-N) | Passed (T-S) | Passed (RCref) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Experiment 1 | 5 | 5 | 4 | 5 | 0 | 4 | 0 | 0 | 0 | 1 | 4 | 0 | 2 | 0 | 1 |
| Experiment 2 | 4 | 1 | 1 | 3 | 0 | 1 | 0 | 1 | 1 | 1 | 3 | 3 | 2 | 2 | 2 |
| Experiment 3 | 4 | 4 | 1 | 2 | 1 | 1 | 0 | 0 | 0 | 3 | 2 | 0 | 1 | 0 | 1 |
| Experiment 4 | 4 | 3 | 2 | 4 | 0 | 0 | 1 | 2 | 1 | 0 | 3 | 2 | 4 | 4 | 1 |
| R-1 | 2 | 2 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| R-2 | 2 | 2 | 0 | 2 | 2 | 2 | 0 | 0 | 1 | 0 | 2 | 1 | 2 | 2 | 2 |
| Total | 21 | 17 | 9 | 16 | 3 | 10 | 1 | 3 | 3 | 5 | 15 | 6 | 12 | 8 | 7 |

*L***u*′*v*′ color space are proportional to the differences perceived by human observers. Although this assumption is commonly employed, there has been some recent criticism of the accuracy of this color system (e.g., Fairchild, 1998). The deviation from linear relations between the feature space employed by the models and the actually perceived feature space might be larger for color than for orientation. If so, this can explain why our models' predictions were better in the latter case. Because the models' predictions are based on distances in feature space between the various search items, the predictions can only be as accurate as the feature space assumed by the models. As our knowledge of the perceived feature space for various features advances, the predictions provided by the models should become more accurate.

*d*_{1} is the feature-wise distance between a pair of items in one dimension and *d*_{2} is the distance in the other dimension, then the Euclidean distance is √(*d*_{1}^{2} + *d*_{2}^{2}). As in the one-dimensional case, *d*_{T} decreases and the corresponding cover increases, and it can be shown that this effect is more pronounced with two dimensions.^{7}

We made a preliminary attempt to quantitatively account for the set-size effect by testing the ability of the FLNN model to predict the results reported in Eckstein (1998). Eckstein found a substantial set-size effect in a search for a target defined by a conjunction of orientation and contrast, but no set-size effect in a search for a target defined by one feature (orientation or contrast). He further demonstrated that an SDT-based model, but not the temporal-serial model, can account for these findings. Here, as a pilot test, the two-dimensional FLNN implementation assumes the Euclidean distance as the dissimilarity measure and uses three parameters: *k* and one noise level for each of the two dimensions, *σ*_{1} and *σ*_{2}. For simplicity, the weights of the two dimensions were assumed to be equal. Figure 10 presents the FLNN predictions for the three participants of the original study. As can be seen, FLNN predicts that overall performance will be worse for conjunction search than for feature search; it also predicts the steeper slope as a function of set size in conjunction search. In future work, the various assumptions we employed here should be tested, and a fourth parameter may be added to express the relative difference in the weights of the two dimensions. We believe that this would improve the predictions reported here. Also note that for this preliminary test the experimental results were estimated from the figures in Eckstein's paper and therefore may not be exact.
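The two-dimensional dissimilarity assumed here is the ordinary Euclidean distance over the feature dimensions; a minimal sketch, with an optional weight vector standing in for the fourth parameter discussed above (the function name and signature are ours):

```python
import math

def feature_distance(item_a, item_b, weights=(1.0, 1.0)):
    """Euclidean dissimilarity between two items described by two
    feature values each (e.g., orientation and contrast). Equal
    weights reproduce the assumption made in the pilot test; unequal
    weights would scale one dimension relative to the other."""
    d1 = weights[0] * (item_a[0] - item_b[0])
    d2 = weights[1] * (item_a[1] - item_b[1])
    return math.hypot(d1, d2)
```

With equal weights, `feature_distance((0, 0), (3, 4))` is the familiar √(3² + 4²) = 5.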

*d*_{T}. Then, the cover measure is the number of segments of length *d*_{T} required to cover all the points representing the distractors in the feature space. This original measure does not consider the observer's internal noise. Here we suggest a method for estimating the average cover measure when normally distributed internal noise is added to the observations. Given the feature values associated with the stimuli of a specific experimental condition and the internal noise variance *σ*², the cover measure is estimated as follows:

- An observed target-present display is randomly generated by picking the feature value of each element from the normal distribution to which it belongs (with the mean being its true displayed value and *σ* being a parameter denoting the level of noise).
- The cover value is calculated for the specific generated case (using the original cover definition).
- This is repeated *N* times, resulting in *N* cover values.
- The suggested prediction is the average over those calculated values.

We used *N* = 1000.
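The averaging procedure above can be sketched as follows for a one-dimensional feature space. Two details are our reading of the procedure, not code from the original study: *d*_{T} is recomputed on each noisy sample as the distance between the observed target and its nearest observed distractor, and the cover is computed by a greedy left-to-right sweep.

```python
import random

def cover_1d(points, d_t):
    """Number of segments of length d_t needed to cover all points
    on the feature axis (greedy sweep from the smallest value up)."""
    pts = sorted(points)
    segments, i = 0, 0
    while i < len(pts):
        segments += 1
        right_edge = pts[i] + d_t  # this segment starts at pts[i]
        while i < len(pts) and pts[i] <= right_edge:
            i += 1
    return segments

def noisy_cover(target, distractors, sigma, n_reps=1000):
    """Average cover over n_reps noisy observed displays."""
    total = 0
    for _ in range(n_reps):
        t = random.gauss(target, sigma)
        obs = [random.gauss(x, sigma) for x in distractors]
        d_t = min(abs(t - x) for x in obs)  # assumed per-sample d_T
        total += cover_1d(obs, d_t)
    return total / n_reps
```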

Let *x*_{1}, …, *x*_{n} be the feature values corresponding to the displayed elements and let *x*_{s1}, …, *x*_{sm} (*m* < *n*) be the feature values corresponding to the already selected elements. For each element *x*_{i} that has not yet been selected, the algorithm finds the closest distance to a selected item's feature value, *d*_{min}(*x*_{i}) = min_{j = 1, …, m} ∣*x*_{i} − *x*_{sj}∣, and the next selected element is the one for which *d*_{min} is maximum. This procedure is repeated until the target is found.
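A single FLNN selection step, as defined above, can be sketched as follows (the function name is ours):

```python
def flnn_next(feature_values, selected):
    """Return the index of the not-yet-selected element whose minimal
    distance to the already selected elements is largest."""
    best_i, best_d = None, -1.0
    for i, x in enumerate(feature_values):
        if i in selected:
            continue
        d_min = min(abs(x - feature_values[j]) for j in selected)
        if d_min > best_d:
            best_d, best_i = d_min, i
    return best_i
```

The excerpt assumes at least one element has already been selected (*m* ≥ 1); how the first element is chosen is not specified here.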

- An observed target-present display is randomly generated by picking the feature value of each element from the normal distribution to which it belongs (with the mean being its true displayed value and *σ* being a parameter denoting the level of noise).
- In a similar way, an observed target-absent display is generated.
- It is assumed that in the limited presentation time of the display, only *k* elements can be processed. Therefore, the FLNN algorithm is simulated for *k* steps on each of the two generated “displays,” yielding *k* selected elements for each display.
- Out of the 2*k* selected stimuli, the algorithm identifies the display that contains the stimulus most similar (in terms of feature values) to the true target value (without noise) as the target-present display.
- If the algorithm points to the generated target-present display, this is counted as a success; otherwise, it is a failure.
- All the above steps are repeated many times (10,000 in our case), and the ratio of successes is the model's prediction.

The model has two parameters: *k* (the number of display elements examined by the observer in the limited presentation time) and *σ* (the internal noise level of the observer). We allow non-integer values for *k*: if *N* < *k* < *N* + 1, where *N* is an integer, then *N* elements are considered in some trials and *N* + 1 elements are considered in the other trials. For example, if *k* equals 5.8, then in 20% of the trials 5 items are considered and in 80% of the trials 6 items are considered.
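Putting the steps above together, a minimal Monte Carlo sketch of the FLNN prediction might look as follows. Several details are our assumptions rather than the original implementation: the first selected element is picked at random, the target-absent display replaces the target with a copy of a randomly chosen distractor, and ties between the two displays are broken by a fair guess.

```python
import random

def run_flnn(observed, true_target, steps):
    """Run `steps` FLNN selections on one observed display; return the
    smallest distance from a selected element to the true target value."""
    selected = [random.randrange(len(observed))]  # assumed random seed
    while len(selected) < steps:
        best_i, best_d = None, -1.0
        for i, x in enumerate(observed):
            if i in selected:
                continue
            d_min = min(abs(x - observed[j]) for j in selected)
            if d_min > best_d:
                best_d, best_i = d_min, i
        selected.append(best_i)
    return min(abs(observed[i] - true_target) for i in selected)

def flnn_prediction(target, distractors, k, sigma, n_trials=10000):
    """Estimate the predicted proportion of correct responses (k >= 1)."""
    successes = 0
    for _ in range(n_trials):
        # non-integer k: floor(k)+1 steps on the matching fraction of trials
        steps = int(k) + (1 if random.random() < k - int(k) else 0)
        present = [random.gauss(x, sigma) for x in [target] + distractors]
        absent = [random.gauss(x, sigma)
                  for x in distractors + [random.choice(distractors)]]
        d_p = run_flnn(present, target, steps)
        d_a = run_flnn(absent, target, steps)
        if d_p < d_a or (d_p == d_a and random.random() < 0.5):
            successes += 1
    return successes / n_trials
```

With a very distinct target and low noise, the prediction approaches ceiling, as expected; it falls toward chance as *σ* grows or as the target–distractor distance shrinks.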

We used the reduced chi-square (*χ*²/*df*) measure because it enables comparison between models that use different numbers of parameters (Taylor, 1982). A model with more parameters is penalized relative to a model with fewer parameters that gives the same prediction.

*χ*²/*df* = (1/(*c* − *p*)) Σ_{i = 1, …, c} ((Acc_{i} − Prediction_{i})/*SE*_{i})², where *c* is the number of conditions, *p* is the number of model parameters, Acc_{i} is the accuracy of the participant on condition *i*, Prediction_{i} is the prediction of the model for condition *i*, and *SE*_{i} is the standard error of the participant's accuracy for the *i*th condition. The lower the *χ*²/*df* value, the better the model predicts the results. The chi-square test reports whether the probability of obtaining a *χ*² larger than the one measured, given that the data follow the model and taking into account the degrees of freedom *df*, is higher than 0.05. If it is not, the model is rejected.

^{1}Originally, the saliency measure was also suggested for multidimensional feature spaces, in which case it refers to covariance rather than standard deviation. For further details, see Rosenholtz (1999).

^{2}Our implementation of the RCref model followed Rosenholtz (2001a); thus, an item was compared to the reference target 30% of the time.

^{5}The chi-square test penalizes FLNN for using two parameters (as opposed to the other models, which use only one) by demanding a closer fit between the prediction and the data. This motivated us to check whether the two parameters (*σ* and *k*) covary. The optimization surfaces suggested that the parameters are not statistically dependent and that good fits require a combination of both parameters.

^{6}There are minor differences between the predictions of the SDT, best-normal, and RCref models reported here and those reported in Rosenholtz (2001a) because we chose the fits that minimize chi-square, whereas Rosenholtz minimized the sum of squared differences.

^{7}For a one-dimensional feature space, we defined the cover as the number of *d*_{T}-long segments required to cover all the points in the feature space representing the distractors. For two dimensions, the cover is defined as the number of disks with diameter *d*_{T} required to cover the distractor points (see Avraham & Lindenbaum, 2006). If *d*_{T} decreases by 50% due to the increase in set size, then, for the one-dimensional case, a segment of length 2 is divided into two segments of length 1. Yet, for the two-dimensional case, even 4 disks of diameter 1 cannot cover a disk of diameter 2.