Abstract
To study how visual attention is allocated in complex scenes under normal viewing conditions, we conducted a behavioral experiment over the Internet. Over 1000 participants used their web browsers to view a series of natural and artificial scenes, including landscapes, home interiors, city scenes and fractals, and were asked to select the five most interesting points in each image. Voluntary participation was solicited in science-oriented newsgroups and through popular Internet press articles. We found a high degree of variability in the reaction times (M=3.3s, SD=5.8s) but a surprisingly high degree of consistency in the selected locations. The average nearest-neighbor distance between selected locations was significantly less than that expected by chance, indicating clustering. This distance was smallest for early selections and increased monotonically for later selections. An increase in the number of discrete clusters was also observed for later selections. We hypothesized that the greater similarity of chosen locations for early selections was due to the influence of bottom-up, stimulus-dependent factors, while the greater diversity of chosen locations for later selections was due to the influence of top-down, cognitive factors. To test this hypothesis, we compared the selected locations with the most salient locations in each natural scene, as determined by a purely stimulus-driven computational model of visual selective attention (Parkhurst, Law & Niebur, Vision Research 2002). The salient locations were most strongly correlated with the first selections, and this correlation declined with later selections. These results suggest that stimulus properties can influence attentional allocation even on a long time scale (many seconds), as in this experiment. Finally, we found that conducting behavioral experiments over the Internet with volunteer participants is technically feasible and practical, and can provide a potentially large amount of useful experimental data.
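The clustering comparison described above can be illustrated with a minimal sketch. The abstract does not state how the chance level was computed, so the uniform-random baseline used here is an assumption, as are the function names, the image dimensions and the number of simulations; this is not the authors' procedure.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def chance_p_value(points, width, height, n_sim=1000, seed=0):
    """Compare the observed mean nearest-neighbor distance to a chance baseline.

    Assumption: "chance" is modeled as the same number of points drawn
    uniformly at random over a width x height image. Returns the observed
    distance and the estimated probability of a chance distance that small.
    """
    rng = random.Random(seed)
    observed = mean_nn_distance(points)
    hits = 0
    for _ in range(n_sim):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in points]
        if mean_nn_distance(sim) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical selections packed into a small region of a 640 x 480 image.
clustered = [(100 + (i % 5) * 2, 100 + (i // 5) * 2) for i in range(20)]
observed, p = chance_p_value(clustered, width=640, height=480, n_sim=200)
```

A small p value means the selections lie closer together than uniformly scattered points would. Applying the same function separately to first, second and later selections would give the per-selection comparison mentioned in the text.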