Abstract
Search in completely natural scenes is a paradigm that cannot be directly compared to the typical search tasks studied, in which objects are distinct and definable. Here we examine the possibility of predicting human performance on completely natural scene tasks, using a direct comparison of human performance against new and existing computer models of viewing natural images. For the human task, 25 participants performed a search task on 120 natural scenes while seated at a fixed-mount eye-tracker. Scenes were 1280 × 1024 pixels, viewed at a fixed distance so that they subtended approximately 37.6° × 30.5° of visual angle. Prior to viewing each scene, participants were shown a 90-pixel (2.7°) square sub-section of the image for 1 s, which was then blanked before the main scene was displayed. In 20% of trials the cued target was absent from the scene, and these target-absent trials were balanced across participants. The identical task was given to reproductions of existing computational models, including Feature Congestion (Rosenholtz, Li, Mansfield, & Jin, 2005), Saliency (Itti & Koch, 2001), the Target Acquisition Model (Zelinsky, 2008), and a new variant of the Visual Difference Predictor (To, Lovell, Troscianko, & Tolhurst, 2008). We show that the models are poor at generating parameters that predict performance, but that human A′ is modestly predicted by simple image clutter (r(117) = −.21, p = .017), while a stronger predictive relationship is found for the new PVDP cluster-object method we employ (r(58) = −.37, p < .001). These results lead us to conclude that in natural search tasks the nature of both the scene and the target is important, and that local feature groups can exert a global influence on task difficulty.
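For readers unfamiliar with the A′ statistic reported above: it is presumably the standard non-parametric sensitivity index computed from hit and false-alarm rates (Pollack & Norman's formulation). A minimal sketch, assuming that convention:

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Non-parametric sensitivity A' from hit and false-alarm rates.

    Standard Pollack & Norman formulation; 0.5 = chance, 1.0 = perfect.
    This is an illustrative sketch, not the authors' analysis code.
    """
    h, f = hit_rate, fa_rate
    if h == f:
        # No discrimination between target-present and target-absent trials.
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance responding mirrors the above-chance formula.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
```

For example, a participant with a 90% hit rate and 10% false-alarm rate would score A′ ≈ 0.94, while equal hit and false-alarm rates give the chance value of 0.5.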
Meeting abstract presented at VSS 2012