Research Article  |   February 2010
Real and predicted influence of image manipulations on eye movements during scene recognition
Glen Harding, Marina Bloj
Journal of Vision February 2010, Vol.10, 8. doi:https://doi.org/10.1167/10.2.8
Abstract

In this paper, we investigate how controlled changes to image properties and orientation affect eye movements for repeated viewings of images of natural scenes. We make changes to images by manipulating low-level image content (such as luminance or chromaticity) and/or inverting the image. We measure the effects of these manipulations on human scanpaths (the spatial and chronological path of fixations), additionally comparing these effects to those predicted by a widely used saliency model (L. Itti & C. Koch, 2000). Firstly we find that repeated viewing of a natural image does not significantly modify the previously known repeatability (S. A. Brandt & L. W. Stark, 1997; D. Noton & L. Stark, 1971) of scanpaths. Secondly we find that manipulating image features does not necessarily change the repeatability of scanpaths, but the removal of luminance information has a measurable effect. We also find that image inversion appears to affect scene perception and recognition and may alter fixation selection (although we only find an effect on scanpaths with the additional removal of luminance information). Additionally we confirm that visual saliency as defined by L. Itti and C. Koch's (2000) model is a poor predictor of real observer scanpaths and does not predict the small effects of our image manipulations on scanpaths.

Introduction
It has been known for some time that the steady gaze positions, or fixations, associated with overt visual attention are not randomly distributed about a scene under inspection, but rather are clustered about certain features, especially if some task such as visual search is underway (Buswell, 1935; Castelhano, Mack, & Henderson, 2009; Yarbus, 1967). It has also been noted that during viewing of patterned images, for an individual observer, these distributions and the chronological order of fixations have some repeatability over multiple viewings of the same image (Noton & Stark, 1971). This repeatability has since been shown to also occur during the free viewing (no specific task, other than to observe the image) of natural images (Mannan, Ruddock, & Wooding, 1995). Current theory suggests possible mechanisms that might underlie the direction of fixation, which can be broadly split into two groups: bottom-up and top-down.
Top-down processing, or cognitive control, is driven by the prior knowledge, expectations, or strategies of the observer, such as the recognition of objects within the image or a specific task, such as a search. Scanpath theory (Noton & Stark, 1971) attempts to explain the repeatability of scanpaths (the spatial and chronological path of subsequent fixations) using such a top-down processing model. Stark and Ellis (1981) develop this further, arguing that the visual patterns that form an image are represented as a feature and attention network in memory that is subsequently compared to the stimulus when later recognizing the same image, resulting in the repeated scanpath. There is some support for this from other studies, such as Brandt and Stark (1997) and Laeng and Teodorescu (2002), that show similarity between scanpaths made when viewing stimuli and when later imagining the same stimuli. However, little other evidence for the theoretical basis of scanpath theory exists, and no study has been able to provide support for the strong form of the theory, in which identical eye movements would occur on repeated viewings of the same image. Significant variations in scanpaths are evident both between observers and between viewings, and scanpath theory cannot fully account for observed eye movements.
Alternatively, fixation selection may be controlled by a bottom-up mechanism. Such a process may account for human eye movements via a system that directly responds to low-level image features, such as luminance, contrast, and color. 
In the case of free viewing, scanpaths during the first few seconds are generally not determined consciously by the observer (Mannan et al., 1995). This of course does not preclude the use of unconscious top-down mechanisms but rather limits the role of any conscious strategies during this initial time period. Models of visual saliency attempt to predict these early eye movements by means of a saliency map that describes how likely a particular point in an image is to attract visual attention via a bottom-up processing mechanism, based on the low-level features of the image. The concept of visual saliency is fundamental to theories of visual attention deployment and human eye movements and has been defined in several different ways as the basis for a number of saliency models. Some models use a Bayesian probability-based approach (e.g., Butko, Zhang, Cottrell, & Movellan, 2008; Torralba, Oliva, Castelhano, & Henderson, 2006), some aim to learn a saliency model from eye movement data (Kienzle, Wichmann, Schölkopf, & Franz, 2007), while others try to generate a model related to the features of the human visual system, such as that of Itti and Koch (2000, 2001). Itti and Koch's model takes account of visual features such as brightness, orientation, contrast, and color and combines them to create a saliency map that predicts the conspicuity of features in the scene. This is combined with an inhibition-of-return mechanism to produce a scanpath prediction for the scene. Itti and Koch's saliency model is used extensively by the computer graphics community for predicting fixation selection. Its use includes applications such as selective rendering, where areas of a generated image are rendered at a higher quality if they are predicted to be more likely to be fixated (see, for example, Debattista, Sundstedt, Pereira, & Chalmers, 2005), and graphic design (Cleveland, 2009). The reliance in computer graphics on saliency models for predicting fixation appears mainly to be due to the availability and ease of implementation of existing algorithms and an assumption that such models do a sufficiently good job of predicting fixation selection.
There is currently some debate as to what extent eye movement control is influenced by both low- and high-level processing and whether or not one mechanism may dominate under all or some conditions. For example, previous work has argued both for (Parkhurst, Law, & Niebur, 2002) and against (Tatler, 2007; Tatler, Baddeley, & Gilchrist, 2005) the increased influence of bottom-up processing during the first few seconds of image viewing. This debate is complicated by the fact that both theories can often be used to make the same predictions, since fixated areas of an image are often informative from both a low-level and a semantic point of view (Henderson, Brockmole, Castelhano, & Mack, 2007; Tatler, 2007). Indeed, studies have shown that fixated regions of images correlate with both low-level image features (Henderson et al., 2007; Parkhurst et al., 2002; Reinagel & Zador, 1999; Tatler et al., 2005) and subjectively or semantically informative regions (Castelhano et al., 2009; Henderson et al., 2007). Recent work such as that by Foulsham and Underwood (2008) and Underwood, Foulsham, van Loon, Humphreys, and Bloyce (2006) has attempted to address this problem by comparing saliency model predictions, which are driven by the low-level image features, with actual recorded fixations, with the aim of discovering how well fixation selection can be predicted by a pure bottom-up model. In the case of Underwood et al. (2006), observers shown natural images were found to fixate more salient objects, but this effect could be overridden when observers were asked to search for another low-saliency object. Foulsham and Underwood (2008) also found that there were correlations between areas selected by a saliency model and real fixation locations when viewing natural images, but uniquely also measured observer scanpaths rather than just fixation locations, comparing these scanpaths with those predicted by Itti and Koch's (2000) saliency model. Foulsham and Underwood found that saliency was a poor predictor of the order of fixations, concluding that scanpaths could not be completely explained by saliency.
Many authors have suggested that bottom-up and top-down control of eye movements may occur and that the relative contributions of each type of mechanism may vary. For example, Althoff and Cohen (1999) argue that a memory effect causes changes in processing between first and subsequent scene viewings that are likely due to a strategy to maximize information extraction. Such a change in top-down processing would suggest that any bottom-up effects of saliency would be more prevalent on the first viewing of a scene, giving a higher correlation of fixation with saliency. As noted above, Parkhurst et al. (2002) showed that bottom-up saliency at fixation was greater than chance (although correlations were small) and that saliency associated with a fixation position decreased with increasing time, lending some weight to Althoff and Cohen's (1999) suggestions. However, this decrease of saliency with time has since been shown to be an artifact of spatial biases in both saliency and, initially but reducing over time, fixation (Tatler, 2007; Tatler et al., 2005). Although bottom-up saliency has been shown to be correlated with fixation during free-viewing or memory tasks, there is generally no correlation during search tasks (Chen & Zelinsky, 2006; Foulsham & Underwood, 2007; Nyström & Holmqvist, 2008; Stirk & Underwood, 2007; Underwood & Foulsham, 2006; Underwood, Templeman, Lamming, & Foulsham, 2008). However, some studies such as Nyström and Holmqvist (2008) have also been unable to find a correlation of low-level image features with fixation even when no task other than to “study the images” was given.
In this study, we measure the effects on scanpaths of controlled changes to both low- and high-level factors. In our first experiment, we attempt to isolate the effect on eye movements, during the initial viewing of natural scenes, of adjustments to individual low-level image features (such as the removal of chromatic or luminance information). We compare scanpaths elicited by images that have been altered, by adjusting individual low-level image properties, with scanpaths elicited by both unaltered and entirely different images. We also compare the effect on scanpaths of these low-level changes, as predicted by Itti and Koch's (2000) saliency model, to the changes evident in real observers' scanpaths.
In our second experiment, we study the effect of scene inversion on eye movements, by means of which we intend to affect top-down influences on fixation selection. Many natural scenes contain either horizons or objects that are normally seen with a particular orientation and we can hypothesize that inversion of the image will disrupt scene interpretation. Experiments looking at change detection have lent weight to this hypothesis (Shore & Klein, 2000). In a more recent change detection study (Kelly, Chun, & Chua, 2003), the authors looked at the influence of scene inversion on saliency. They noted that more salient changes were less likely to be noticed when scenes were inverted, concluding that scene interpretation both influences deployment of attention and is also disrupted by inversion. Again, scanpaths of real observers are compared with those predicted by Itti and Koch's (2000) saliency model. 
The two experiments outlined above are designed to selectively adjust factors that may affect either low- or high-level information to a greater extent (or, in the case of image inversion, only high-level information). In this way, we aim to provide a useful contribution both to knowledge of which (if any) low-level features are important to fixation selection and scanpath formation, and to the debate surrounding the relative contributions of saliency and cognitive control to these aspects of eye movement.
Methods
Three experiments were performed: a control experiment, Experiment 1, and Experiment 2. Experiment 1 was designed to investigate the effect on observer scanpaths of selectively altering image saliency by modifying individual image attributes. Specific image attributes that are thought to contribute to saliency, such as luminance and color, were altered. Experiment 2 was designed to investigate the effect on observer scanpaths of altering image context, by image inversion, and the effects of combined disruption to context and saliency. The control experiment was intended to test for any confounding effects of repeated viewings of the same scene. This is explained in greater detail below.
Participants
All observers were naive to the purpose of the experiment and had normal color vision (verified using the Farnsworth–Munsell 100 hue test) and normal or corrected-to-normal acuity. Inclusion was also subject to reliable eye tracking calibration. 
Nine observers (8 females, 1 male, mean age = 29 years, SD = 4 years) took part in the control experiment. None of the control observers subsequently took part in the main experiment. 
Twenty-three observers (14 females, 9 males, mean age = 28 years, SD = 11 years) took part in Experiment 1. 
Nineteen observers (11 females, 8 males, mean age = 26 years, SD = 6 years) took part in Experiment 2. Because Experiment 2 took place after Experiment 1, using the same image sets, new observers were recruited who had not taken part in either the control or main experiments. 
Observers for Experiment 1 and Experiment 2 received a small payment to compensate them for their time. All experiments were conducted with approval from the Ethics Committee of the University of Bradford. 
Stimuli
Experiment 1
Two subsets of 80 images from the McGill Calibrated Colour Image Database (http://tabby.vision.mcgill.ca/) were used, all outdoor scenes with no people or writing. Within each subset, half of these images were designated as “old” and shown in the encoding phase. During the test phase, the remaining 40 “new” images were shown together with the “old” images and two of four manipulated versions of the “old” images: tone mapped and isoluminant (pair 1), or grayscale and edge filtered (pair 2). See the Procedure section for more details. In total, 160 images were presented during the test phase, in a random order. Each participant performed the experiment twice, each time with a different image set and a different pair of image manipulations. Roughly half the observers saw manipulation pair 1 with image set 1 and manipulation pair 2 with image set 2, the other half vice versa. The four different types of image manipulation were split over two sessions to limit the number of times each old (or manipulated old) image was viewed to three, as during the control experiment (described below), and to limit the time taken to complete the experiment to around 20 min.
Descriptions of how the manipulated versions of the images were created are given below under the Image manipulations section. 
Experiment 2
In Experiment 2, image manipulations were intended to alter image “gist” or context, as well as saliency. The following two image manipulations were used:
  1. Inverted “old” images.
  2. Inverted isoluminant “old” images.
Image inversion has been previously shown to impair recognition of scenes (Kelly et al., 2003). The experimental procedure used was identical to Experiment 1, but participants only performed the experiment once, since only two image manipulations were used. Half the participants saw image set 1, the other half image set 2, as in Experiment 1. 
Apparatus
Images were displayed on a calibrated color CRT monitor (768 × 576 pixels, native image resolution). Observers sat 1 m from the screen, eye level with the center of the screen, in a dark room. Images subtended 16.6 (H) × 11.1 (V) degrees of visual angle. 
We used a Cambridge Research Systems (CRS) Video Eye Tracker (50 Hz, resolution 0.1°, accuracy < 0.5°), controlled by the Video Eye Tracker Matlab toolbox integrated with CRS ViSaGe (http://www.crsltd.com/catalog/), and performed a 25-point calibration for each observer.
Criteria for fixation were displacements of less than 0.15° and a minimum time of 100 ms. These criteria are comparable to those used in a similar experiment by Foulsham and Underwood (2008). 
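The grouping algorithm itself is not specified beyond these two criteria. As a purely illustrative sketch (in Python, whereas the experiments used the CRS Matlab toolbox), a dispersion-style detector consistent with the stated thresholds might look as follows; the function name and the centroid-based dispersion test are our assumptions:

```python
import numpy as np

def detect_fixations(gaze_deg, rate_hz=50, max_disp_deg=0.15, min_dur_ms=100):
    """Group gaze samples (N x 2 array, in degrees) into fixations.

    A fixation is taken to be a run of consecutive samples that stay within
    max_disp_deg of the run's centroid for at least min_dur_ms.
    """
    min_samples = int(round(min_dur_ms * rate_hz / 1000.0))  # 5 samples at 50 Hz
    fixations, start = [], 0
    for i in range(1, len(gaze_deg)):
        centroid = gaze_deg[start:i].mean(axis=0)
        if np.linalg.norm(gaze_deg[i] - centroid) > max_disp_deg:
            # Run broken: keep it as a fixation if it lasted long enough.
            if i - start >= min_samples:
                fixations.append(centroid)
            start = i
    if len(gaze_deg) - start >= min_samples:  # flush the final run
        fixations.append(gaze_deg[start:].mean(axis=0))
    return np.array(fixations)
```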
Procedure
Experiment 1
We used a “memory encoding” task (Foulsham & Underwood, 2008) that allows observers to scan the scene naturally and avoids biasing them toward any particular type of image features. This task first involves an “encoding” phase, followed by a “test” phase. 
Before starting the experiment, observers were shown a computer presentation explaining the experimental protocol and showing examples of images (none of which were used in the actual experiments) and the response procedure (Figure 1).
Figure 1
Examples of images used in Experiment 1. Clockwise from top left: Image used in both the encoding phase and as an “old” image in the test phase; a “new” image; a tone-mapped image; an edge-filtered image; a grayscale image; an isoluminant image.
After successful eye tracking calibration, the encoding phase of the experiment began: 40 “old” images were presented in random order, each for 3 s and participants were instructed to inspect the images in preparation for a memory test. A small centrally positioned fixation cross was presented on the screen between image presentations, with a gray background of luminance equal to the mean luminance of the image set. The next image presentation was delayed until the observer refixated on the central cross. This ensured that the first fixation for all images was constrained to the center of the image. 
Immediately after the encoding phase was completed, the experimenter indicated the beginning of the test phase. In this phase, 160 images (40 “old,” 40 “new,” 40 either tone-mapped “old” or grayscale “old,” and 40 either isoluminant “old” or edge-filtered “old”) were each presented for 3 s in random order, again with a constraining central fixation cross between image presentations, as in the encoding phase. Participants indicated via a response box whether they considered each image to be an “old” image seen during the encoding phase (or previously during the test phase) or a “new” one. Only responses made within the 3 s for which the image was displayed were accepted (no response was taken to indicate a “new” image by default, on the basis that the image had not been immediately recognized by the observer). Observers were instructed to base their decision on image content, not appearance. This was done to encourage them to indicate as “old” a manipulated image that they recognized even though it might look different than it did during the encoding phase, for example in grayscale rather than color. See Figure 2 for a timeline with sample images.
Figure 2
Timeline showing Experiment 1. Sample images are shown in both the encoding and test phases. In the test phase, the presented image could be either an old (original), new, or manipulated image. Manipulation 1 was either tone mapped or grayscale; manipulation 2 was either isoluminant or edge filtered.
Movies 1 and 2 also show examples of the experimental procedure. Movie 1 shows the test phase with grayscale and edge-filtered image manipulations, Movie 2 with tone-mapped and isoluminant images. Note that only three trials are demonstrated in each phase and the display times of the trials and intervals are not as in the actual experiment. The phase labels within the movies were not present during the experiment. 
 
Movie 1

Movie 2

Example procedures for Experiment 1. Movie 1 shows an example procedure with a test phase containing grayscale, edge-filtered, and new images; Movie 2 shows an example procedure with a test phase containing tone-mapped, isoluminant, and new images.
Experiment 2
The procedure for Experiment 2 remained the same as in Experiment 1, with the exception of the different image manipulations. It should be noted that while manipulated images were inverted, images in the encoding phase along with “old” and “new” images in the test phase remained correctly orientated. See Figure 3 for a timeline with sample images and Movie 3 for an example of the experimental procedure. Note that, as in Movies 1 and 2, only three trials are demonstrated in each phase, the display times of the trials and intervals are not as in the actual experiment and the phase labels within the movie were not present during the experiment. 
Figure 3
Timeline showing Experiment 2. Sample images are shown in both the encoding and test phases. In the test phase, the presented image could be either an old (original), new, or manipulated image. Manipulated images could be either inverted or inverted isoluminant.
 
Movie 3
 
Example procedure for Experiment 2, with a test phase containing inverted, inverted isoluminant, and new images.
Control experiment
During the main experiments, we wanted to be able to present both original and manipulated versions of the same images during the test phase.
Any effect on scanpaths due to seeing an image more than twice could still be present when manipulated images are used, if the images are not functionally altered to a large degree. Thus, we needed to make sure we would not be confounding our results with any changes in scanpaths due to repeated (more than two) viewings of the same image. To this end, we conducted a control study comparing scanpaths when viewing the same image up to four times. In this control, the encoding phase was the same as described above, but the test phase consisted of three viewings of each image used in the encoding phase. Image order in the test phase was randomized.
Image manipulations
The manipulation of images was performed in advance of the experiment using purpose-written routines in Matlab and the procedures described below. In all cases, we first converted the RGB image to XYZ color space (CIE, 1932) using the conversion matrix for our experimental monitor, established via calibration using a Cambridge Research Systems ColorCAL colorimeter and vsgDesktop display calibration software (http://www.crsltd.com/catalog/).
Examples of images used in the experiment are shown in Figure 1.
Tone mapping
The tone mapping of images to reduce the dynamic range is of great importance to users of high dynamic range images when attempting to display them on devices with limited dynamic range (Devlin, Chalmers, Wilkie, & Purgathofer, 2002). Tone mappers provide a means of reducing image luminance in a non-linear fashion, with the aim of retaining perceived content. The Drago tone mapping operator (Drago, Myszkowski, Annen, & Chiba, 2003) was chosen to tone map images for this experiment, due to its ease of implementation and widespread use (Reinhard, Ward, Pattanaik, & Debevec, 2006). Maximum luminance for tone-mapped images was set at 10 cd/m², reducing the dynamic range of the images by a factor of around 10.
The process involved the following steps (a code sketch follows the list):
  1. Convert XYZ to xyY color space (CIE, 1932).
  2. Find the maximum luminance of the RGB image and calculate the ratio of the initial maximum luminance to the required maximum luminance (set at 10 cd/m²).
  3. Tone map the Y (luminance) values of the image with the Drago operator, using the default parameters as given by Reinhard et al. (2006).
  4. Using the original x and y (chromaticity) values and the new tone-mapped Y (luminance) values, calculate new X and Z values for XYZ color space following standard colorimetric procedures as described in Wyszecki and Stiles (1982).
  5. Convert the new XYZ values to RGB using the experiment monitor conversion matrix and clamp any out-of-gamut colors.
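As an illustration only (the original processing used Matlab; this numpy sketch is ours), the pipeline above might be written as follows. The formula in drago_tonemap_Y is the logarithmic mapping of Drago et al. (2003) with the default bias of 0.85 quoted by Reinhard et al. (2006), scaled so that the brightest pixel maps exactly to the 10 cd/m² target; the final RGB conversion and gamut clamping are omitted.

```python
import numpy as np

def drago_tonemap_Y(Y, Ld_max=10.0, bias=0.85):
    """Compress luminance with the logarithmic mapping of Drago et al. (2003).

    Ld_max is the target maximum luminance (10 cd/m^2 in the experiment);
    bias = 0.85 is the default parameter given by Reinhard et al. (2006).
    The scaling is chosen so the brightest pixel maps exactly to Ld_max.
    """
    Lw_max = Y.max()
    p = np.log(bias) / np.log(0.5)
    return (Ld_max / np.log10(Lw_max + 1.0)) * \
        np.log(Y + 1.0) / np.log(2.0 + 8.0 * (Y / Lw_max) ** p)

def tonemap_xyz(XYZ):
    """Tone map an XYZ image, keeping each pixel's chromaticity (x, y) fixed."""
    X, Y, Z = XYZ[..., 0], XYZ[..., 1], XYZ[..., 2]
    s = X + Y + Z                       # zero-luminance pixels would need
    x, y = X / s, Y / s                 # special handling; omitted for clarity
    Yd = drago_tonemap_Y(Y)             # steps 2-3: compress luminance
    Xd = x * Yd / y                     # step 4: xyY -> XYZ with the new Y
    Zd = (1.0 - x - y) * Yd / y
    return np.stack([Xd, Yd, Zd], axis=-1)  # step 5 (RGB conversion omitted)
```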
Isoluminant
Luminance variation across images provides significant input to the visual system and thus to saliency models (Koch & Ullman, 1985). By removing this information and creating an isoluminant image, we anticipate a significant effect on both real scanpaths and those predicted by saliency models. Chromatic information was not affected by this manipulation. 
The image conversion was performed as follows (a code sketch follows the list):
  1. Calculate the mean Y (luminance) value of the image across all pixels.
  2. Divide each pixel's XYZ values by that pixel's Y value. This sets all Y (luminance) values to 1.
  3. Multiply the new XYZ values (result of step 2) by the mean original image Y value (result of step 1), such that all Y values are set to the mean of the original image.
  4. Convert the new XYZ values to RGB using the experiment monitor conversion matrix; clamp any out-of-gamut colors.
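In numpy terms (again an illustrative sketch, not the authors' Matlab code), steps 1-3 amount to a per-pixel rescaling of the XYZ triple; rescaling leaves chromaticity untouched because the (x, y) coordinates are ratios that are invariant to a uniform scaling of X, Y, and Z:

```python
import numpy as np

def make_isoluminant(XYZ):
    """Set every pixel's luminance Y to the image mean, preserving chromaticity."""
    Y = XYZ[..., 1]                    # zero-luminance pixels would need
    mean_Y = Y.mean()                  # special handling; omitted for clarity
    flat = XYZ / Y[..., np.newaxis]    # step 2: every pixel rescaled to Y = 1
    return flat * mean_Y               # step 3: uniform luminance Y = mean_Y
```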
Grayscale
In order to investigate the relative importance of chromatic information in an image, we implemented a grayscale manipulation, removing color but leaving luminance unchanged, using the following process (a code sketch follows the list):
  1. Convert XYZ to xyY color space (CIE, 1932).
  2. Set the x and y (chromaticity) values to the display color space white point (sRGB (D65), x = 0.3127, y = 0.3291; http://www.color.org/sRGB.xalter).
  3. Calculate new X and Z values using the new x and y values.
  4. Convert the new XYZ values to RGB using the experiment monitor conversion matrix; clamp any out-of-gamut colors.
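A corresponding illustrative sketch: with chromaticity pinned at the D65 white point, the new X and Z follow from the unchanged Y via the standard xyY-to-XYZ relations X = xY/y and Z = (1 − x − y)Y/y.

```python
import numpy as np

def make_grayscale(XYZ, x_w=0.3127, y_w=0.3291):
    """Replace each pixel's chromaticity with the D65 white point, keeping Y."""
    Y = XYZ[..., 1]
    X = x_w * Y / y_w                    # xyY -> XYZ with (x, y) at white point
    Z = (1.0 - x_w - y_w) * Y / y_w
    return np.stack([X, Y, Z], axis=-1)  # achromatic image, luminance unchanged
```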
Edge filtered
As well as the low spatial frequency chromatic and luminance information in an image, edges are also known to have significant input into the human visual system and saliency models (Koch & Ullman, 1985). We used an image manipulation that increased the contrast of edges within an image, allowing investigation of the effect of edges on scanpaths. In order to perform this manipulation, images were loaded into Adobe Photoshop CS2 and the Accented Edges Filter was applied, using the default parameters. 
Analysis
Scanpath data analysis
A String-Edit (or Levenshtein) distance method (see, for example, Brandt & Stark, 1997; Choi, Mosley, & Stark, 1995) was chosen to analyze scanpaths. This technique is designed to measure differences between sequences and is widely used for applications such as DNA analysis and spell checking. The analysis technique is described in detail in the work referenced above. However, the key features are as follows.
Each fixation on a scanpath is assigned a letter label corresponding to the image region it is located within. A 5 × 5 grid was used to define regions in our analysis. Choi et al. (1995) show that the number of regions is not critical, and the same number of regions has been used by previous studies (Foulsham & Underwood, 2008). The strings of letters thus generated, in which letter order follows the temporal order of the fixations, are subsequently compared. An editing score is calculated by counting the number of manipulations required to make the strings identical to each other. This editing score is normalized to the string length and subtracted from 1, giving a similarity score between 0 and 1. Identical scanpaths score 1 (no edits required); entirely different scanpaths score 0. This method gives higher scores when fixations in the scanpaths being compared are both in similar positions and in a similar temporal order. However, some spatial information is lost due to the splitting of the image area into regions.
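A minimal sketch of this scoring, assuming fixations are given as (x, y) pixel coordinates; the particular lettering scheme and normalization by the longer string are our assumptions, as the cited papers describe the method only in outline:

```python
import numpy as np

def scanpath_string(fixations, width, height, grid=5):
    """Label each fixation with the letter of its grid cell (5 x 5 -> 'A'..'Y')."""
    letters = []
    for x, y in fixations:
        col = min(int(x / width * grid), grid - 1)
        row = min(int(y / height * grid), grid - 1)
        letters.append(chr(ord('A') + row * grid + col))
    return ''.join(letters)

def levenshtein(a, b):
    """Classic dynamic-programming string-edit distance."""
    d = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev_diag, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev_diag, d[j] = d[j], min(d[j] + 1,                # deletion
                                        d[j - 1] + 1,            # insertion
                                        prev_diag + (ca != cb))  # substitution
    return int(d[-1])

def similarity(path_a, path_b):
    """1 - normalized edit distance: 1 = identical, 0 = entirely different."""
    return 1.0 - levenshtein(path_a, path_b) / max(len(path_a), len(path_b))
```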
Similarity score baseline
When considering how similar scanpaths are, it is important to know what score would be attributable to chance. For scanpaths of 6 fixations (typical for a 3-s viewing, as observed in our data), and using the 5 × 5 grid described above, the mean score for the comparison of scanpaths generated from purely random fixations is 0.06. However, as has been discussed in a number of papers (Foulsham & Underwood, 2008; Henderson et al., 2007; Tatler, 2007; Tatler et al., 2005), observer fixations when viewing natural scenes are not distributed evenly across the scene, but are biased, typically toward the central area (irrespective of the distribution of image features). Figure 4 shows the distribution of fixations across all experiments, observers, and images in the experiments described in this paper. A total of 69,921 fixations were recorded.
Figure 4
Distribution of fixations made by all observers across all images and experiments (69,921 fixations). The axes show image pixel position and the color map represents the number of fixations at a particular pixel position.
Due to the large number of fixations recorded, a “distribution biased random” model for scanpaths can be calculated simply by randomly selecting fixation positions from the recorded observer fixations to form a scanpath. This method inherently incorporates any distribution bias shown by observers. Similar biased random models have been used in previous studies (e.g., Foulsham & Underwood, 2008; Henderson et al., 2007); 11,831 distribution biased random scanpaths were generated, one for each real scanpath measured across all our experiments. Each biased random scanpath was of the same length as the corresponding real scanpath. Pairwise comparisons across all distribution biased random scanpaths were found to have a mean String-Edit similarity score of 0.16.
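Generating such a baseline is straightforward; a sketch, assuming the pooled recorded fixations are available as an (N, 2) array:

```python
import numpy as np

def biased_random_scanpath(pool, length, rng):
    """Sample a scanpath of the given length from the pooled recorded fixations.

    Because positions are drawn from the empirical fixation distribution, the
    sample inherits the observers' spatial biases (e.g., the central tendency
    seen in Figure 4) while carrying no image-specific information.
    """
    return pool[rng.integers(0, len(pool), size=length)]

# One biased random scanpath per real scanpath, matched in length:
# rng = np.random.default_rng(0)
# fake_paths = [biased_random_scanpath(pool, len(p), rng) for p in real_paths]
```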
As a further control for observer biases, including specific viewing strategies as well as spatial bias, we compare scanpaths from the encoding phase with those for the same images (“old”) and entirely different ones (“new”) in the test phase, as described above in the Methods section. Any spatial or strategic bias will be present in both the encoding vs. “old” and encoding vs. “new” data, as well as in the data comparing scanpaths from the manipulated images. Thus these two groups provide an additional, and more useful, baseline to that provided by the distribution biased random model. When considering similarity scores between encoding phase and test phase scanpaths, scores for manipulated images can be considered with respect to scores for “old” (identical) and “new” (entirely different) images to better assess any effect of the manipulation.
Previous studies (Brandt & Stark, 1997; Choi et al., 1995; Foulsham & Underwood, 2008; Mannan et al., 1995) have established that similarity scores for scanpaths recorded from two viewings of the same image are well below the value of 1 that would indicate identical scanpaths, and are typically no more than 0.4. Similarity scores for scanpaths arising from two different images are lower still, but generally not as low as those for random scanpaths, typically around 0.2 to 0.3.
When comparing scanpath similarity scores and response data for different image groups, unless otherwise stated, one-way repeated-measures ANOVAs were performed and reported, together with appropriate post-hoc tests. For the majority of post-hoc tests, Tukey's HSD is used. In the cases of different sample sizes and unequal variances, Gabriel's pairwise test procedure and the Games–Howell procedure are used, respectively (Field, 2005). In the case of inhomogeneous variances (indicated by Levene's test), Welch's F-ratio is quoted.
Results
Control experiment
Similarity scores were calculated for scanpaths during the encoding phase compared to each subsequent presentation of the same image during the test phase. Data from scanpaths when the observer's response was incorrect (identifying as new an image that had been seen previously, or vice versa) were not used in the analysis, since incorrect responses could be due to the observer being distracted from the viewing task during the trial in question. The similarity scores are shown in Figure 5 and the frequency of correct responses for each image presentation is shown in Figure 6.
Figure 5
Control experiment results, Encoding Phase vs. Test Phase similarity scores. Labels Old 1, Old 2, and Old 3 refer to the first, second, and third presentations of “old” images during the test phase; New to the novel images only shown during the test phase. Mean String-Edit similarities are shown for all 9 observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). There is no significant difference between Old 1, Old 2, and Old 3. New is significantly different from both Old 2 and Old 3 (p < 0.05).
Figure 6
Response data for all 9 observers during the Control Experiment test phase. Mean percentage of correct responses is shown. Error bars show standard errors (SEMs). There is no significant difference between the groups.
Across all observers, repeated-measures ANOVA shows an effect of image group on similarity score, F(3, 32) = 4.51, p = 0.01. We found no significant difference in mean scores between the three “old” presentations. The mean score for the New group was significantly different from both the Old 2 and Old 3 groups (Tukey's HSD, p < 0.05). A significant difference between the New and Old 1 groups was not found. However, the mean score for the Old 1 group (0.29) was closer to those of the Old 2 and Old 3 groups (0.30 and 0.31, respectively) than to the New group (0.24), and we might reasonably expect a significant difference if the sample size were larger, as was the case in Experiments 1 and 2 (see below) and in other studies (e.g., Foulsham & Underwood, 2008). No significant differences were found in the observer response data (Figure 6) between the four groups, with observers correctly identifying images as “new” or “old” in around 90% of cases.
These results show that there are similarities between scanpaths from the first presentation of an image and subsequent presentations (scores for Encoding vs. Old are higher than for Encoding vs. New). Importantly, this similarity does not appear to change significantly on subsequent image presentations: there are no significant differences between Encoding vs. Old 1, Encoding vs. Old 2, and Encoding vs. Old 3. Thus we can have some confidence that any differences in similarity scores observed in Experiments 1 and 2 between “old” and manipulated image groups are largely due to the low-level image manipulation rather than a change in viewing behavior with increasing numbers of image presentations.
Experiment 1
Similarity scores were calculated for scanpaths during the encoding phase compared to “old,” “new,” and each of the manipulated image groups separately. As in the control experiment, the following analysis uses only scanpaths from “old” and “new” image presentations on which images were correctly identified; 85% of “old” image presentations and 87% of “new” image presentations were correctly identified during Experiment 1.
All scanpaths for manipulated images were used since, owing to the image manipulation, the correct observer response was not clearly defined. Perceptually, the manipulated images could be seen as either “old” or “new” (see Experiment 1 response data section).
Experiment 1 similarity scores are shown in Figure 7. A significant effect of image group was found, F(5, 178) = 4.94, p < 0.01. Mean scores for Old and New groups were significantly different (Gabriel's PTP, p < 0.01). The mean score for the New group was also significantly different from Tone-mapped, Grayscale, and Edge-Filtered groups (Gabriel's PTP, p < 0.05) but not from the Isoluminant group. No significant differences were found between any of the image manipulation groups. Gabriel's pairwise test procedure was used for Experiment 1 post-hoc tests due to the different sample sizes for new and old groups compared to the other groups (Field, 2005), since new and old data were collected for both image sets (see Experiment 1 procedure). 
Figure 7
Experiment 1 results, Encoding Phase vs. Test Phase similarity scores. Mean String-Edit similarities are shown for all 23 observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). Old and New are significantly different (p < 0.01). New is also significantly different from Tone mapped, Grayscale, and Edge Filtered (p < 0.05) but not Isoluminant. No other significant differences were found.
Experiment 1 response data
Averaged over all observers, images, and image groups during the test phase, the mean percentage of images successfully identified was 82%. Identifying an image as being previously seen was taken as a “recognized” response for “old” and manipulated images. In the case of “new” images, “recognized” indicates the observer correctly identified the image as novel. Data for each image group separately are shown in Figure 8. A significant effect of image group was found, F(5, 178) = 3.59, p < 0.01. Post-hoc tests revealed a significant difference for the isoluminant image group only, for which fewer images were recognized (mean recognized = 71%). The number of isoluminant images recognized was significantly different from the Old, Edge-Filtered, and New groups (Gabriel's PTP, p < 0.05). This result suggests that the isoluminant images were harder to recognize than either non-manipulated images or images that were grayscale, tone mapped, or edge filtered.
Figure 8
Experiment 1 response data for all 23 observers during the test phase. Mean percentage of test phase images recognized is shown. In the case of New images, “recognized” indicates that the observer correctly identified the image as novel. Error bars show standard errors (SEMs). Isoluminant is significantly different from Old, Edge Filtered, and New (p < 0.05). No other significant differences were found.
In order to see if there were any differences between scanpaths from manipulated images that observers recognized as having been seen previously and those that were thought to be new images, we repeated the scanpath analysis on the data from Experiment 1, separating out data from “recognized” images, where the observer indicated that the manipulated image had been seen previously, and “not recognized” images. These data are shown in Figure 9.
Figure 9
Experiment 1 results, Encoding Phase vs. Test Phase similarity scores, with “recognized” and “not recognized” responses shown separately. Mean String-Edit similarities are shown for all 23 observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). New is significantly different from Old, Grayscale (Recognized), and Edges (Not Recognized; p < 0.02). No other significant differences were found.
No significant differences were found between the Not Recognized and Recognized groups for any of the image types. The results for the Recognized groups were similar to those above (Experiment 1, pre-splitting of “recognized” and “not recognized” images): Old and New group means were found to be significantly different (Games–Howell, p < 0.01). The mean score for New images was also found to be significantly different from the means of Grayscale (Recognized) and Edges (Recognized; Games–Howell, p < 0.02). The Games–Howell procedure is used here as the variances of Recognized and Not Recognized groups were found, using Levene's test, to be unequal. 
Experiment 2
Fixation coordinates making up scanpaths from inverted and inverted isoluminant images were first reinverted on the vertical axis before comparison with encoding scanpaths. In all other respects, analysis was identical to Experiment 1. 
Similarity scores for Experiment 2 are shown in Figure 10. A significant effect of image group was found, F(3, 72) = 3.43, p < 0.05. The mean score for the Old group was found to be significantly different from both the New and Inverted Isoluminant groups (Tukey's HSD, p < 0.05). However, the mean score for the Inverted image group was not found to be significantly different from any of the other groups.
Figure 10
Experiment 2 results, Encoding Phase vs. Test Phase similarity scores. Mean String-Edit similarities are shown for all 18 observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). Old is significantly different from both New and Inverted Isoluminant (p < 0.05). No other significant differences were found.
Response data for Experiment 2 were also recorded and are shown in Figure 11. As with Experiment 1, response data were averaged over all observers and images during the test phase. The frequency of correct responses for “old” and “new” images was similar to that recorded in the main experiment (83% and 91%, respectively). Inverted images were recognized slightly, although not significantly, less often than “old” images (78% recognition). Inverted isoluminant images were recognized least often (63% recognition), and this response rate was significantly different from both “old” and “new” images (Tukey's HSD, p < 0.01).
Figure 11
Experiment 2 response data for all 18 observers during the test phase. Mean percentages of test phase images recognized are shown. In the case of New images, “recognized” indicates that the observer correctly identified the image as novel. Error bars show standard errors (SEMs). Inverted Isoluminant is significantly different from Old and New (p < 0.01). No other significant differences were found.
Saliency model predictions
Itti and Koch's (2000) saliency model can make scanpath predictions for a given image, based on a generated saliency map and an inhibition-of-return mechanism. A scanpath is generated by appending fixation points in order of saliency, highest first. The model is available as a Matlab toolbox, implemented by Walther and Koch (2006; http://www.saliencytoolbox.net/). Previous studies that have compared saliency models with actual measured eye movements have found relatively low correlations. Foulsham and Underwood (2008) found similarities between predicted and real scanpaths significantly lower than those found for human scanpaths arising from two different viewings of the same image. Similarly, when comparing saliency and measured human attention maps, Ouerhani, von Wartburg, Hügli, and Müri (2004) found correlations of around 0.3. It should be noted that Ouerhani et al. (2004) did not use a String-Edit metric, but identical fixations would yield a correlation score of 1 using their metric, and fixations with no correlation would score 0.
In order to make a comparison between the predictions of Itti and Koch's saliency model and our data, we generated saliency maps and scanpaths using the Saliency Toolbox for all our images (including manipulated versions). We then calculated scanpath similarity scores comparing the predicted scanpaths (first 6 fixations) for the encoding phase images to all the other groups of images. These data, as mean scores across all images, are shown in Figure 12. Note that images from the Old and Inverted groups are identical to the encoding phase images (the latter after reinversion) and as such achieve a maximum score of 1, as the model-predicted scanpaths are identical.
Figure 12
Similarity scores for scanpaths (6 fixations) generated by Itti and Koch's saliency model (Encoding Phase images vs. Test Phase images). Mean String-Edit similarities are shown for all images. Error bars show standard errors (SEMs). Note that the variance is due to image variability only, as no real observer scanpaths are considered. The lower dashed line shows the mean score for random scanpaths; the upper dashed line shows the mean score for “distribution biased” random scanpaths. All group means are significantly different.
As might be expected, the lowest similarity scores were for Encoding vs. New, being only slightly above the mean score for randomly generated scanpaths (with no spatial distribution bias), and slightly below the mean score for “distribution biased” random scanpaths: Encoding vs. New = 0.11, mean biased random scanpath score = 0.16, mean random scanpath score = 0.06. The fact that this score is lower than for spatially biased random scanpaths might be expected if the model does not include a central bias for fixation distribution, since in the biased case subsequent fixations are more likely to lie in central grid areas (see Analysis section). 
Tone-Mapped images had the highest similarity scores, which should be expected since the tone mapping operator aims only to change the maximum image luminance, while keeping other attributes such as contrast and chromaticity unchanged. 
Isoluminant, Grayscale, and Edge-Filtered images all scored between New and Tone-Mapped images, with Encoding vs. Isoluminant producing the most dissimilar scanpaths of the manipulated images. 
Comparisons were also made between the predicted and observed scanpaths for each image group. Mean scores across all images and observers are shown in Figure 13. Overall, scores are very low, below scores for distribution biased randomly generated pairs of scanpaths (shown by the dashed line in Figure 13). Again, this seems likely due, at least in part, to a lack of central fixation bias in Itti and Koch's model. The Inverted image group differed significantly from the Old and Grayscale groups (Gabriel's PTP, p < 0.05), with a slightly higher mean similarity score of 0.072 compared to 0.055 and 0.053 for Old and Grayscale, respectively. No other significant differences were found between groups. 
Figure 13
Similarity scores for scanpaths (6 fixations) generated by Itti and Koch's saliency model vs. the first 6 fixations from measured observer scanpaths. Mean String-Edit similarities are shown for all images and all observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). Note that the vertical axis extends to 0.2 rather than 1 as in previous figures showing similarity scores. Inverted is significantly different from Old and Grayscale (p < 0.05). No other significant differences were found.
Discussion
Effects of image manipulations and inversion
Results from our control experiment, Experiment 1, and Experiment 2 all show that scanpaths followed by individual observers, when free viewing natural images, are more similar across two viewings of the same image than across viewings of two different images. This result is in agreement with several previous studies, such as Foulsham and Underwood (2008) and Mannan et al. (1995). Additionally, via the control experiment, we found that repeated viewings of “old” images do not seem to result in significant changes in the similarity of observers' scanpaths to those from the first (encoding) viewing. This second result allows us greater flexibility in further experiments, such as Experiments 1 and 2, since the results of multiple viewings of manipulated images should not be confounded by any effect of more than two presentations. It is, however, interesting to note that while not as high as those for “old” images, similarity scores for “new” vs. encoding images are higher than the mean score of the distribution biased random model. While greater similarity (compared to random scanpaths) between scanpaths from different images may be attributed to a spatial bias, such as the central bias discussed in the Analysis section, this should be accounted for by our distribution biased random model, since it inherently contains any spatial bias in our data set. The fact that an increased similarity remains suggests that there may be some form of bias or image-independent pattern to fixation selection that is not purely distribution based. Possibilities such as a preference for making subsequent fixations in particular directions or patterns could conceivably account for this result, although further investigation would be required to ascertain the nature of any such bias if it exists.
When comparing encoding phase and test phase scanpaths from Experiment 1, no significant difference was found between the similarity scores for scanpaths elicited by manipulated images and those from the original “old” images (Figure 7). This implies that, for the most part, the manipulated images are not functionally different from the original images for the generation of scanpaths, at least in the test phase of our memory encoding task. However, the results of Experiment 2 show a significant difference between the Old and Inverted Isoluminant groups, while the Inverted group was not significantly different from Old. Additionally, the Isoluminant group in Experiment 1 was the only group that was not found to be significantly different from the New image group (although it was also not significantly different from Old, with a mean score falling between New and Old). These results indicate that the removal of luminance information appears to have the largest effect of any of our image manipulations (including inversion) on scanpaths. Interestingly, the changes are largest when inversion and isoluminance are combined. The effect of luminance in Experiment 1 appears to be weak, and any result based only on the lack of significant difference between groups must be treated with caution. It can be argued that all that can be said from this result is that none of the manipulations seem to affect fixation selection (despite influencing image recognition). However, an effect of luminance does seem to be supported by the results of Experiment 2, since inversion plus isoluminance created a greater difference in scanpaths than inversion alone.
Einhäuser and König (2003) found that although luminance contrasts were higher at the fixation points selected by observers freely viewing natural scenes, moderate modifications to luminance did not affect fixation selection, and they concluded that high luminance contrast does not causally attract overt attention and does not measurably contribute to a saliency map for human attention. If Einhäuser and König's result is taken to be generally true, then the effect of isoluminance in the present study may be due to high-level cognitive control rather than saliency. However, it is difficult to attribute cause to effect based purely on the results of the present study, since the methods employed cannot easily disentangle saliency and cognitive control mechanisms.
Observer recognition and response data
Observers responded to each image presentation during the task phase, indicating whether or not they recognized the image from the encoding phase. Experiment 1 response data show that only isoluminant images were significantly harder to recognize and the frequency of successful identification was lower than for other groups. This effect was enhanced by the addition of image inversion in Experiment 2: recognition frequency was lowered further and scanpath similarity scores were not distinguishable from “new” images. 
Manipulation of image luminance could affect fixation selection via either low- or high-level mechanisms (or both), so we cannot be sure if the effect of isoluminance on scanpath similarities is due to changes to saliency or top-down selection. However, scene inversion does not change saliency but appears to enhance both the disruption to scene recognition and scanpath repeatability, when combined with the removal of luminance information. This result supports the idea that, even for the short viewing times used in this study, cognitive selection may be an overriding factor in fixation selection, as suggested by Nyström and Holmqvist (2008). 
It is not clear why scene inversion has little effect on its own, without the addition of the isoluminant manipulation. It may be that our experiments are not sensitive enough to detect the small differences in scanpath similarity elicited by our image manipulations unless the manipulations are combined. One conclusion that can be drawn from the small size of these differences is that viewing behavior is fairly robust to such manipulations. 
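To make the two manipulations discussed above concrete, the following sketch constructs an inverted and an isoluminant version of an image. The isoluminant step here simply flattens the lightness channel in CIELAB to its mean; this is one plausible construction, not the calibrated, display-specific procedure used in the experiments, and the file name is hypothetical.

```python
# Illustrative versions of the two manipulations (assumed construction;
# the experiments used a calibrated, display-specific procedure).
import numpy as np
from skimage import color, io

img = io.imread('scene.png')[..., :3] / 255.0  # 'scene.png' is a hypothetical file

# Inversion: flip top-to-bottom; low-level feature content is preserved.
inverted = img[::-1, :, :]

# Isoluminant (illustrative): flatten the CIELAB lightness channel to its
# mean, leaving only chromatic variation. Out-of-gamut values are clipped
# by lab2rgb, so the result is only approximately isoluminant on screen.
lab = color.rgb2lab(img)
lab[..., 0] = lab[..., 0].mean()
isoluminant = color.lab2rgb(lab)
```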
Comparison to saliency model predictions
Itti and Koch's (2000) saliency model predicts that modifications to luminance will have the greatest effect on observer scanpaths (see Figure 12). This prediction has some support from our results, as noted above. However, the scanpaths predicted by the model were not found to be very similar to observed scanpaths for any of the image groups. Low correlation between saliency model predictions and observed fixation selection has also been found in a number of other studies, such as Foulsham and Underwood (2008), Henderson et al. (2007), and Underwood, Foulsham, and Humphrey (2009). 
It is interesting to note that although image inversion had no noticeable effect on scanpaths when comparing “encoding” vs. “old” with “encoding” vs. “inverted,” inversion did have a small effect when comparing observers' recorded scanpaths to those predicted by Itti and Koch's saliency model. The Inverted group was the only group significantly different from Old, with a higher mean similarity score. Image inversion appears to elicit scanpaths that are more similar to those expected from bottom-up factors alone, indicating a possible disruption of the top-down deployment of attention. The effect is quite small, and the scanpath similarity scores in this case are still very low, in fact lower than those of the “biased distribution random” model. This should not be surprising, however, given the lack of any central bias in the saliency model implementation used. Of course, other features of the model, perhaps even the central premise of selection on the basis of saliency, may also contribute to these low scores if they too fail to represent the processes of human fixation selection. However, we are principally concerned with the differences between our manipulation groups when scanpaths are compared to those predicted purely from image features, with the “old” and “new” groups providing a baseline. The apparent limitations of the saliency model, and the resulting low similarity scores, therefore do not preclude drawing conclusions from our data, although they may limit or mask real effects. 
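The “biased distribution random” baseline referred to above and in the figures can be approximated by sampling fixation positions from the pooled empirical fixation distribution (cf. Figure 4) rather than uniformly. A minimal sketch, assuming the pooled fixation map is available as a 2-D histogram:

```python
# Sketch of the "biased distribution random" baseline (assumed
# construction): sample fixations from the pooled fixation histogram.
import numpy as np

def biased_random_scanpath(fix_map, n_fix=6, rng=None):
    """Draw n_fix positions from an empirical fixation density map."""
    rng = rng or np.random.default_rng()
    p = fix_map.ravel() / fix_map.sum()           # normalize to probabilities
    idx = rng.choice(fix_map.size, size=n_fix, p=p)
    rows, cols = np.unravel_index(idx, fix_map.shape)
    return list(zip(cols.tolist(), rows.tolist()))  # (x, y) positions
```

Because such a baseline inherits the central bias of real fixations while a saliency model without a central-bias term does not, it can outperform the model on string-edit similarity even though it uses no image information at all.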
It should also be considered that the fixation positions indicated by low- and high-level factors often correlate strongly (regions of high saliency are often also regions of informative interest to an observer; see, for example, Henderson et al., 2007; Tatler, 2007), so it could be argued that some model of top-down fixation selection might produce the same elevated similarity to scanpaths from inverted images as seen with Itti and Koch's saliency model. 
What is clear is that the reliance of the computer vision community on saliency models, as discussed in the Introduction section, is a limiting factor in applications that require prediction of fixation selection. While Itti and Koch's saliency model has been shown to predict fixation locations better than random models (Foulsham & Underwood, 2008), it cannot account for all the factors that drive eye movements and can therefore have only limited success. New models that take into account factors other than bottom-up saliency are required to improve applications such as those in computer graphics. 
Summary
Determining the causes of fixation selection is difficult, particularly in the absence of a specific task. This study uses a novel paradigm, measuring the effect of controlled changes to image properties and orientation on real observer scanpaths, to further study the mechanisms behind the deployment of overt visual attention. 
We find that, of manipulations to chromaticity, luminance, spatial contrast, and orientation, only luminance has a measurable effect on scanpaths during short free viewing of natural scenes. Removal of luminance information results in scanpaths that are less similar to those of a previous viewing of the same image than when the image is unaltered. Isoluminant images were also significantly more difficult for observers to recognize, suggesting that high-level processing was affected. 
While image inversion did not produce a measurable effect on scanpaths when compared to a previous viewing, it did make scanpaths more similar to those predicted by Itti and Koch's (2000) saliency model. This is not necessarily support for image saliency as a significant factor in fixation selection, however, since high-level cognitive factors could also account for this effect, given the correlation between image features and regions of interest. Overall, the saliency model was a poor predictor of scanpaths. 
When image inversion is added to the isoluminant manipulation, disruption to scene gist increases relative to normally orientated isoluminant images (as evidenced by lowered image recognition by observers), while saliency is unchanged; scanpath repeatability is also reduced. No such effect is seen from image inversion alone. 
The results of this study suggest that while saliency most likely has some input into fixation selection, other factors seem to be more important, at least when the order of fixations is taken into account. The repeatability of scanpaths is certainly not affected by manipulations to low-level image features as strongly as saliency models predict, although luminance does appear to have a greater influence than other image features. Disrupting low-level features and scene gist together causes the greatest change to scanpaths, suggesting that both mechanisms play some part and that future models of fixation selection should combine the two in order to improve performance. 
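As a schematic of the kind of combined model this conclusion points toward, a fixation-priority map could be formed as a weighted mixture of a bottom-up saliency map and a top-down prior (for example, a scene-gist or task-relevance map). This is purely illustrative: the mixture weight w and both input maps are assumptions, not a proposal validated here.

```python
# Schematic combined model (illustrative; the weight w and both maps
# are assumptions, not a validated proposal).
import numpy as np

def priority_map(saliency, top_down, w=0.5):
    """Weighted mixture of normalized bottom-up and top-down maps."""
    s = saliency / saliency.max()
    t = top_down / top_down.max()
    return w * s + (1.0 - w) * t

def next_fixation(priority):
    """Select the next fixation at the current priority maximum."""
    y, x = np.unravel_index(np.argmax(priority), priority.shape)
    return x, y
```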
Acknowledgments
We would like to thank Tom Foulsham for his help in understanding his previous work and Laurence Itti, Christof Koch, and Dirk Walther for making their saliency model available. We also thank Ben Tatler and another anonymous reviewer for their helpful comments on earlier versions of this manuscript. This research was supported by the Engineering and Physical Sciences Research Council (EPSRC, UK) via Grant Number EP/D032008/1 to MB. 
Commercial relationships: none. 
Corresponding author: Glen Harding. 
Email: g.harding1@bradford.ac.uk. 
Address: University of Bradford, Bradford, West Yorkshire, BD7 1DP, UK. 
References
Althoff R. R. Cohen N. J. (1999). Eye-movement-based memory effect: A reprocessing effect in face perception. Journal of Experimental Psychology: Learning, Memory and Cognition, 25, 997–1010. [PubMed] [CrossRef]
Brandt S. A. Stark L. W. (1997). Spontaneous eye movements during visual imagery reflect the content of the visual scene. Journal of Cognitive Neuroscience, 9, 27–38. [CrossRef] [PubMed]
Buswell G. (1935). How people look at pictures. Chicago: University of Chicago Press.
Butko N. J. Zhang L. Cottrell G. W. Movellan J. R. (2008). Visual saliency model for robot cameras. In Proceedings of the IEEE International Conference on Robotics and Automation.
Castelhano M. S. Mack M. Henderson J. M. (2009). Viewing task influences eye movement control during active scene perception. Journal of Vision, 9, (3):6, 1–15, http://journalofvision.org/9/3/6/, doi:10.1167/9.3.6. [PubMed] [Article] [CrossRef] [PubMed]
Chen X. Zelinsky G. J. (2006). Real-world visual search is dominated by top-down guidance. Vision Research, 46, 4118–4133. [PubMed]
Choi Y. S. Mosley A. D. Stark L. W. (1995). String editing analysis of human visual search. Optometry and Vision Science, 72, 439–451. [PubMed] [CrossRef] [PubMed]
Commission Internationale de l'Eclairage (1932). Commission Internationale de l'Eclairage proceedings, 1931. Cambridge, UK: Cambridge University Press.
Cleveland P. (2009). Style based automated graphic layouts. Design Studies, 31, 3–25. [CrossRef]
Debattista K. Sundstedt V. Pereira F. Chalmers A. (2005). Selective parallel rendering for high-fidelity graphics. In Proceedings of Theory and Practice of Computer Graphics 2005 (pp. 59–66). Aire-la-Ville, Switzerland: The Eurographics Association.
Devlin K. Chalmers A. Wilkie A. Purgathofer W. (2002). STAR: Tone reproduction and physically based spectral rendering. In State of the Art Reports, Eurographics 2002. Aire-la-Ville, Switzerland: The Eurographics Association.
Drago F. Myszkowski K. Annen T. Chiba N. (2003). Adaptive logarithmic mapping for displaying high contrast scenes. Computer Graphics Forum, 22, 419–426.
Einhäuser W. König P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097. [PubMed] [CrossRef] [PubMed]
Field A. (2005). Discovering statistics using SPSS. London, UK: SAGE Publications.
Foulsham T. Underwood G. (2007). How does the purpose of inspection influence the potency of visual salience in scene perception? Perception, 36, 1123–1138. [PubMed] [CrossRef] [PubMed]
Foulsham T. Underwood G. (2008). What can saliency models predict about eye movements? Spatial and sequential aspects of fixations during encoding and recognition. Journal of Vision, 8, (2):6, 1–17, http://journalofvision.org/8/2/6/, doi:10.1167/8.2.6. [PubMed] [Article] [CrossRef]
Henderson J. M. Brockmole J. R. Castelhano M. S. Mack M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In van Gompel R. P. G. Fischer M. H. Murray W. S. Hill R. L. (Eds.), Eye movements: A window on the mind and brain (pp. 537–562). Amsterdam, Netherlands: Elsevier.
Itti L. Koch C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489–1506. [PubMed] [CrossRef] [PubMed]
Itti L. Koch C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2, 194–203. [PubMed] [CrossRef]
Kelley T. A. Chun M. M. Chua K. (2003). Effects of scene inversion on change detection of targets matched for visual salience. Journal of Vision, 3, (1):1, 1–5, http://journalofvision.org/3/1/1/, doi:10.1167/3.1.1. [PubMed] [Article] [CrossRef]
Kienzle W. Wichmann F. A. Schölkopf B. Franz M. O. (2007). A nonparametric approach to bottom-up visual saliency. In Advances in Neural Information Processing Systems.
Koch C. Ullman S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219–227. [PubMed] [PubMed]
Laeng B. Teodorescu D. S. (2002). Eye scanpaths during visual imagery reenact those of perception of the same visual scene. Cognitive Science, 26, 207–231. [CrossRef]
Mannan S. Ruddock K. H. Wooding D. S. (1995). Automatic control of saccadic eye movements made in visual inspection of briefly presented 2-D images. Spatial Vision, 9, 363–386. [PubMed] [CrossRef] [PubMed]
Noton D. Stark L. (1971). Scanpaths in saccadic eye movements while viewing and recognizing patterns. Vision Research, 11, 929–942. [PubMed] [CrossRef] [PubMed]
Nyström M. Holmqvist K. (2008). Semantic override of low-level features in image viewing—Both initially and overall. Journal of Eye Movement Research, 2, 1–11.
Ouerhani N. von Wartburg R. Hügli H. Müri R. (2004). Empirical validation of the saliency-based model of visual attention. Electronic Letters on Computer Vision and Image Analysis, 3, 13–24.
Parkhurst D. Law K. Niebur E. (2002). Modeling the role of salience in the allocation of overt visual attention. Vision Research, 42, 107–123. [PubMed] [CrossRef] [PubMed]
Reinagel P. Zador A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350. [PubMed] [CrossRef] [PubMed]
Reinhard E. Ward G. J. Pattanaik S. N. Debevec P. (2006). High dynamic range imaging. San Francisco, USA: Morgan Kaufmann.
Shore D. I. Klein R. M. (2000). The effects of scene inversion on change blindness. Journal of General Psychology, 127, 27–43. [PubMed] [CrossRef] [PubMed]
Stark L. Ellis S. R. (1981). Scanpaths revisited: Cognitive models direct active looking. In Fisher D. F. Monty R. A. Senders J. W. (Eds.), Eye movements: Cognition and visual perception (pp. 193–226). Hillsdale, NJ: Lawrence Erlbaum.
Stirk J. A. Underwood G. (2007). Low-level visual saliency does not predict change detection in natural scenes. Journal of Vision, 7, (10):3, 1–10, http://journalofvision.org/7/10/3/, doi:10.1167/7.10.3. [PubMed] [Article] [CrossRef] [PubMed]
Tatler B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, 7, (14):4, 1–17, http://journalofvision.org/7/14/4/, doi:10.1167/7.14.4. [PubMed] [Article] [CrossRef] [PubMed]
Tatler B. W. Baddeley R. J. Gilchrist I. D. (2005). Visual correlates of fixation selection: Effects of scale and time. Vision Research, 45, 643–659. [PubMed] [CrossRef] [PubMed]
Torralba A. Oliva A. Castelhano M. S. Henderson J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113, 766–786. [PubMed] [CrossRef] [PubMed]
Underwood G. Foulsham T. (2006). Visual saliency and semantic incongruency influence eye movements when inspecting pictures. Quarterly Journal of Experimental Psychology, 59, 1931–1949. [PubMed] [CrossRef]
Underwood G. Foulsham T. Humphrey K. (2009). Saliency and scan patterns in the inspection of real-world scenes: Eye movements during encoding and recognition. Visual Cognition, 17, 812–834. [CrossRef]
Underwood G. Foulsham T. van Loon E. Humphreys L. Bloyce J. (2006). Eye movements during scene inspection: A test of the saliency map hypothesis. European Journal of Cognitive Psychology, 18, 321–343. [CrossRef]
Underwood G. Templeman E. Lamming L. Foulsham T. (2008). Is attention necessary for object identification? Evidence from eye movements during the inspection of real-world scenes. Consciousness and Cognition, 17, 159–170. [PubMed] [CrossRef]
Walther D. Koch C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407. [PubMed] [CrossRef] [PubMed]
Wyszecki G. Stiles W. (1982). Color science: Concepts and methods, quantitative data and formulae. New York: John Wiley and Sons.
Yarbus A. L. (1967). Eye movements and vision. New York: Plenum.
Figure 1. Examples of images used in Experiment 1. Clockwise from top left: Image used in both the encoding phase and as an “old” image in the test phase; a “new” image; a tone-mapped image; an edge-filtered image; a grayscale image; an isoluminant image.
Figure 2. Timeline showing Experiment 1. Sample images are shown in both the encoding and test phases. In the test phase, the presented image could be either an old (original), new, or manipulated image. Manipulation 1 was either tone mapped or grayscale; manipulation 2 was either isoluminant or edge filtered.
Figure 3. Timeline showing Experiment 2. Sample images are shown in both the encoding and test phases. In the test phase, the presented image could be either an old (original), new, or manipulated image. Manipulated images could be either inverted or inverted isoluminant.
Figure 4. Distribution of fixations made by all observers across all images and experiments (69,921 fixations). The axes show image pixel position and the color map represents the number of fixations at a particular pixel position.
Figure 5. Control experiment results, Encoding Phase vs. Test Phase similarity scores. Labels Old 1, Old 2, and Old 3 refer to the first, second, and third presentations of “old” images during the test phase; New to the novel images only shown during the test phase. Mean String-Edit similarities are shown for all 9 observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). There is no significant difference between Old 1, Old 2, and Old 3. New is significantly different from both Old 2 and Old 3 (p < 0.05).
Figure 6. Response data for all 9 observers during the Control Experiment test phase. Mean percentage of correct responses is shown. Error bars show standard errors (SEMs). There is no significant difference between the groups.
Figure 7. Experiment 1 results, Encoding Phase vs. Test Phase similarity scores. Mean String-Edit similarities are shown for all 23 observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). Old and New are significantly different (p < 0.01). New is also significantly different from Tone Mapped, Grayscale, and Edge Filtered (p < 0.05) but not Isoluminant. No other significant differences were found.
Figure 8. Experiment 1 response data for all 23 observers during the test phase. Mean percentage of test phase images recognized is shown. In the case of New images, “recognized” indicates that the observer correctly identified the image as novel. Error bars show standard errors (SEMs). Isoluminant is significantly different from Old, Edge Filtered, and New (p < 0.05). No other significant differences were found.
Figure 9. Experiment 1 results, Encoding Phase vs. Test Phase similarity scores, with “recognized” and “not recognized” responses shown separately. Mean String-Edit similarities are shown for all 23 observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). New is significantly different from Old, Grayscale (Recognized), and Edges (Not Recognized; p < 0.02). No other significant differences were found.
Figure 10. Experiment 2 results, Encoding Phase vs. Test Phase similarity scores. Mean String-Edit similarities are shown for all 18 observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). Old is significantly different from both New and Inverted Isoluminant (p < 0.05). No other significant differences were found.
Figure 11. Experiment 2 response data for all 18 observers during the test phase. Mean percentages of test phase images recognized are shown. In the case of New images, “recognized” indicates that the observer correctly identified the image as novel. Error bars show standard errors (SEMs). Inverted Isoluminant is significantly different from Old and New (p < 0.01). No other significant differences were found.
Figure 12. Similarity scores for scanpaths (6 fixations) generated by Itti and Koch's saliency model (Encoding Phase images vs. Test Phase images). Mean String-Edit similarities are shown for all images. Error bars show standard errors (SEMs). Note that the variance is due to image variability only, as no real observer scanpaths are considered. The lower dashed line shows the mean score for random scanpaths; the upper dashed line shows the mean score for “distribution biased” random scanpaths. All group means are significantly different.
Figure 13. Similarity scores for scanpaths (6 fixations) generated by Itti and Koch's saliency model vs. the first 6 fixations from measured observer scanpaths. Mean String-Edit similarities are shown for all images and all observers. The dashed line shows the mean score for distribution biased random scanpaths. Error bars show standard errors (SEMs). Note that the vertical axis extends to 0.2 rather than 1 as in previous figures showing similarity scores. Inverted is significantly different from Old and Grayscale (p < 0.05). No other significant differences were found.