November 2011
Volume 11, Issue 13
Viewing behavior and the impact of low-level image properties across repeated presentations of complex scenes
Journal of Vision November 2011, Vol.11, 26. doi:https://doi.org/10.1167/11.13.26

      Kai Kaspar, Peter König; Viewing behavior and the impact of low-level image properties across repeated presentations of complex scenes. Journal of Vision 2011;11(13):26. https://doi.org/10.1167/11.13.26.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Studies on bottom-up mechanisms in human overt attention support the significance of basic image features for fixation behavior on visual scenes. In this context, a decisive question has been neglected so far: How stable is the impact of basic image features on overt attention across repeated image observation? To answer this question, two eye-tracking studies were conducted in which 79 subjects were repeatedly exposed to several types of visual scenes differing in gist and complexity. Upon repeated presentations, viewing behavior changed significantly. Subjects neither performed independent scanning eye movements nor scrutinized complementary image regions; instead, they tended to view largely overlapping image regions, although this overlap significantly decreased over time. Importantly, subjects did not uncouple their scanning pathways substantially from basic image features. In contrast, the effect of image type on feature–fixation correlations was much bigger than the effect of memory-mediated scene familiarity. Moreover, feature–fixation correlations were moderated by actual saccade length, and this phenomenon remained constant across repeated viewings. We also demonstrated that this saccade length effect was not an exclusive within-subject phenomenon. We conclude that the present results bridge a substantial gap in attention research and are important for future research and modeling processes of human overt attention. Additionally, we advise considering interindividual differences in viewing behavior.

Introduction
In the last two decades, the study of human overt attention under natural conditions has become an important topic of human attention research (for reviews, see Einhäuser & König, 2010; Henderson, 2003). Traditionally, attention is categorized as either covert or overt. In physiological experiments, the position of receptive fields has to be controlled tightly, and paradigms based on central fixations abound. Hence, these paradigms are devoid of eye movements, and attention shifts only covertly. In psychophysical experiments, eye movements make overt shifts of attention easily noticeable (Yarbus, 1967). In the last few years, techniques for recording eye movements have made tremendous advances. Indeed, fully mobile recording under natural conditions is possible (e.g., Einhäuser et al., 2007; Jovancevic, Sullivan, & Hayhoe, 2006; Schneider et al., 2006; Schumann et al., 2008; Seagull & Xiao, 2001). Although the boundary constraints for experimental design are very different, overt and covert attention have been shown to be tightly related (Hoffman & Subramaniam, 1995; McPeek, Maljkovic, & Nakayama, 1999). Hence, we will use the investigation of overt attention as a marker for attentional processes in general. 
Given the central role of overt attention in sensory processing, the guidance of eye movements has moved to the center of research interest. Competing hypotheses have been put forward, either emphasizing the role of low-level, stimulus-dependent or high-level, task-dependent properties. 
The stimulus-dependent properties were investigated especially in the last decade by several studies addressing the impact of basic image features of natural scenes on fixation behavior by comparing spatial image fixation probability. In this sense, the bottom-up selection of image locations involves fast stimulus-driven mechanisms such as a compulsory look at abruptly occurring stimuli (Posner & Snyder, 1975; Yantis & Jonides, 1984) or at unique features (Treisman & Gelade, 1980). Overall, many studies on bottom-up mechanisms in human overt attention support the significance of basic image features. 
Mannan, Ruddock, and Wooding (1997) showed that high spatial frequency content and edge density affect fixation selection. Reinagel and Zador (1999) found that there was higher local spatial contrast and that pixels at fixated locations tended to be less correlated with surrounding pixels in comparison to corresponding correlations of randomly chosen locations. In a study by Parkhurst and Niebur (2003), these findings were replicated, and additionally, they found measures of local contrast to be correlated with selected fixation locations. In addition, Krieger, Rentschler, Hauske, Schill, and Zetsche (2000) evaluated fixated image regions on natural as well as artificial scenes and provided evidence that luminance contrast, edges, and further image properties play an important role in fixation selection in humans. Einhäuser and König (2003) demonstrated that luminance contrast is correlated with fixation points, but variations of this feature within the normal range do not change the fixation pattern. Only at larger variations do luminance contrast values causally affect overt attention. In a more detailed study, Acik, Onat, Schumann, Einhäuser, and König (2009) provided evidence that the influence of luminance and texture contrasts on fixated locations depends on both the analyzed spatial frequency of image properties like luminance contrast and the type of image: Even on nature images, the effects of luminance contrast were revealed when images were low-pass filtered before correlating luminance contrast with fixation probability. In addition to that, luminance contrast in urban or artificial scenes compared to scenes of nature similar to those used by Einhäuser and König permitted the discrimination of fixated locations from non-fixated ones. We have to conclude that luminance contrast is correlated with other relevant image features and by itself contributes to the selection of fixation points depending on image type. 
The role of many other features in guidance of eye movements has also been investigated. Early on, fixation probability on natural scenes was linked to high spatial frequency edge information (Baddeley & Tatler, 2006; Mannan, Ruddock, & Wooding, 1996). Furthermore, color was found to be a feature guiding attention on modified natural scenes (Frey, König, & Einhäuser, 2007), as well as to be highly predictive for fixation selection on diverse visual scenes, although the strength of color's influence on attention depends significantly on image type (Frey & König, 2008). Betz, Kietzmann, Wilming, and König (2010) investigated the impact of a large set of local image features such as color and luminance contrasts, edges, and saturation on fixation probability in the area of diverse web pages. In combination, the set of image features allows prediction of selected fixation points with high accuracy. Consequently, besides the numerous studies providing clear evidence of luminance's impact on eye guidance, color, edges, and saturation are also relevant image properties for fixation behavior. In this context, several authors (cf. Itti & Koch, 2000; Koch & Ullman, 1985; Torralba, 2003) proposed another approach to investigate the impact of scene statistics on viewing behavior: Image features such as contrast, color, or edge information are used to generate a map of feature salience for each dimension; these maps are subsequently combined into a single saliency map that indicates image regions of interest that should attract attention. The predictions of such a saliency model then can be compared with empirical eye-tracking data. Although empirical evaluations of saliency models show that more fixations fall into regions predicted by the model than would be expected by chance (e.g., Foulsham & Underwood, 2008), literature also suggests that only a small part of the variance in human fixation selection can be explained by saliency models in general (cf. Parkhurst, Law, & Niebur, 2002). 
Nonetheless, the ongoing prominence of conspicuity-based approaches (see the special issue of Visual Cognition; Tatler, 2009) derives from the bulk of studies providing clear evidence for a significant involvement of basic image features on gaze control. However, several authors emphasize the correlational nature of these results (e.g., Einhäuser & König, 2003; Einhäuser, Spain, & Perona, 2008; Foulsham & Underwood, 2008; Henderson, 2003; Henderson, Brockmole, Castelhano, & Mack, 2007; Tatler, 2007), since it is not clear whether salient image regions causally drive attention or if other bottom-up or top-down factors guide attention to objects or image regions that coincide with visual saliency. 
In summary, literature suggests that basic image features affect human viewing behavior in terms of bottom-up processes contributing to human visual overt attention. Thereby, the type of visual scenes being investigated statistically moderates the effect sizes, and the additional impact of the analyzed spatial frequencies also has to be considered. Interestingly, Tatler, Baddeley, and Vincent (2006; replicated by Acik, Sarway, Schultze-Kraft, Onat, & König, 2010) showed that the basic image features that discriminated between where observers fixated and where they did not varied considerably with the length of the preceding saccade. The value of a certain image feature at a fixated location was higher when a short saccade preceded, indicating that the correlation between image features and fixation selection is stronger for small amplitudes. 
In contrast to studies that primarily focus on the impact of low-level image properties on gaze behavior, several studies investigated the influence of high-level features and top-down effects on human attention. Similar to the diversity of bottom-up effects on overt attention being investigated, very different aspects of top-down mechanisms have been examined so far. For example, Soto, Heinke, Humphreys, and Blanco (2005) found evidence that a stimulus matching the content of working memory can attract attention early and involuntarily. Navalpakkam and Itti (2006) investigated the granularity of top-down mechanisms and showed that such mechanisms seem to be fine-grained, because top-down signals in their study not only specified a certain feature dimension but also carried fine-grained information that specified the relevant feature interval. Wallis and Bülthoff (2000) showed that drivers and passengers in a virtual environment differ in their sensitivity to changes in a scene, indicating that the actual behavioral mode affects attention allocation. Land and Lee (1994) used eye-tracking techniques to investigate drivers' gaze direction during drives along a road. Pieters and Warlop (1999) investigated how time pressure and actual motivation affect viewing behavior. Their results revealed that consumers adapted to time pressure during a brand choice task by accelerating visual scanning but that high task motivation in contrast to low motivation led to a deceleration of information acquisition. In a recent study by Kaspar and König (2011), observers' motivational disposition as well as their subjective evaluation of scenes' interestingness were shown to be strong top-down mediators of visual attention. Moreover, many studies (e.g., Nelson, Cottrell, Movellan, & Sereno, 2004; Rothkopf, Ballard, & Hayhoe, 2007; Triesch, Ballard, Hayhoe, & Sullivan, 2003) provided evidence for a task-dependent scanning of images. 
For example, Einhäuser, Rutishauser, and Koch (2008) showed that task demands are able to override sensory-driven saliency in visual scenes almost immediately. 
In addition to studies focusing either on bottom-up or primarily on top-down influences, the interaction between top-down and bottom-up mechanisms on eye movement behavior has been picked out as a central theme by many researchers (Rutishauser & Koch, 2007; Tatler, Baddeley, & Gilchrist, 2005; Torralba, 2003). In a recent study by Betz et al. (2010), the course of overt attention emerged as task-dependent, but the pattern of feature–fixation correlations remained constant across different tasks. The authors concluded that task-dependent differences in fixation behavior are not mediated by a reweighting of features in the bottom-up hierarchy. Moreover, semantic interpretations of an image can easily override the impact of basic image features on viewing behavior, as shown by Nyström and Holmqvist (2008). They found no differences in contrast and edge density between fixated locations and non-fixated control locations when semantically important image regions were reduced in low-level signal strength. In contrast to what saliency models would predict, no shift in fixation density away from those regions was observable. Such a result reveals the limitation of an exclusively saliency-based approach, which is normally inflexible and typically cannot deal with task-relevant image content being not salient (cf. Jovancevic et al., 2006). Kollmorgen, Nortmann, Schröder, and König (2010) recently provided evidence that low-level visual features, task-dependent information, and spatial viewing biases all make important contributions to the course of human overt attention and should, therefore, be considered in the context of attention research. However, the exact principles that underlie those interactions between high-level features' influences and top-down effects have not been completely unfolded so far (cf. Einhäuser, Spain et al., 2008; Henderson et al., 2007). 
In the midst of this dispute, a decisive aspect has been neglected so far. The dynamics of the interaction as it evolves across multiple exposures to identical visual scenes has not been explored until now. Most studies comprise non-repeating stimuli to exclude potential effects of memory; hence, visual stimuli are presented commonly in a non-recurring fashion. Indeed, some studies addressed specific aspects of the impact of repeated presentations on overt attention (e.g., Althoff & Cohen, 1999; Brockmole & Henderson, 2006; Chun & Jiang, 1998; Foulsham & Underwood, 2008; Harding & Bloj, 2010; Hollingworth & Henderson, 2000; Underwood, Foulsham, & Humphrey, 2009), but even these studies did not investigate the fundamental issue of eye movement behavior when identical complex scenes are repeatedly observed under identical circumstances. Although Foulsham and Underwood (2008) as well as Underwood et al. (2009) repeatedly presented identical stimuli, viewing conditions changed between presentation runs (encoding session versus recognition task). In a recent study by Kaspar and König (2011), a paradigm was introduced in which the actual task (free viewing) did not change. Subjects observed complex visual scenes that were presented repeatedly in a pseudorandomized order. Recorded eye movements revealed that the attention focus successively became local, expressed by an increase in fixation duration, a decrease in saccade length and frequency, as well as a decrease of a single subject's fixation distribution over images measured by entropy analysis. Moreover, intersubject variance of fixation distributions increased over time. Interestingly, these effects were significantly influenced by the subject's motivational state, indicating a top-down mechanism. 
However, this study did not investigate if subjects scanned different image regions across repeated observations and if the change of attention focus went along with a successively alleviated correlation between image features and fixation behavior. In other words, did subjects uncouple their scanning pathways more and more from basic image features when images became familiar due to repeated presentations? At least changes in common eye movement parameters as well as retrospective verbal reports of subjects indicated that subjects decided deliberately to observe scene locations at later presentations that had not been attended before. The results of the study therefore suggested that attention became successively top-down driven across repeated exposure to identical images. However, it is not clear whether these changes are paralleled by a decreasing impact of bottom-up mechanisms that are commonly operationalized by the correlation between basic image features and fixation probability as described above. 
In this context of a time-dependent impact of low-level image properties on eye movements, some previous studies investigated whether significant changes in the involvement of image features arise over time. These studies focused on one-time observations of images and analyzed whether the impact of image features changes over the course of several fixations during viewing, but results are contradictory. Parkhurst et al. (2002) found evidence for a changing impact of image features over scene viewing. Feature dependence of eye movements was highest for fixations that immediately followed stimulus onset, and stimulus dependence reached an asymptotic level at later fixations. In contrast, Tatler et al. (2005) did not find such a change in feature involvement and attributed this contradictory result to differences in the applied analysis techniques. Independent of this fact, neither study indicates whether the correlation between image features and fixation behavior would change if the identical stimuli were presented several times. Foulsham and Underwood (2008) presented complex interior and exterior scenes in an encoding phase as well as in a recognition test afterward in which additional new scenes were randomly shown. They found that saliency at fixated locations only decreased slightly over multiple fixations but stated that this effect derived wholly from one high value on the second fixation and that their study provided no clear evidence for a change in bottom-up allocation of attention over scene observation. Moreover, familiarity of images did not significantly interact with this time-dependent impact of image features on fixation behavior. In a second study, Underwood et al. (2009) applied a similar experimental design and found that the sequence of fixations during the encoding phase matches well with that made when observing the images a second time in a recognition test. 
Scan patterns found in a second recognition test 1 week later were even similar to those found in the first test, but overall scan patterns did not fit those predicted by a saliency map model. 
The present study was conducted to further scrutinize the correlation between basic image features and viewing behavior across multiple image presentations. Due to the above-described findings of Acik et al. (2009) and Parkhurst and Niebur (2003), showing that the moderating effect of image category as well as the analyzed spatial frequencies, respectively, have to be considered, we investigated eye movements on several types of complex scenes with respect to 22 low-level image features. In two experiments, we first analyzed whether subjects in fact changed their viewing behavior and observed image regions in later presentations that differed from those observed during the initial presentation. If subjects indeed looked at different image locations across repeated presentations, the central question would be to what degree basic image features still remain important for eye guidance. 
Thus, we were interested in three research questions addressing (1) the fixation distributions across multiple image observations, (2) the main effects of repeated image exposure as well as image type on feature–fixation correlations, and (3) the interplay between saccade length and feature–fixation correlations. In more detail, we hypothesized: 
  1.  
    Subjects change their viewing behavior and observe different image locations across repeated presentations expressed by a decreasing congruency of fixation distribution maps, which is paralleled by a decrease in fixated image regions and an increase of short saccades.
  2.  
    On the one hand, we expected images to become more familiar to subjects over time, and hence, this should be expressed by a decreasing impact of basic image features on fixation behavior. On the other hand, the impact of basic image features on viewing behavior should also significantly differ between image categories, as suggested by previous findings (Acik et al., 2010, 2009).
  3.  
    We asked whether the observation of Tatler, Baddeley, and Vincent (2006), that values of basic image features at fixated locations are higher following shorter saccades than following longer saccades, persists across repeated presentations. Moreover, is the dependency between saccade length and feature values at fixated locations also a between-subject phenomenon?
In Study 1, we completely reanalyzed the eye-tracking data of a recent study by Kaspar and König (2011) with respect to these research questions. In Study 2, we recorded 34 new subjects by applying the identical experimental setting and procedure as in Study 1 but using a new image base. Thus, results of Study 1 should be validated, on the one hand, and results should be generalized to a broader set of ecologically valid visual scenes, on the other hand. 
Study 1
Methods
Participants
Forty-five university students (12 males) who were naive to the purpose of the study participated. The average age was 24.2 years (18–48). All participants had normal or corrected-to-normal visual acuity. All subjects signed a written consent form to participate in the experiment. The experiment was conducted in compliance with the Declaration of Helsinki as well as national and institutional guidelines for experiments with human subjects. 
Stimuli
Forty-eight colored images from four different image categories served as stimuli (Figures 1A–1D). “Nature” contained twelve images that were part of the McGill Calibrated Colour Image Database (Olmos & Kingdom, 2004) depicting natural environments like open landscapes, forests, or flowers, with an absence of any man-made objects. “Fractals” consisted of twelve software-generated fractal pictures that had second-order statistics similar to real-world images. They were taken from three different web databases: Elena's Fractal Gallery, Maria's Fractal Explorer Gallery, and Chaotic N-Space Network (all reached via “IFD: Internet Fractal Database”; http://www.ba.infn.it/~zito/project/gallerie.html). “Urbans” consisted of twelve images showing, for example, house exteriors, streets, and vehicles. These pictures were taken with a high-resolution camera (Nikon D2X) at public places in Switzerland and were unfamiliar to participants. Scenes of both “nature” and “urban” categories were free of people or writing. The fourth category contained twelve pink noise images produced as described before (Einhäuser et al., 2006; Kayser, Nielsen, & Logothetis, 2006). Briefly, all original images of the above-described categories (nature, urban, and fractal) served as base images. In a first step, they were Fourier transformed (each color plane separately). Then, the power spectrum over all images was averaged, and phase values were substituted by random values. Finally, the average power spectrum and the modified phase spectrum were combined by means of an inverse Fourier transform. As a result, this procedure preserved the second-order statistics, but the resulting images were devoid of structures. This made objects and similar assemblies undetectable in the pink noise images. 
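The pink noise procedure described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `pink_noise_from_images`, the array layout, and the way random phases are drawn (from the Fourier transform of real white noise, which keeps the result real-valued) are all choices made for this example.

```python
import numpy as np

def pink_noise_from_images(images, seed=0):
    """Sketch of phase-randomized noise with the average power spectrum.

    images: array of shape (n_images, height, width, 3); hypothetical layout.
    Returns one noise image of shape (height, width, 3).
    """
    rng = np.random.default_rng(seed)
    images = np.asarray(images, dtype=float)

    # Fourier transform every color plane of every base image separately.
    spectra = np.fft.fft2(images, axes=(1, 2))

    # Average the power spectrum over all base images.
    mean_power = np.mean(np.abs(spectra) ** 2, axis=0)
    amplitude = np.sqrt(mean_power)

    # Assumption: random phases are taken from the transform of real white
    # noise, so the phase spectrum is random but conjugate-symmetric.
    noise = rng.standard_normal(mean_power.shape)
    random_phase = np.angle(np.fft.fft2(noise, axes=(0, 1)))

    # Combine average amplitude and random phase, then transform back.
    synthetic = amplitude * np.exp(1j * random_phase)
    return np.fft.ifft2(synthetic, axes=(0, 1)).real
```

The output is not rescaled to a displayable intensity range; how the original stimuli were normalized for presentation is not stated here.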
Figure 1
 
Examples of image categories: (A) nature, (B) urban, (C) fractal, (D) pink noise.
Apparatuses
Stimuli were presented on a 21-inch Samsung SyncMaster 1100 DF 2004 CRT monitor (Samsung Electronics, Seoul, South Korea). Screen distance was about 80 cm and display resolution was chosen to fit the image resolution of 1280 × 960 pixels, while the refresh rate was 85 Hz. No headrest was used to facilitate normal viewing behavior. The computer running the experiment was connected to the host computer (Pentium 4; Dell, Round Rock, TX, USA) with the EyeLink software via a local network. 
Participants' eye movements were recorded by an EyeLink II system (SR Research, Ontario, Canada). It uses infrared pupil tracking at a sampling rate of 500 Hz and compensates for head movements. To calibrate, participants made saccades to a grid of 13 fixation spots on the screen, which appeared one by one in a random order. Tracking of the eye giving the lower validation error started as soon as this value was below 0.35°. To control for slow drifts in measured eye movements, after each stimulus presentation, a fixation spot appeared in the middle of the screen. If the error was larger than 1°, calibration and validation were repeated. Fixations and saccades were detected and parameterized automatically by the eye tracker. Saccade detection was based on three default measures: eye movement of at least 0.1°, with a velocity of at least 30°/s and an acceleration of at least 8000°/s². After saccade onset, minimal saccade velocity was 25°/s. These values had to be sustained for at least 4 ms. The first fixation of each trial was excluded from analysis since its localization was an artifact of the preceding fixation spot used for drift correction. 
Eye-tracking procedure
Before the eye-tracking session, all participants first had to pass the Ishihara Test for Color Blindness (Ishihara, 2005) to ensure that no red–green or other color deficiency influenced participants' sensitivity to basic image features. The room was darkened during the eye-tracking session, in which each participant consecutively saw five blocks of all 48 images. Images were presented in pseudorandomized order within a block. Presentation duration of images was 6 s. Participants were instructed to “observe the images” to elicit free-viewing behavior. A short 5-min break after the third presentation block maintained participants' alertness and avoided potential fatigue. After the break, tracking was started again with calibration and validation. After the eye-tracking session, participants saw all 48 images once more in a random order (without eye tracking) and rated the degree of interestingness of each image on a 5-point scale. However, this measurement and an additional psychometric test were conducted to investigate a research question different from the present one (cf. Kaspar & König, 2011). These data consequently were not considered in the present study. Finally, participants gave written reports about the impressions they had during the repeated presentations of images. At the end, the participants were informed about the purposes and details of the experiment. 
Data analysis
In the scope of our hypotheses, eye-tracking data were analyzed regarding (1) the distribution of fixations across repeated image presentations and (2) the correlation between basic image features and fixation likelihood via AUC values. Significance level was always 5%, but in cases of multiple testing, conclusions about significance referred to a Bonferroni-adjusted alpha level. 
Fixation distribution analysis
To investigate whether subjects always observed the same image regions across repeated image exposure or whether fixated locations between presentation runs were very different, we first computed the congruency of fixation density maps (FDMs). First, the FDM of a specific image was produced separately for all five presentation runs and subjects. For that purpose, the fixation distribution map of subject s viewing image i in presentation run p was convolved with a Gaussian kernel. The full-width at half-maximum (FWHM) of the Gaussian kernel defining the size of the patch was set to 3° of visual angle. This patch size models an attentional focus; the size was chosen to provide an appropriate granularity of image partition that is independent of image structure. We validated the results at the end by also using a visual angle of 1° and 2°. In fact, patch size does not affect relations between FDMs; only absolute correlation values depend on it. 
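As an illustration of this step, the following Python sketch builds an FDM by placing fixation counts on a pixel grid and smoothing them with a Gaussian whose width is given as FWHM. It is a minimal sketch under stated assumptions: the function names, the `pixels_per_degree` parameter (the conversion factor for the actual setup is not given in this section), the zero-padded borders, and the normalization to unit sum are all choices made for the example.

```python
import numpy as np

def gaussian_kernel_1d(fwhm_px):
    """1-D Gaussian kernel; FWHM = 2 * sqrt(2 * ln 2) * sigma."""
    sigma = fwhm_px / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def fixation_density_map(fixations_xy, shape, fwhm_deg=3.0,
                         pixels_per_degree=30.0):
    """Smooth fixation counts into a density map.

    fixations_xy: iterable of (x, y) pixel coordinates.
    shape: (height, width); must be larger than the kernel in this sketch.
    pixels_per_degree: hypothetical conversion factor, not from the source.
    """
    height, width = shape
    counts = np.zeros(shape, dtype=float)
    for x, y in fixations_xy:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:
            counts[row, col] += 1.0

    # A 2-D Gaussian is separable: convolve columns, then rows.
    kernel = gaussian_kernel_1d(fwhm_deg * pixels_per_degree)
    smoothed = np.apply_along_axis(np.convolve, 0, counts, kernel, mode="same")
    smoothed = np.apply_along_axis(np.convolve, 1, smoothed, kernel, mode="same")

    total = smoothed.sum()
    return smoothed / total if total > 0 else smoothed
```

Because correlation is invariant to scaling, the normalization to unit sum does not affect the congruency measure computed from two such maps.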
In a second step, the correlation between the FDM I of an image in presentation run p and the FDM J of the same image in the subsequent presentation run p + 1 was calculated with a standard MATLAB (MathWorks) function according to 
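$$
r = \frac{\sum_{m}\sum_{n}\left(I_{mn}-\bar{I}\right)\left(J_{mn}-\bar{J}\right)}{\sqrt{\left(\sum_{m}\sum_{n}\left(I_{mn}-\bar{I}\right)^{2}\right)\left(\sum_{m}\sum_{n}\left(J_{mn}-\bar{J}\right)^{2}\right)}}, \tag{1}
$$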
where m and n index the rows and columns of the maps, I and J denote the two FDMs being compared, and Ī and J̄ are their respective mean values. 
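For readers working outside MATLAB, Equation 1 is the ordinary Pearson correlation taken over all pixels of two equally sized maps. A minimal NumPy sketch follows; the function name `map_correlation` is hypothetical.

```python
import numpy as np

def map_correlation(fdm_i, fdm_j):
    """Two-dimensional correlation coefficient between two equal-sized maps."""
    i = np.asarray(fdm_i, dtype=float)
    j = np.asarray(fdm_j, dtype=float)
    # Deviations of each pixel from the mean of its own map.
    di = i - i.mean()
    dj = j - j.mean()
    # Sum of cross-products divided by the root of the product of sums of squares.
    return float((di * dj).sum() / np.sqrt((di ** 2).sum() * (dj ** 2).sum()))
```

The result is undefined when either map is constant, because the denominator is then zero; this sketch does not guard against that case.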
In the next step, correlations between FDMs of single images were averaged across all images of a certain image category. For that purpose, all correlations were normalized by applying Fisher's z transformation to allow averaging as suggested by several authors (e.g., Silver & Dunlap, 1987; Strube, 1988), though this transformation replaces an underestimation when averaging correlation coefficients with a small overestimation when averaging corresponding z values (Hunter & Schmidt, 2004). However, this upward bias is of no importance in the present case, since we focused on the relation between correlations of different FDMs, and absolute correlation coefficients depend on selected visual angle as described above. Moreover, Fisher's z transformation yields normally distributed data and rescales the correlation coefficient into an interval scale (Thorndike, 2007) and hence allows parametric testing. 
After z-transformed correlations had been averaged across all images of the same category, resulting mean z correlations were first tested against zero by computing one-sample t-tests to check for significant congruencies of FDMs across repeated observation. 
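The averaging and testing steps can be illustrated with the Python standard library alone. Fisher's z transformation is the inverse hyperbolic tangent of r, and the one-sample t statistic is the mean divided by its standard error. The function names below are hypothetical, and the sketch returns only the t statistic, not a p value.

```python
import math
import statistics

def average_correlations(correlations):
    """Average correlations in Fisher z space.

    Returns (mean z value, mean z back-transformed to a correlation).
    """
    z_values = [math.atanh(r) for r in correlations]  # Fisher's z
    mean_z = statistics.fmean(z_values)
    return mean_z, math.tanh(mean_z)

def one_sample_t(values, mu=0.0):
    """t statistic for testing the mean of `values` against `mu`."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu) / standard_error
```

For example, averaging r = 0.2 and r = 0.8 in z space and transforming back gives a value slightly above 0.5, whereas the plain arithmetic mean is exactly 0.5.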
Next, to test whether congruency of FDMs decreased across repeated presentations as predicted in Hypothesis 1, mean z-transformed correlations were statistically compared by applying univariate analyses of variance (ANOVA) separately for each image category. We report partial eta-squared value (proportion of variance explained) as a measure of effect size (Cohen, 1988). Finally, post-hoc Bonferroni-adjusted t-tests were calculated to reveal significant differences between factor levels. 
Due to the spatial bias, zero correlations between two FDMs were not expected. Consequently, we computed a reference value by means of the mean correlation between two FDMs that can be expected when both FDMs are based on fixations that are randomly sampled from all fixations made on images of a category. The number of randomly sampled fixations was chosen to fit the grand mean fixation number on a single image of a specific category in the actual presentation run. The two resulting FDMs were then correlated and subsequently Fisher z-transformed. This procedure was done 1,000 times, and finally, we calculated the average z correlation as the reference level. This is marked in the corresponding Figures 3 and 14. 
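The resampling procedure for the reference level might look as follows in Python with NumPy. This is a sketch under several assumptions that are not taken from the source: fixations are drawn with replacement, the Gaussian smoothing is done in the Fourier domain with circular boundaries, the smoothing width is passed directly as `sigma_px`, and correlations are clipped slightly below 1 before the z transformation. All function names are hypothetical.

```python
import numpy as np

def _smooth(counts, sigma_px):
    # Gaussian smoothing in the Fourier domain (circular boundaries).
    fy = np.fft.fftfreq(counts.shape[0])[:, None]
    fx = np.fft.fftfreq(counts.shape[1])[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fx ** 2 + fy ** 2))
    return np.fft.ifft2(np.fft.fft2(counts) * transfer).real

def _fdm(fix_rc, shape, sigma_px):
    # Accumulate fixation counts per pixel, then smooth.
    counts = np.zeros(shape)
    np.add.at(counts, (fix_rc[:, 0], fix_rc[:, 1]), 1.0)
    return _smooth(counts, sigma_px)

def _corr(a, b):
    da, db = a - a.mean(), b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

def reference_level(pooled_fix_rc, shape, n_fix, sigma_px, n_iter=1000, seed=0):
    """Mean Fisher z correlation between FDMs of randomly sampled fixations.

    pooled_fix_rc: integer (row, col) fixations pooled over a category.
    n_fix: number of fixations per random FDM (e.g., the grand mean count).
    """
    rng = np.random.default_rng(seed)
    pooled = np.asarray(pooled_fix_rc, dtype=int)
    z_values = []
    for _ in range(n_iter):
        # Assumption: sampling with replacement from the pooled fixations.
        sample_a = pooled[rng.integers(0, len(pooled), n_fix)]
        sample_b = pooled[rng.integers(0, len(pooled), n_fix)]
        r = _corr(_fdm(sample_a, shape, sigma_px), _fdm(sample_b, shape, sigma_px))
        z_values.append(np.arctanh(np.clip(r, -0.999999, 0.999999)))
    return float(np.mean(z_values))
```

When the pooled fixations cluster around the image center, the returned value is positive, which is the point of the reference level: it captures the congruency expected from the spatial bias alone.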
In order to answer whether the expected decrease in the congruency of FDMs is paralleled by a less extensive image observation at later presentation runs, we computed three additional measures to complete the picture. First, we analyzed the number of image regions observed across repeated presentations, since the findings of Smith, Hopkins, and Squire (2006) suggest that familiar images are scanned less extensively. For that purpose, each image was divided into 16 equal-sized regions to form a 4 × 4 grid (cf. Smith et al., 2006, p. 11305). Second, we analyzed the temporal course of a potential left–right bias of fixations across repeated presentations. Finally, we computed a frequency analysis of saccade lengths to clarify whether images are also scanned with shorter visual steps at later presentations. For that purpose, we computed the saccade length as the mean Euclidean distance between two fixation points, and recorded fixation spots beyond image size were excluded to prevent artifacts. All saccades were assigned to one of 20 categories, each containing saccades of a specific length (category 1: saccade length ≤ 1° visual angle; category 2: saccade length >1° and ≤2°; …; category 20: saccade length >19°). All three measures were computed with respect to image category and actual presentation run. 
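Two of these measures, the number of visited grid regions and the saccade length categories, are easy to make concrete. The Python sketch below uses hypothetical function names and a hypothetical `pixels_per_degree` conversion factor. It also assumes that a saccade is dropped whenever either of its two endpoints lies outside the image, which is one possible reading of the exclusion rule.

```python
import math

def visited_regions(fixations_xy, width, height, grid=4):
    """Count how many cells of a grid x grid partition contain a fixation."""
    cells = set()
    for x, y in fixations_xy:
        if 0 <= x < width and 0 <= y < height:
            cells.add((int(y * grid / height), int(x * grid / width)))
    return len(cells)

def saccade_length_histogram(fixations_xy, width, height,
                             pixels_per_degree, n_bins=20):
    """Histogram of saccade lengths in 1-degree categories.

    Index 0 holds lengths <= 1 deg, index k holds lengths > k and <= k + 1,
    and the last index holds everything longer than n_bins - 1 degrees.
    """
    def inside(point):
        return 0 <= point[0] < width and 0 <= point[1] < height

    histogram = [0] * n_bins
    for start, end in zip(fixations_xy, fixations_xy[1:]):
        # Assumption: skip saccades with an endpoint outside the image.
        if not (inside(start) and inside(end)):
            continue
        length_deg = math.hypot(end[0] - start[0], end[1] - start[1])
        length_deg /= pixels_per_degree
        index = min(max(math.ceil(length_deg) - 1, 0), n_bins - 1)
        histogram[index] += 1
    return histogram
```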
Feature–fixation correlations
In order to check Hypothesis 2, we assessed the correlation between image features and fixation likelihood as described in previous studies (Acik et al., 2010, 2009; Betz et al., 2010; Frey et al., 2007; Tatler et al., 2005; Zhang, Tong, Marks, Shan, & Cottrell, 2008). The correlations between image features and fixation likelihood were computed by means of the area under the Receiver Operating Characteristic (ROC) curve (AUC). In short, feature values were first calculated in the Derrington–Krauskopf–Lennie (DKL) color space (Derrington, Krauskopf, & Lennie, 1984) for all images. As features, luminance contrast, red–green and blue–yellow color contrasts, edge values, and saturation were taken into account.
Additionally, second-order contrasts (i.e., texture contrasts) of luminance, red–green, and blue–yellow were computed, so that overall 22 basic image features were analyzed. The methods for computing features have been described in detail in Betz et al. (2010). In short, the color or luminance contrast of the region surrounding a fixation point is defined as the standard deviation of the corresponding channel values in an isotropic patch that is centered on that fixation; luminance contrast is additionally normalized by the mean luminance of the whole image.
By varying the size of the isotropic patch, contrasts at different spatial frequencies are observable. In the present study, the full-width at half-maximum (FWHM) of the Gaussian kernel defining the size of the patch was set to 1°, 2°, and 3° of visual angle. Accordingly, three contrast maps for luminance (lumc), three for red–green (rgc), and three for blue–yellow (byc) contrasts were computed, differing in patch size. In addition, second-order contrasts were calculated on these contrast maps by means of the same method but with the kernel sizes enlarged to the respective double FWHM (1°→2°, 2°→4°, 3°→6°). Thereby, luminance (lumtc), red–green (rgtc), and blue–yellow (bytc) texture contrasts were obtained.
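As an illustration of the contrast definition, the following sketch computes a Gaussian-weighted standard deviation around one location. It is a simplified reading of the description above, not the procedure of Betz et al. (2010): the function names, the truncation of the kernel at three standard deviations, and the use of pixel units for the FWHM (a pixels-per-degree conversion is left to the caller) are assumptions.

```python
import math

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full-width at half-maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def local_contrast(img, cx, cy, fwhm_px, normalize=True):
    """Gaussian-weighted standard deviation of `img` (list of rows) around (cx, cy).
    With normalize=True the result is divided by the mean of the whole image,
    as described for luminance contrast."""
    sigma = fwhm_to_sigma(fwhm_px)
    radius = int(math.ceil(3 * sigma))  # assumption: kernel truncated at 3 sigma
    samples = []
    for y in range(max(0, cy - radius), min(len(img), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(img[0]), cx + radius + 1)):
            w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            samples.append((w, img[y][x]))
    wsum = sum(w for w, _ in samples)
    mean = sum(w * v for w, v in samples) / wsum
    std = math.sqrt(sum(w * (v - mean) ** 2 for w, v in samples) / wsum)
    if normalize:
        flat = [v for row in img for v in row]
        image_mean = sum(flat) / len(flat)
        return std / image_mean if image_mean else 0.0
    return std
```

Under the same assumptions, a texture contrast would be obtained by evaluating `local_contrast` at every pixel to build a contrast map and then applying the function again to that map with twice the FWHM.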
Furthermore, edge values (sob) on the image's luminance channel were calculated by using Sobel operators. The Sobel procedure was applied to the luminance channel of the original image and to a version of it resized by bicubic interpolation (resize factor = 0.5). Edge maps were finally smoothed with a Gaussian kernel of 3-pixel FWHM, because the eye tracker does not have pixel accuracy.
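For readers unfamiliar with the Sobel operator, a minimal sketch of the gradient magnitude on a luminance channel is given below. The function name is hypothetical, border pixels are simply left at zero, and the bicubic resizing and the subsequent Gaussian smoothing mentioned above are omitted.

```python
import math

def sobel_magnitude(img):
    """Gradient magnitude from the 3 x 3 Sobel kernels; `img` is a list of rows.
    Border pixels are left at 0.0 (assumption made for this sketch)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # horizontal derivative: right column minus left column
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # vertical derivative: bottom row minus top row
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out
```

On an image with a vertical luminance step, the response is confined to the two pixel columns adjacent to the step and is zero in the flat regions.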
Saturation (sat) was computed as the Euclidean distance between the color value of a certain pixel and the center of the isoluminant plane in the DKL color space. The resulting saturation map was also smoothed with two different Gaussian kernels of 5- and 25-pixel FWHM, respectively, in order to observe saturation at different spatial resolutions. 
After image features had been calculated, correlations between image features and fixation likelihood were computed by means of the area under the Receiver Operating Characteristic (ROC) curve (AUC). The ROC curve was obtained by separating actual fixations (those made on the current image) from so-called control fixations (those made on all other images of the same category as the image in question). The advantage of this approach to selecting control fixations is that it controls for the central bias, that is, the tendency to fixate central regions of a scene more than the periphery, independent of the image's content or the starting point of viewing (see Tatler, 2007; Zhang et al., 2008). The AUC then quantifies how well fixated and non-fixated locations can be discriminated by their saliency (i.e., AUC values quantify the extent to which a certain feature discriminates between actual and control fixations). A value of 0.5 indicates that a feature is uncorrelated with fixation likelihood (random discrimination), higher values imply that the feature discriminates between control and actual fixations better than chance, and values below 0.5 indicate that the feature predicts worse than chance. Upper and lower bounds are 1.0 and 0, respectively. AUC values were calculated for each participant separately to allow statistical inference by means of multivariate analysis of variance. Each image feature was separately correlated with fixation likelihood, and the resulting correlation was expressed as an AUC value.
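The AUC for a single feature can be sketched as the probability that the feature value at a randomly chosen actual fixation exceeds the value at a randomly chosen control fixation, with ties counted as one half. The function name and the brute-force pairwise comparison are choices made for this illustration only.

```python
def auc(actual, control):
    """Area under the ROC curve for discriminating actual from control fixations
    by a feature value: P(actual > control) + 0.5 * P(actual == control)."""
    wins = 0.0
    for a in actual:
        for c in control:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(actual) * len(control))
```

For example, `auc([3, 4], [1, 2])` is 1.0 (perfect discrimination), `auc([1, 2], [3, 4])` is 0.0, and identical value sets give 0.5. The pairwise loop is quadratic in the number of fixations; a rank-based computation would give the same value more efficiently.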
In order to test potential effects of all factors, a doubly multivariate analysis of variance (MANOVA) for repeated measures was computed first in order to consider correlations between all dependent variables. In the case of significant multivariate effects, univariate ANOVAs of the same factorial design were subsequently computed to reveal whether effects would be also present on a single feature's level. This two-step procedure is the most common method of analyzing and interpreting the results of a MANOVA (Tabachnick & Fidell, 2007). 
Results
Fixation distribution analysis (Hypothesis 1)
Did subjects always observe the same image regions across repeated image exposure, or did they fixate very different image locations between presentation runs? Figure 2 depicts examples of raw data from single subjects to visualize the time-dependent changes in viewing behavior. To answer this initial question, the congruency of FDMs across presentation runs was calculated by means of correlations between FDMs. The mean z-transformed correlations (averaged across subjects and images of the same category) between FDMs of two presentation runs are given in Table 1. In general, the congruency of two FDMs was maximal when a given presentation run was correlated with the following run and declined slightly with increasing distance between runs. As the complete set of correlations is not statistically independent, we focused on potential changes from one viewing to the next in the subsequent analysis.
Figure 2
 
Examples of fixation distributions and corresponding saccades of single subjects depending on image type and repeated presentation.
Table 1
 
Matrix of mean Fisher's z-transformed intercorrelations (averaged across subjects) between FDMs of all presentation runs. Fisher's z-transformed correlations are signatures of the congruency of two FDMs. Correlations between two consecutive presentations are marked in bold.
Nature scenes
Presentation  1     2     3     4     5
1             1.0
2             0.53  1.0
3             0.46  0.47  1.0
4             0.40  0.41  0.41  1.0
5             0.36  0.38  0.37  0.42  1.0

Urban scenes
Presentation  1     2     3     4     5
1             1.0
2             0.55  1.0
3             0.41  0.43  1.0
4             0.34  0.34  0.35  1.0
5             0.32  0.31  0.33  0.34  1.0

Fractal scenes
Presentation  1     2     3     4     5
1             1.0
2             0.58  1.0
3             0.48  0.48  1.0
4             0.43  0.43  0.42  1.0
5             0.40  0.45  0.41  0.45  1.0

Pink noise scenes
Presentation  1     2     3     4     5
1             1.0
2             0.46  1.0
3             0.37  0.62  1.0
4             0.33  0.47  0.61  1.0
5             0.34  0.77  0.53  0.49  1.0
The z-transformed correlations between FDMs of two successive presentation runs were averaged across subjects and images of the same category and tested against zero. Since we had five presentation runs, four z-transformed correlations between FDMs of successive presentation runs resulted for each of the four image categories (presentation runs 1 vs. 2, 2 vs. 3, 3 vs. 4, and 4 vs. 5). Overall, we computed 16 one-sample t-tests, and the adjusted alpha level was therefore α_adj = 0.003. All mean correlations significantly differed from zero [all t(44) > 3.196; all p < 0.003] with one exception [t(44) = 2.656; p = 0.011], as marked in Figure 3. Thus, there are significant positive correlations between fixation distributions on images across repeated observations. As a reference value, we also calculated the mean correlation between two FDMs that can be expected when both FDMs are based on fixations randomly sampled from all fixations made on images of a certain category (Figure 3). These correlations are non-zero but much smaller than the observed correlations between the actual fixation distributions. Hence, upon repeated presentations, subjects neither performed independent scanning eye movements (zero correlation) nor scrutinized complementary image regions (negative correlations) but tended to view largely overlapping image regions.
Figure 3
 
Congruency of FDMs. Fisher's z-transformed correlations between FDMs of two consecutive presentation runs were averaged across subjects. Mean z-transformed correlations differing significantly from zero are indicated by asterisks [***p < 0.003 (adjusted alpha level); *p < 0.05]. Vertical lines above bars indicate standard error of measurement. Red horizontal lines within bars indicate the reference value; this is the mean correlation between two FDMs that can be expected when both FDMs are based on fixations being randomly sampled from all fixations made on images of a certain category.
To check whether this congruency between FDMs decreased significantly across repeated presentations, we additionally computed an ANOVA for repeated measures (Greenhouse–Geisser applied) to compare z-transformed correlations between FDMs. Due to multiple testing, the alpha level was adjusted to α_adj = 0.013. As predicted by Hypothesis 1, we found a significant decrease of FDM correlations across repeated presentations for images of type “urban” [F(2.286, 100.596) = 17.687; p < 0.001; η_p² = 0.287] and “fractal” [F(1.962, 86.323) = 5.0; p < 0.01; η_p² = 0.102]. For “nature” images, the decrease [F(1.871, 82.307) = 3.828; p = 0.028; η_p² = 0.080] was significant at the conventional level but did not reach the adjusted alpha level. For image category “pink noise,” we did not find a change in FDM correlations [F(1.126, 49.526) = 0.772; p = 0.398; η_p² = 0.017]. Results of Bonferroni-adjusted t-tests to test differences between factor levels are marked in Figure 3. As depicted, the congruency between FDMs was significantly reduced between later presentation runs, as predicted in Hypothesis 1.
Results were validated by using FDMs produced by convolution with Gaussian kernels of 1° and 2° visual angle FWHM, respectively. The above result pattern was completely replicated; only the absolute values of the z-transformed correlations changed, but they remained significantly different from zero. Consequently, the choice of kernel size had a negligible influence on the results.
To sum up, fixated locations at later presentation runs were positively correlated to previously observed image regions. However, the congruency of FDMs was moderate indicating that subjects did not always observe the same image regions. Instead, they successively attended to image regions at later presentations that had not been observed previously. 
To answer whether the decreasing congruency of FDMs is paralleled by a less extensive scanning of images, we analyzed (1) the number of image regions explored across repeated presentations, (2) the temporal course of the potential left–right bias of fixations, and (3) the distribution of saccade length across presentation runs. 
(1) Initially, we analyzed the number of image regions observed across repeated presentation runs. For that purpose, each image was divided into 16 equal-sized regions to form a 4 × 4 grid. The number of fixated regions was calculated separately for each image and each presentation run on the subject level. Afterward, the number of fixated regions was averaged across all images belonging to the same category. A 4 × 5 (image category × presentation run) ANOVA for repeated measures (Greenhouse–Geisser applied) was computed for statistical testing. Kolmogorov–Smirnov tests revealed that normally distributed data were present in all cells [all p > 0.510]. The number of fixated regions decreased in later presentations [F(2.152, 94.686) = 20.703; p < 0.001; η_p² = 0.320]. Furthermore, we observed differences between image categories [F(1.597, 70.265) = 24.725; p < 0.001; η_p² = 0.360], with most observed image regions on urban scenes and a minimum of fixated regions on pink noise images. Finally, a significant interaction was revealed [F(9.060, 398.645) = 3.661; p < 0.001; η_p² = 0.077]. Due to the interaction, we computed one-way ANOVAs separately for each image category to test the effect of repeated presentation [all F ≥ 6.320; all p < 0.001; η_p² ≥ 0.126]. For nature, fractal, and pink noise images, a significant decrease in fixated image regions was found as well as an intermittent increase from the third to the fourth run, as depicted in Figure 4. In contrast, on urban images, the number of fixated regions decreased continuously across presentation runs. Pairwise comparisons between presentation runs were calculated by post hoc Bonferroni-adjusted t-tests, and results are marked in Figure 4. In summary, we found a consistent small decrease in the number of fixated regions across repeated presentations.
Figure 4
 
Number of fixated image regions for each image category. Images were divided into a 4 × 4 grid of equal-sized regions and number of fixated regions was counted separately for each image and presentation run. The mean across images is depicted. Vertical lines indicate standard error of the mean. Significant differences between presentation runs (Bonferroni-adjusted t-tests) are marked.
(2) Second, we analyzed the time course of a potential left–right bias of fixations across repeated presentations by counting the number of fixations on the left and right sides of the images, respectively. A 4 × 5 × 2 (image category × presentation run × image side) ANOVA for repeated measures was calculated. Fixation numbers were normally distributed in all cells [Kolmogorov–Smirnov tests: all p > 0.280]. Besides significant differences in the fixation number between image types [F(1.571, 69.126) = 115.491; p < 0.001; η_p² = 0.724] and a significant decrease across repeated presentations [F(2.379, 104.664) = 7.665; p < 0.001; η_p² = 0.148], both factors interacted significantly [F(7.573, 333.191) = 4.054; p < 0.001; η_p² = 0.084] (see Figure 5, left side). The number of fixations made on nature, urban, and fractal images continuously decreased across repeated presentations as revealed by one-way ANOVAs [all F ≥ 3.995; all p < 0.01; all η_p² ≥ 0.083], but on pink noise, no changes were observable [F(2.927, 128.771) = 1.427; p = 0.238; η_p² = 0.031]. Bonferroni-adjusted t-tests for pairwise comparisons revealed no significant difference between presentation runs on nature images due to lower statistical power [all p ≥ 0.064], but significant differences were found on urban images [run 1 vs. 3/4: both p < 0.001; 2 vs. 5: p < 0.05; 3 vs. 5: p < 0.05], as well as on fractal images [1 vs. 2/3/4/5; 2 vs. 5]. With respect to the mean fixation number on images, averaged across presentation runs, significant differences between all image categories were found [all p < 0.001], except between nature and fractal images [p = 1.000]. The highest number of fixations was found on urban images and the lowest on pink noise images.
Figure 5
 
Mean number of fixations on images of a certain category depending on the actual presentation run (left side) and mean number of fixations separated for left and right image sides (right side). Images were divided into a 4 × 4 grid of equal-sized regions. Vertical lines indicate standard error of the mean.
Moreover, the 4 × 5 × 2 ANOVA revealed a left–right bias that was significantly moderated by the image type [interaction effect: F(2.098, 92.328) = 49.072; p < 0.001; η_p² = 0.527]. To scrutinize this, 5 × 2 ANOVAs (presentation run × image side) were subsequently calculated for each image type with the focus on a potential left–right bias and an additional interaction with the actual presentation run. Results showed a significant left bias on urban scenes [F(1, 44) = 44.880; p < 0.001; η_p² = 0.505] (Figure 5), which decreased across presentation runs [interaction effect: F(3.605, 158.628) = 3.623; p < 0.01; η_p² = 0.076]. On pink noise images, a right bias was found [F(1, 44) = 6.043; p < 0.05; η_p² = 0.121] (Figure 5), which increased across repeated presentations [interaction effect: F(2.847, 125.267) = 2.828; p < 0.05; η_p² = 0.060]. On fractals, a significant right bias was found [F(1, 44) = 15.342; p < 0.001; η_p² = 0.259], which remained constant across repeated presentations [interaction effect: p = 1.000]. On nature images, no significant left–right bias was found [p = 0.108] and no interaction with the actual presentation run [p = 0.718].
(3) Finally, we analyzed saccade lengths to clarify whether images were scanned with shorter visual steps at later presentations. For that purpose, all saccades were assigned to one of 20 categories, each containing saccades of a specific length (category 1: saccade length ≤ 1° visual angle; category 2: saccade length > 1° and ≤ 2°; …; category 20: saccade length > 19°). The frequency analysis was done with respect to image category and actual presentation run. Hence, we obtained the absolute number of saccades of each length, which was finally transformed to percent values. For each image category, a 5 × 20 ANOVA (presentation run × saccade length) was computed for statistical testing, whereby the potential interaction was of primary interest. Independent of image type, we found a significant interaction [all F ≥ 5.138; all p < 0.001; η_p² ≥ 0.105], whereby the pattern was nearly identical for all image types: Subjects used longer saccades to scan the image during the first presentation run than at later runs, and with increasing familiarity of images, the relative number of short saccades increased successively (Figure 6).
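The binning into 20 length categories and the conversion to percent values can be sketched as follows. The function names and the `px_per_deg` parameter (a flat pixels-per-degree conversion, which ignores screen geometry) are assumptions for the illustration; the exclusion of off-image fixations is assumed to have happened beforehand.

```python
import math

def saccade_category(length_deg, n_cat=20):
    """Category 1: <= 1 deg; category 2: > 1 and <= 2 deg; ...; category 20: > 19 deg."""
    return min(max(math.ceil(length_deg), 1), n_cat)

def saccade_length_histogram(fixations, px_per_deg, n_cat=20):
    """Percentage of saccades per length category for one ordered fixation sequence."""
    counts = [0] * n_cat
    for (x0, y0), (x1, y1) in zip(fixations, fixations[1:]):
        length = math.hypot(x1 - x0, y1 - y0) / px_per_deg
        counts[saccade_category(length, n_cat) - 1] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]
```

A saccade of exactly 1° falls into category 1, one of 19.5° into category 20, and the returned list sums to 100 whenever at least one saccade is present.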
Figure 6
 
Mean frequency of saccades on fractal images depending on saccade length. Saccades were assigned to one of 20 categories (category 1: saccade length ≤ 1° visual angle; category 2: saccade length > 1° and ≤2°; …; category 20: saccade length > 19°). Vertical lines indicate standard error of the mean.
Taken together, the congruency of FDMs significantly decreased across presentation runs, indicating that subjects did not always observe the same image regions, but they successively relocated their attention focus across repeated exposure to complex scenes. We additionally found a decrease of fixated image regions and fewer fixations on images across repeated presentations, as well as an image-type-dependent, left–right bias of fixations that was partially affected by image familiarity. Moreover, subjects used more saccades of short length at later presentations to scan the images. This result pattern indicates that the observer's attention focus becomes more local and centered with increasing image familiarity. Given these changes, the question arose to what extent basic image features remain influential on eye guidance across repeated presentations. 
Analysis of feature–fixation correlations (Hypotheses 2 and 3)
Correlations between image features and fixation likelihood were characterized by means of AUC values assessing the extent to which a certain feature discriminates between actual and control fixations. Overall, 22 image features served as dependent variables. In order to test Hypotheses 2 and 3, we used an omnibus 5 × 4 × 2 × 2 analysis design. The first factor was the within-subject factor “presentation run,” with five levels, introduced to test whether feature–fixation correlations change across multiple image observations. The second factor was the within-subject factor “image category,” introduced to test whether results can be generalized to different image types. The third factor was the between-subject factor “saccade group,” which allowed testing whether a potential dependency between saccade length and basic image features' impact on viewing behavior is a between-subject phenomenon. We divided the sample into subjects above the median saccade length (long saccade group; n = 22) and those below the median saccade length (short saccade group; n = 23). Finally, we built the within-subject factor “saccade length” into the analysis design. This factor refers to the question of whether values of basic image features at fixated locations are higher following shorter saccades than following longer saccades and whether this relation persists across repeated presentations.
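The median split that defines the “saccade group” factor can be sketched as below. The function name and the input format (a mapping from subject ID to mean saccade length) are hypothetical, and assigning a subject exactly at the median to the short group is an assumption chosen because it reproduces the reported group sizes of 23 and 22 for an odd sample of 45.

```python
import statistics

def median_split(mean_saccade_length):
    """Split subjects into short and long saccade groups at the sample median.
    Assumption: a subject exactly at the median goes to the short group."""
    med = statistics.median(mean_saccade_length.values())
    short_group = sorted(s for s, v in mean_saccade_length.items() if v <= med)
    long_group = sorted(s for s, v in mean_saccade_length.items() if v > med)
    return short_group, long_group
```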
Before we applied multivariate and univariate parametric tests, we tested important statistical assumptions. First, we checked whether AUC values were normally distributed in all cells of the 5 × 4 × 2 × 2 design. Kolmogorov–Smirnov tests revealed that normally distributed data were present in all cells (all p > 0.22). Homogeneity of variance–covariance matrices was not testable since cells became singular when sample size did not surpass the number of image features. However, due to equal sample size in all cells, robustness of significance tests can be expected according to a Monte Carlo test of robustness for T² (Hakstian, Roed, & Lind, 1979). The assumption of variance homogeneity in all cells was tested by means of Levene's test. Only 2.5% of all cells did not fulfill this requirement, and consequently, analyses of variance were appropriate, especially since ANOVA is robust to variance heterogeneity when sample sizes are equal (Glass & Stanley, 1970).
Multivariate effects
The 5 × 4 × 2 × 2 MANOVA (presentation run × image category × saccade group × saccade length before actual fixation) revealed no effect of the actual presentation run on feature–fixation correlations [p = 0.459]. However, significant differences were found between image types [Pillai's Trace = 2.814; F(66, 330) = 75.587; p < 0.001; η_p² = 0.938]. Low-level image features at fixated locations also depended on the length of the preceding saccade [Pillai's Trace = 0.962; F(22, 22) = 25.524; p < 0.001; η_p² = 0.962], whereby this saccade length effect interacted with the image type [Pillai's Trace = 2.245; F(66, 330) = 14.873; p < 0.001; η_p² = 0.748]. All of these effects had a very large effect size. In contrast, the effect size of the following three significant interactions was small: The main effect of image type on feature–fixation correlations was moderated by the actual presentation run [Pillai's Trace = 0.603; F(264, 6072) = 1.218; p < 0.05; η_p² = 0.050], and this interaction was additionally influenced by the saccade length preceding fixations [Pillai's Trace = 0.602; F(264, 6072) = 1.214; p < 0.05; η_p² = 0.050]. Finally, this three-way interaction multivariately interacted with the between-subject factor “saccade group” [Pillai's Trace = 0.623; F(264, 6072) = 1.259; p < 0.01; η_p² = 0.052], but no main effect of the factor “saccade group” was revealed [p = 0.454]. All other potential interactions did not reach the significance level [all p > 0.129].
Hypothesis 2: Effects of repeated presentation and image type
In the next step, we analyzed each basic image feature (that is, each dependent variable) separately by calculating the corresponding 5 × 4 × 2 × 2 univariate ANOVAs for repeated measures (Greenhouse–Geisser applied) to scrutinize the direction of main effects and the structure of interactions. Due to multiple testing, the alpha level was adjusted to α_adj = 0.002 (the exact value without rounding was used).
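The adjusted alpha levels reported in this section are consistent with a Bonferroni correction of a nominal α = 0.05, although the nominal level is not stated explicitly and is therefore an assumption of the following sketch (the function name is hypothetical as well).

```python
def bonferroni_alpha(alpha, n_tests):
    """Bonferroni-adjusted alpha level: nominal alpha divided by the number of tests."""
    return alpha / n_tests

# Assumed nominal alpha of 0.05 with the test counts given in the text.
fdm_t_tests = bonferroni_alpha(0.05, 16)    # 0.003125, reported as 0.003
fdm_anovas = bonferroni_alpha(0.05, 4)      # 0.0125, reported as 0.013
feature_anovas = bonferroni_alpha(0.05, 22) # about 0.00227, reported as 0.002
```

Comparing p values against the unrounded quotient, as the text notes was done, avoids borderline decisions that depend on rounding.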
Since we did not find a multivariate main effect of factor “presentation run” [p = 0.459], no further univariate testing was necessary. This result indicates that the correlation between low-level image features and fixation selection did not change over time, as predicted by Hypothesis 2. Hence, viewing behavior was not successively uncoupled from basic image features across repeated presentations (cf. Figure 7). 
Figure 7
 
Area under ROC region indicating the correlation between image feature and fixation likelihood separated for presentation runs. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Vertical lines above bars indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation selection (dashed line).
The significant multivariate main effect of image type was analyzed on the single feature level. ANOVAs revealed a significant difference in AUC values between image categories for 21 of 22 basic image features [all p < 0.001; all η_p² > 0.310 (ranging from 0.310 to 0.879)]. Figure 8 shows that urban and fractal images elicited the highest AUC values with respect to most features. In contrast, pink noise images were linked to low AUC values. In some cases, these are even below 0.5, indicating that fixations avoid regions with high values of the respective feature. Nature images' feature–fixation correlations were in between and, with few exceptions, above 0.5. To compare all four image categories against each other, 132 post hoc Bonferroni-adjusted t-tests (22 features × 6 pairwise comparisons) were calculated, showing 113 significant differences in image type's feature–fixation correlation [all p < 0.05]. Thus, AUC values differed significantly among almost all image categories regarding all image features. Non-significant comparisons between image categories were spread across several features and did not show a consistent pattern. Altogether, these results clearly support an image type dependency of feature–fixation correlations as predicted by Hypothesis 2.
Figure 8
 
Area under ROC values indicating the correlation between image feature and fixation likelihood separated for image categories. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Vertical lines indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Univariate tests of the multivariate interaction between the actual presentation run and image category did not reach the adjusted significance level [all p > 0.035; mean p = 0.305; all η_p² < 0.042]. On close examination, at least a slight decrease in AUC values over time regarding most image features was observable for urban but not for other image types. Hence, it can be generalized across image categories that scanning pathways were not successively uncoupled from basic image features across repeated presentations.
Hypothesis 3: Influence of saccade length
Next, the multivariate main effect of the preceding saccade length on feature values at fixated locations was confirmed on the univariate level for 17 of the 22 image features [all p < 0.001; all η_p² > 0.317 (ranging from 0.317 to 0.785)] (Figure 9). Importantly, in all cases of significance, basic image features at fixated locations were higher following shorter saccades than following longer saccades, consistent with Hypothesis 3.
Figure 9
 
The correlation between AUC values at fixated locations and the length of the preceding saccade. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Significant differences are marked by asterisks [***p < 0.002 (adjusted alpha level); **p < 0.01; *p < 0.05]. Vertical lines above bars indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Moreover, this effect remained constant over time since no multivariate or univariate interaction with the actual presentation run was found. However, the saccade length effect on feature–fixation correlations was also significantly moderated by image category regarding all 22 image features, as revealed by univariate testing [all p < 0.001; all η_p² > 0.146 (ranging from 0.146 to 0.614)]. For the contrast maps for luminance (lumc 1°, 2°, and 3°), red–green (rgc 1°, 2°, and 3°), and blue–yellow (byc 1°, 2°, and 3°), as well as for edge values (sob 0.5 and 1), the structure of interaction was rather consistent: Basic image features at fixated locations were higher following shorter saccades than following longer saccades on urban, nature, and fractal scenes, but such a difference did not exist on pink noise images. Figure 10 depicts AUC values for luminance contrast at 1° visual angle to exemplify this interaction. The structure of interaction is more diverse for the remaining image features, but for all features, no effect of the preceding saccade length was found on pink noise images.
Figure 10
 
The inter-relation between AUC values at fixated locations and the length of the preceding saccade with respect to image category. The figure exemplarily depicts the interaction between saccade length and image category for luminance contrast at high spatial resolution (lumc 1°). The interaction pattern shown is identical for luminance, red–green, and blue–yellow contrasts, as well as for edge values at all spatial resolutions. Vertical lines indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
To sum up, for most features, values of basic image features at fixated locations were higher following shorter saccades than following longer saccades, but image type and the spatial resolution of feature detectors partially influenced this effect. Importantly, we did not find any effect of the between-subject factor “saccade group,” indicating that the feature–fixation correlations did not depend on the mean visual step size used by subjects to scan the images. Hence, the effect of short versus long saccades on feature–fixation correlations seems to be an exclusive within-subject phenomenon.
All other interactions that were significant at the multivariate level were not paralleled by significant effects at the univariate level. The interaction between the factors “presentation run,” “image category,” and “saccade length” was far from the adjusted significance level for all image features [all p > 0.020; mean p = 0.446; η_p² < 0.050], as was its additional interplay with the factor “saccade group” [all p > 0.030; mean p = 0.347; η_p² < 0.046].
To address a potential effect of lower targeting precision of long saccades on the observed dependence of feature–fixation correlations on saccade length, we present a more detailed analysis. We first selected, as examples, two features that discriminated well between short and long saccades in the original analysis (lumc 1° and rgc 1°). Then, the feature value at the actual fixated location was replaced by the maximal feature value found in a circular area with a radius of 1° visual angle around the actual fixation point. If lower feature values at fixated locations after long saccades derived from poorer visual targeting accuracy, potentially higher feature values at the intended target position in the local neighborhood should be detectable by this max operation. The subsequent AUC analysis was performed as before. The results showed that the effect of saccade length is not a signature of visual targeting accuracy (see Figure 11). With respect to image feature “lumc 1°,” a t-test for paired samples revealed a comparable difference between short and long saccades for both feature values at the fixation location [t(44) = 11.621; p < 0.001] and the maximal values around the fixation location [t(44) = 11.399; p < 0.001]. Regarding feature “rgc 1°,” the saccade length effect was again comparable for feature values at the fixation location [t(44) = 11.910; p < 0.001] and the maximal value around the fixation location [t(44) = 12.638; p < 0.001]. Hence, the difference in feature–fixation correlations after short and long saccades is not a side effect of low targeting accuracy.
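The max operation used in this control analysis can be sketched as follows. This is a minimal illustration and not the original analysis code: the function name `local_max_feature`, the row/column indexing of the feature map, and the pixel radius `radius_px` (the pixel equivalent of 1° visual angle, which depends on the setup) are assumptions made for the example.

```python
import math

def local_max_feature(feature_map, fix_row, fix_col, radius_px):
    """Return the largest feature value within a circular neighborhood.

    feature_map: 2D list of feature values (hypothetical layout: rows x columns).
    fix_row, fix_col: pixel indices of the fixation (assumed to lie inside the map).
    radius_px: neighborhood radius in pixels (assumed equivalent of 1 deg).
    """
    n_rows, n_cols = len(feature_map), len(feature_map[0])
    reach = int(math.ceil(radius_px))
    best = feature_map[fix_row][fix_col]
    # Scan the bounding square, clipped to the image, and keep pixels inside the circle.
    for row in range(max(0, fix_row - reach), min(n_rows, fix_row + reach + 1)):
        for col in range(max(0, fix_col - reach), min(n_cols, fix_col + reach + 1)):
            inside = (row - fix_row) ** 2 + (col - fix_col) ** 2 <= radius_px ** 2
            if inside and feature_map[row][col] > best:
                best = feature_map[row][col]
    return best
```

If long saccades merely landed slightly off a high-contrast target, the value returned here would be clearly larger than the value at the fixated pixel itself, and the difference between short and long saccades would shrink once the AUC analysis is rerun on these maxima.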
Figure 11
 
The correlation between AUC values at fixated locations and the length of the preceding saccade based on the original feature values at the exact fixation location, on the one hand, and the maximal feature value found in a circular area with a radius of 1° visual angle around the fixation location, on the other hand. On the left side, the feature–fixation correlations for luminance contrast at 1° visual angle are depicted, while on the right side AUC values for red–green contrast at 1° are shown. Vertical lines indicate standard error of the mean. Results of pairwise comparisons are marked.
Discussion of Study 1
The present results support Hypothesis 1: Subjects observed different image locations across repeated presentations, as expressed by a significant decrease of fixation density map congruency. Upon repeated presentations, subjects neither performed independent eye movements nor scrutinized complementary image regions but tended to view partially overlapping image regions. Moreover, this was paralleled by a decrease in the number of fixated image regions and fewer fixations on images across repeated presentations, as well as an image-type-dependent left–right bias that was differentially influenced by the actual presentation run. Subjects also used more long saccades to scan the images during the first presentation than at later presentations. This result is in line with participants' verbal reports stating that they applied a strategy to scrutinize certain image regions of interest when observing identical images repeatedly (cf. Kaspar & König, 2011). All in all, the present results indicate that subjects substantially changed their viewing behavior across multiple image observations while their attention focus became more centered with increasing image familiarity.
Importantly, although the images became successively more familiar to subjects, the correlations between low-level image features and fixated locations did not change. This result contradicted our expectation (Hypothesis 2) and should be of high significance for further research on human overt attention. At this point, however, it should again be mentioned that correlations between low-level image features and fixation selection should not be taken to imply a direct causal link between such features and fixation locations, as has been highlighted by several authors (e.g., Foulsham & Underwood, 2008; Henderson, 2003; Tatler, Hayhoe, Land, & Ballard, 2011). Overall, the correlations between fixation locations and basic image features are similar to those found in previous studies (e.g., Betz et al., 2010; Einhäuser, Spain et al., 2008; Parkhurst et al., 2002). Feature–fixation correlations differed significantly between image categories (Hypothesis 3), replicating previous findings (Acik et al., 2010, 2009). In addition, the observation of Tatler et al. (2006) that values of basic image features at fixated locations are higher following shorter saccades than following longer saccades held across repeated presentations. This difference, however, depended on image category, since it was absent on pink noise images, which are free of semantic content. This result indicates a stable correlation between basic image features and fixation behavior: As long as basic image features attract attention, saccades remain short, and thus, humans scan visual scenes in small steps. Longer saccades to the next fixation location correspond with lower AUC values; that is, fixation selection is less correlated with basic image features when visual step size is large. Importantly, this saccade length effect cannot be explained by poor targeting accuracy.
The saccade length effect was not affected when feature values at fixated locations were replaced by the maximal feature value in the neighborhood. Finally, this dependency between saccade length and feature values at fixated location seems to be a within- but not a between-subject phenomenon, since we found no correlation between subjects' mean saccade length and feature values at fixation locations. 
Study 2
On closer examination, Study 1 showed a trend toward a decreasing impact of image features on fixation behavior across repeated presentations of urban scenes. Therefore, we cannot completely exclude the possibility that scanning behavior can be successively uncoupled from low-level image features in free-viewing tasks when observing other images. The present results indicated that urban scenes appear predestined to evoke such an effect due to their high ecological validity and high familiarity, in contrast to artificially generated fractals and pink noise images as well as nature scenes without any man-made objects. Perhaps fixation selection becomes less correlated with basic image features when we observe scenes with which we are frequently confronted. This habituation to urban scenes could make it easier to uncouple fixation behavior from basic image features, as suggested by Study 1. Urban scenes consequently have to be investigated further to determine whether the present results can be generalized. Moreover, Parkhurst and Niebur (2003) have shown that greater effects of local image contrasts were primarily found for scenes depicting buildings compared to scenes of landscapes, fractals, or home interiors. Consequently, image complexity could be a significant parameter in this context. Study 2 was conducted to clarify this issue.
Methods
Participants
Thirty-five university students (7 males) who were naive to the purpose of the study participated. None of them participated in Study 1. One subject was excluded from data analysis due to problems in data recording. The remaining sample had a mean age of 25.9 years (19–49). All participants had normal or corrected-to-normal visual acuity. Before the eye-tracking session, all participants first had to pass the Ishihara Test for Color Blindness (Ishihara, 2005). They all signed a written consent form to participate in the study. The experiment was conducted in compliance with the Declaration of Helsinki as well as national and institutional guidelines for experiments with human subjects. 
Stimuli
Thirty color images depicting urban scenes served as stimuli (Figures 12A–12C). They were taken with a high-resolution camera (Panasonic Lumix DMC TZ-5) at several places in Europe and showed either (a) global scenes containing many houses, streets, etc. (=high complex) or (b) local arrangements like single houses (=mid complex), or (c) close-ups of urban details like a park bench or a staircase (=low complex). Hence, images were split into three groups of different image complexity, each containing 10 images. Complexity measurement had been done before the study by four independent raters. They had to judge image complexity on a 3-point scale (1 = low, 2 = mid, 3 = high). Interrater agreement was perfect at 100%. Hence, complexity was defined by subjective impression but not by an objective measurement, which, perhaps, is misleading. For example, mean file size of jpg-compressed images did not differ significantly between the complexity levels based on the raters' evaluation. Images had a resolution of 2560 × 1600 pixels and were not converted or scaled down to conserve image details, especially in high complex images.
Figure 12
 
Examples of image categories: (A) low complex, (B) mid complex, and (C) high complex urban scenes.
Apparatuses
Stimuli were presented on a 30-inch Apple Cinema HD Display (Apple, California, USA). Screen distance was about 80 cm, display resolution fit the image resolution of 2560 × 1600 pixels, and the refresh rate was 60 Hz. No headrest was used to facilitate normal viewing behavior. The room and the experimental setup including all machines, software, and calibration were identical to those used in Experiment 1. 
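Because kernel sizes, neighborhood radii, and saccade lengths are all specified in degrees of visual angle, it can help to see how degrees map onto pixels for a setup like this one. The following is a rough sketch under stated assumptions, not a description of the original calibration: a 16:10 panel with square pixels is inferred from the 2560 × 1600 resolution, the viewing distance is taken as exactly 80 cm although the text says "about 80 cm," and the angle is measured at the screen center. All function names are hypothetical.

```python
import math

def screen_width_cm(diagonal_inch, aspect_w, aspect_h):
    """Physical width of a flat screen from its diagonal and aspect ratio."""
    diagonal_cm = diagonal_inch * 2.54
    return diagonal_cm * aspect_w / math.hypot(aspect_w, aspect_h)

def pixels_per_degree(width_cm, width_px, distance_cm):
    """Pixels covered by 1 deg of visual angle at the screen center."""
    cm_per_degree = 2.0 * distance_cm * math.tan(math.radians(0.5))
    return cm_per_degree * width_px / width_cm

def horizontal_extent_deg(width_cm, distance_cm):
    """Total horizontal visual angle subtended by the screen."""
    return 2.0 * math.degrees(math.atan(width_cm / (2.0 * distance_cm)))

# Assumed values: 30-inch diagonal, 16:10 aspect ratio, 80 cm viewing distance.
width_cm = screen_width_cm(30.0, 16, 10)
ppd = pixels_per_degree(width_cm, 2560, 80.0)
extent = horizontal_extent_deg(width_cm, 80.0)
print(round(width_cm, 1), round(ppd, 1), round(extent, 1))
```

Under these assumptions, one degree corresponds to several dozen pixels, so a kernel or radius given in degrees has to be converted with the geometry of the specific setup rather than reused as a fixed pixel value across studies.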
Eye-tracking procedure
The experimental procedure was similar to Experiment 1. The room was darkened during the eye-tracking session, in which each participant consecutively saw five blocks of all 30 images, presented in pseudorandomized order within a block. Presentation duration of images was 6 s, as before. As before, participants were instructed to “observe the images” to elicit free-viewing behavior. Due to the smaller set of images, the present study was shorter than the first study and no break was needed.
Data analysis
In line with Study 1, we analyzed the eye-tracking data regarding (1) the fixation distributions across repeated presentations and (2) the correlation between basic image features and fixation likelihood by AUC values. The applied techniques and analyzed basic image features were completely identical. 
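For readers unfamiliar with the AUC measure, the following minimal sketch shows how an area under the ROC curve can be computed from two samples of feature values. It is an illustration only: how the control (non-fixated) sample is drawn is not specified in this section and is simply treated as an input here, and the function name `auc` is hypothetical.

```python
def auc(fixated_values, control_values):
    """Area under the ROC curve for separating fixated from control locations.

    Computed as the probability that a randomly chosen fixated value exceeds a
    randomly chosen control value, with ties counted as one half. This equals
    the area under the ROC curve (Mann-Whitney formulation).
    """
    wins = 0.0
    for fixated in fixated_values:
        for control in control_values:
            if fixated > control:
                wins += 1.0
            elif fixated == control:
                wins += 0.5
    return wins / (len(fixated_values) * len(control_values))

# Feature values that are always higher at fixated locations give the maximum,
# identical samples give chance level.
print(auc([0.8, 0.9], [0.1, 0.2]))  # 1.0
print(auc([1, 2], [1, 2]))          # 0.5
```

A value of 0.5 therefore means that the feature does not discriminate fixated from control locations, values above 0.5 mean that the feature tends to be higher at fixated locations, and values below 0.5 mean that it tends to be lower.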
Results
Fixation distribution analysis (Hypothesis 1)
In line with the analysis procedure applied in Study 1, we first analyzed whether subjects observed identical image regions across repeated image exposure or whether they fixated very different image locations between presentation runs. Figure 13 depicts some examples of raw data from single subjects. Table 2 shows the mean z-transformed intercorrelations between the FDMs of two presentation runs.
Figure 13
 
Examples of fixations and corresponding saccades of single subjects depending on image complexity and repeated presentation.
Table 2
 
Matrix of mean Fisher's z-transformed intercorrelations (averaged across subjects) between FDMs of all presentation runs. Fisher's z-transformed correlations are signatures of the congruency of two FDMs. Correlations between two consecutive presentations are marked in bold.
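Presentation | Low complex scenes       | Mid complex scenes       | High complex scenes
             | 1    2    3    4    5    | 1    2    3    4    5    | 1    2    3    4    5
1            | 1.0                      | 1.0                      | 1.0
2            | 0.26 1.0                 | 0.32 1.0                 | 0.26 1.0
3            | 0.23 0.26 1.0            | 0.26 0.25 1.0            | 0.21 0.22 1.0
4            | 0.21 0.25 0.27 1.0       | 0.25 0.25 0.24 1.0       | 0.17 0.17 0.18 1.0
5            | 0.19 0.23 0.23 0.27 1.0  | 0.21 0.21 0.22 0.23 1.0  | 0.17 0.17 0.17 0.17 1.0
(Correlations between two consecutive presentations are the entries immediately to the left of each diagonal value of 1.0.)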
In accordance with the intercorrelations found in Study 1, the congruency of two FDMs was maximal when a given presentation run was correlated with the following run. In the next step, we again focused on intercorrelations between two successive presentation runs. The mean z-transformed correlations (averaged across subjects and images of the same category) between FDMs of two successive presentation runs were tested against zero. Given three image categories and again five presentation runs, twelve one-sample t-tests were calculated with an adjusted alpha level α_adj = 0.004. In accordance with the results from the first experiment, all mean correlations differed significantly from zero [all t(33) > 10.077; all p < 0.001], and consequently, fixated locations at later presentation runs were not independent of previously observed image regions. It should be noted that absolute z-transformed correlations between FDMs were lower than in Study 1 because fixation maps were convolved with a Gaussian kernel of the same size (FWHM of 3° visual angle) to produce an FDM, whereas the absolute viewing angle was broader than in Study 1.
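The congruency measure can be illustrated with a small sketch that builds two FDMs and correlates them. This is a simplified stand-in for the procedure, not the original implementation: fixations are given as hypothetical (row, column) pixel pairs, the smoothing is done by summing one Gaussian per fixation (equivalent, up to a constant factor, to convolving a fixation count map with an untruncated Gaussian kernel), and `fwhm_px` stands for the pixel equivalent of the kernel's FWHM.

```python
import math

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian's full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def fixation_density_map(fixations, n_rows, n_cols, fwhm_px):
    """Build an FDM by placing a Gaussian of the given FWHM on every fixation."""
    sigma = fwhm_to_sigma(fwhm_px)
    fdm = []
    for row in range(n_rows):
        fdm.append([
            sum(math.exp(-((row - f_row) ** 2 + (col - f_col) ** 2) / (2.0 * sigma ** 2))
                for f_row, f_col in fixations)
            for col in range(n_cols)
        ])
    return fdm

def fisher_z_congruency(fdm_a, fdm_b):
    """Pearson correlation between two FDMs, Fisher z-transformed.

    Undefined for perfectly correlated maps (r = 1 or r = -1) and for constant maps.
    """
    a = [value for row in fdm_a for value in row]
    b = [value for row in fdm_b for value in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return math.atanh(cov / math.sqrt(var_a * var_b))

# Toy example: run 2 revisits nearly the same locations as run 1, run 3 does not.
run1 = fixation_density_map([(2, 2), (7, 7)], 10, 10, 3.0)
run2 = fixation_density_map([(2, 3), (7, 6)], 10, 10, 3.0)
run3 = fixation_density_map([(2, 7), (7, 2)], 10, 10, 3.0)
print(fisher_z_congruency(run1, run2) > fisher_z_congruency(run1, run3))  # True
```

The sketch also makes the remark about kernel size concrete: with a kernel fixed in degrees, fixations spread over a wider field of view produce density maps that overlap less, which lowers the absolute correlation even when viewing behavior is comparable.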
To test for the predicted decrease in congruency between FDMs across repeated presentations, ANOVAs for repeated measures (Greenhouse–Geisser applied) for each image category were computed (adjusted alpha level α_adj = 0.013). We found a significant difference between FDM correlations for mid complex scenes [F(2.404, 79.340) = 9.968; p < 0.001; η_p² = 0.232] as well as for high complex scenes [F(2.516, 83.036) = 10.310; p < 0.001; η_p² = 0.238]. Under both complexity conditions, the congruency of FDMs between first and second presentations was maximal and decreased across presentation runs (see Figure 14). In contrast, congruency of FDMs remained constant for low complex urban scenes [F(2.686, 88.650) = 0.037; p = 0.985; η_p² = 0.001]. We also replicated the result pattern by using FDMs produced by convolving fixation distribution maps with a Gaussian kernel of 1° and 2° visual angle FWHM.
Figure 14
 
Congruency of FDMs. Fisher's z-transformed correlations between FDMs of two consecutive presentation runs were averaged across subjects. Mean z-transformed correlations differing significantly from zero are indicated by asterisks [***p < 0.003 (adjusted alpha level); *p < 0.05]. Vertical lines above bars indicate standard error of measurement. Red horizontal lines within bars indicate the reference value. This is the mean correlation between two FDMs that can be expected when both FDMs are based on fixations being randomly sampled from all fixations made on images of a certain category.
To conclude, the present results of fixation distribution analysis match well with the results of Study 1, because fixated locations at later presentation runs were not independent of previously observed image regions on urban scenes. Moreover, for mid and high complex scenes, we again found the congruency of FDMs to slightly decrease across presentation runs as predicted by Hypothesis 1. Consequently, subjects successively relocated their attention focus across repeated exposure to identical visual scenes as far as scene complexity provided sufficient details, which can be scrutinized at a later point in time. 
To deepen the analysis of the observer's visual exploration behavior across presentation runs, in line with Study 1, we analyzed (1) the number of observed image regions, (2) the time course of the potential left–right bias of fixations, and (3) the frequencies of saccade length across presentation runs. 
(1) To analyze the number of image regions observed across repeated presentation runs, a 3 × 5 (image category × presentation run) ANOVA for repeated measures was computed. Frequencies were normally distributed in all cells [Kolmogorov–Smirnov tests: all p > 0.468]. The analysis revealed a significant decrease of fixated regions across repeated presentation [F(2.572, 84.892) = 41.963; p < 0.001; η_p² = 0.560] and a significant interaction with the image complexity [F(6.052, 199.721) = 2.679; p < 0.05; η_p² = 0.075]. One-way ANOVAs separately computed for each image category showed a significant decrease of fixated regions independent of image complexity [all F ≥ 18.055; p < 0.001; all η_p² ≥ 0.354] that was not limited to the step from first to second presentation run (see Figure 15).
Figure 15
 
Number of fixated image regions for each image category. Images were divided into a 4 × 4 grid of equal-sized regions and the number of fixated regions was counted separately for each image and presentation run. The mean across images is depicted. Vertical lines indicate standard error of the mean. Significant differences between presentation runs (Bonferroni-adjusted t-tests) are marked.
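The region count described in this caption can be sketched as follows. This is a minimal illustration under assumptions: fixations are hypothetical (x, y) pixel coordinates with the origin at the top-left image corner, fixations outside the image are ignored, and the function name `count_fixated_regions` is not taken from the original analysis.

```python
def count_fixated_regions(fixations, width_px, height_px, n_rows=4, n_cols=4):
    """Count how many cells of an n_rows x n_cols grid contain at least one fixation."""
    fixated_cells = set()
    for x, y in fixations:
        # Assumption: fixations that fall outside the image are discarded.
        if 0 <= x < width_px and 0 <= y < height_px:
            col = int(x * n_cols / width_px)
            row = int(y * n_rows / height_px)
            fixated_cells.add((row, col))
    return len(fixated_cells)

# Two fixations in the same top-left cell, one near the center, one off-image.
example = [(100, 100), (200, 150), (1300, 900), (-5, 10)]
print(count_fixated_regions(example, 2560, 1600))  # 2
```

Counting cells rather than fixations separates how widely an image was explored from how many fixations were made, which is why both measures are reported.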
(2) In the next step, a 3 × 5 × 2 (image complexity × presentation run × image side) ANOVA for repeated measures was calculated to analyze the number of fixations made on images with respect to a potential left–right bias, which was also found in Study 1. Data were normally distributed in all cells [Kolmogorov–Smirnov tests: all p > 0.191]. The ANOVA revealed a significant linear increase of fixations with increasing image complexity [F(1.324, 43.686) = 59.069; p < 0.001; η_p² = 0.642], with significant differences between all image categories [Bonferroni-adjusted t-tests: all p < 0.001]. We also found a continuous decrease across presentation runs [F(1.702, 56.165) = 16.531; p < 0.001; η_p² = 0.334; run 1 vs. 2/3/4/5: p < 0.01; 2 vs. 4/5: p < 0.01; 3 vs. 4/5: p < 0.01]. Moreover, a strong left bias was present [F(1, 33) = 46.602; p < 0.001; η_p² = 0.585]. On the one hand, this bias interacted with image complexity [F(1.460, 48.187) = 37.952; p < 0.001; η_p² = 0.535]: The left-side bias decreased with increasing image complexity (Figure 16, right side). On the other hand, the left-side bias was also influenced by the repeated presentation [F(2.738, 90.345) = 3.062; p < 0.05; η_p² = 0.085], as depicted in Figure 16 (left side). The number of fixations made on the right side remained stable across repeated presentations, as revealed by a one-way ANOVA [F(2.708, 89.357) = 1.409; p = 0.247; η_p² = 0.041], but fixations on the left image side decreased across presentation runs [F(2.564, 84.602) = 14.282; p < 0.001; η_p² = 0.302]. Bonferroni-adjusted t-tests for pairwise comparisons were calculated [run 1 vs. 2/3/4: all p < 0.05; 2 vs. 4/5: all p < 0.01; 3 vs. 5: p < 0.05].
Figure 16
 
Mean number of fixations on images' left and right sides in dependence of the actual presentation run (left side) and left bias of fixations depending on image complexity (right side). Vertical lines indicate standard error of the mean.
(3) Finally, a frequency analysis of saccade lengths was computed to clarify whether urban images of different complexity were scanned with shorter visual steps at later presentations. In accordance with Study 1, saccades were assigned to one of 20 categories, each containing saccades of a specific length. For each image category, a 5 × 20 ANOVA (presentation run × saccade length) was computed, and all revealed a significant interaction [all F ≥ 7.043; all p < 0.001; η_p² ≥ 0.176] with the identical pattern also found in Study 1: The relative number of short saccades increased across repeated presentations (i.e., with increasing familiarity of images). Figure 17 depicts this interaction exemplarily for mid complex urban images.
Figure 17
 
Mean frequency of saccades on mid complex images depending on saccade length. Saccades were assigned to one of 20 categories (category 1: saccade length ≤ 1° visual angle; category 2: saccade length > 1° and ≤2°; …; category 20: saccade length > 19°). Vertical lines indicate standard error of the mean.
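The category assignment given in this caption can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, saccade lengths are assumed to be available in degrees of visual angle, and relative frequencies are computed per list of saccades (for example, per subject and presentation run).

```python
import math

def saccade_category(length_deg, n_categories=20):
    """Map a saccade length in degrees to its category.

    Category 1: length <= 1 deg; category 2: > 1 and <= 2 deg; ...;
    the last category collects every saccade longer than n_categories - 1 deg.
    """
    return max(1, min(math.ceil(length_deg), n_categories))

def relative_frequencies(lengths_deg, n_categories=20):
    """Proportion of saccades falling into each category (index 0 = category 1)."""
    counts = [0] * n_categories
    for length in lengths_deg:
        counts[saccade_category(length, n_categories) - 1] += 1
    total = len(lengths_deg)
    return [count / total for count in counts]

print(saccade_category(1.0), saccade_category(1.2), saccade_category(19.5))  # 1 2 20
```

Using relative rather than absolute frequencies matters here because the total number of fixations, and hence of saccades, decreased across presentation runs.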
All in all, in accordance with Study 1, the congruency of FDMs slightly but significantly decreased across presentation runs. In addition, fewer fixated image regions and fewer fixations on images across repeated presentations were found once more. The left-side bias on urban images found in Study 1 was replicated, but the bias was affected by image complexity. The bias diminished with increasing image complexity as well as across repeated presentations. Subjects also used relatively more saccades of short length at later presentations, as previously found for other image types in Study 1. This result pattern indicates that the observer's attention focus becomes more local with increasing image familiarity on urban images, independent of image complexity.
Analysis of feature–fixation correlations (Hypotheses 2 and 3)
To analyze the correlations between image features and fixation likelihood by means of AUC values, we used the same omnibus design as in Study 1. This is a 5 × 3 × 2 × 2 design (presentation run × image complexity × saccade group × saccade length before actual fixation). In contrast to Study 1, in the present case, the factor “image category” consisted of three levels and saccade groups had a sample size of n = 17. 
First, statistical assumptions were tested. Kolmogorov–Smirnov tests revealed that data were normally distributed in all cells (all p > 0.20). Levene's tests identified 3.2% of all cells that did not fulfill the requirement of variance homogeneity. Hence, analyses of variance were appropriate.
Multivariate effects
In contrast to Study 1, the 5 × 3 × 2 × 2 MANOVA revealed a main effect of the factor “repeated presentation” on feature–fixation correlations [Pillai's Trace = 1.029; F(88, 440) = 1.731; p < 0.001; η_p² = 0.257]. In agreement with Study 1, we also found differences between the urban images of different complexity [Pillai's Trace = 1.887; F(44, 88) = 33.248; p < 0.001; η_p² = 0.943]. Moreover, the effect of short versus long saccades on feature–fixation correlations was replicated [Pillai's Trace = 0.950; F(22, 10) = 9.425; p < 0.001; η_p² = 0.950], as was a significant interaction with image type [Pillai's Trace = 1.697; F(44, 88) = 11.191; p < 0.001; η_p² = 0.848], which was additionally moderated by the factor “saccade group” [Pillai's Trace = 0.876; F(44, 88) = 1.559; p < 0.05; η_p² = 0.438]. Furthermore, the effect of image type was moderated by the actual presentation run [Pillai's Trace = 0.917; F(176, 1936) = 1.423; p < 0.001; η_p² = 0.115]. Finally, in contrast to Study 1, we found a marginally significant interaction between repeated image observations and the between-subject factor “saccade group” [Pillai's Trace = 0.807; F(88, 440) = 1.265; p = 0.068; η_p² = 0.202]. All other possible effects were far from the significance level.
Hypothesis 2: Effects of presentation run and image category
The multivariate main effect of the actual presentation run on feature–fixation correlations was subsequently analyzed by ANOVAs. However, no effect of repeated presentation on AUC values reached the adjusted alpha level of α_adj = 0.002. For only ten of the 22 image features did the effect reach the unadjusted significance level [all p < 0.05, all η_p² > 0.079 (range from 0.079 to 0.116)]. The direction of the effect, however, was contrary to the predicted direction (Hypothesis 2) for 7 features (blue–yellow and luminance contrasts); that is, the correlation between fixation probability and feature value increased across multiple image observations [all p < 0.05]. Only three feature–fixation correlations (red–green contrasts) decreased over time, and this decrease was limited to high complex scenes, as shown by significant interactions between the actual presentation run and image complexity [all p < 0.01]. Overall, the results contradict the hypothesis that subjects' scanning pathways were successively uncoupled from basic image features; the influence of bottom-up guidance of fixation behavior remained constant or even increased across repeated exposure to stimuli. The significant multivariate main effect of image category was supported by significant ANOVAs for all image features [all p < 0.001; all η_p² > 0.256 (ranging from 0.256 to 0.852)]. Figure 18 shows a schematic of this effect: Regarding blue–yellow contrast (byc 1°, 2°, and 3°), red–green contrast (rgc 1°, 2°, and 3°), luminance contrast (lumc 1°, 2°, and 3°), as well as edge values (sob 0.5 and 1), feature–fixation correlations were maximal for mid complex scenes and minimal for low complex scenes. The effect was reversed for saturation and for second-order texture contrasts, which need lower spatial resolution.
Results clearly support the predicted impact of image type and show that the correlation between fixation probability and basic image features does not only differ between scene types of a different gist (Study 1) but also depends on the complexity of urban scenes. 
Figure 18
 
Area under ROC values indicating the correlation between image feature and fixation likelihood separated for urban scenes of different complexity. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Vertical lines indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Hypothesis 3: Influence of saccade length
According to univariate tests, the effect of saccade length on feature–fixation correlations was significant for 17 image features [all p < 0.001; all η_p² > 0.429 (ranging from 0.429 to 0.847)], and four further features narrowly missed the adjusted alpha level [all p < 0.05; all η_p² > 0.165 (range from 0.165 to 0.250)]. The saccade length effect is depicted in Figure 19. In all cases, basic image features at fixated locations were higher following shorter saccades than following longer saccades, consistent with the results reported above.
Figure 19
 
The correlation between AUC values at fixated locations and the length of the preceding saccade. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Significant differences are marked by asterisks [***p < 0.002 (adjusted alpha level); **p < 0.01; *p < 0.05]. Vertical lines above bars indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
In the context of this replicated saccade length effect, we again tested whether the effect is a signature of visual target (in)accuracy. For that purpose, the same reanalysis was done as in Study 1: We compared the saccade length effect when considering the feature value at the fixation location, on the one hand, and when replacing this feature value by the maximal value found in the neighborhood (circular area with a radius of 1° around fixation coordinates), on the other hand. Regarding feature “lumc 1°,” t-tests for paired samples revealed the identical saccade length effect for both feature values at the fixation location [t(33) = 9.857; p < 0.001] and the maximal value around the fixation location [t(33) = 8.892; p < 0.001] (see Figure 20, left side). Regarding feature “rgc 1°,” the effect was also the same for original feature values [t(33) = 13.465; p < 0.001] and for maximal feature values [t(33) = 12.793; p < 0.001] (Figure 20, right side). Consequently, and in accordance with Study 1, results clearly showed that the effect of saccade length cannot be explained in terms of visual targeting accuracy. 
Figure 20
 
The correlation between AUC values at fixated locations and the length of the preceding saccade based on the original feature values at the exact fixation location, on the one hand, and the maximal feature value found in a circular area with a radius of 1° visual angle around the fixation location, on the other hand. On the left side, the feature–fixation correlations for luminance contrast at 1° visual angle are depicted, and on the right side, the AUC values for red–green contrast at 1° are shown. Vertical lines indicate standard error of the mean. Results of pairwise comparisons are marked.
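The reanalysis described above can be sketched as follows, under stated assumptions. The feature map is taken to be a two-dimensional grid of values, the 1° radius is assumed to have been converted to pixels beforehand (the conversion factor depends on the setup and is not given here), and the paired t statistic is computed in its textbook form. The function names and toy numbers are hypothetical and do not reproduce the original analysis code.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def max_in_radius(feature_map: List[List[float]], row: int, col: int,
                  radius_px: float) -> float:
    """Largest feature value inside a circular area around (row, col).
    radius_px is the radius in pixels; mapping 1 degree of visual angle
    to pixels is left to the caller (an assumption of this sketch)."""
    n_rows, n_cols = len(feature_map), len(feature_map[0])
    r = int(math.floor(radius_px))
    best = feature_map[row][col]
    for rr in range(max(0, row - r), min(n_rows, row + r + 1)):
        for cc in range(max(0, col - r), min(n_cols, col + r + 1)):
            if (rr - row) ** 2 + (cc - col) ** 2 <= radius_px ** 2:
                best = max(best, feature_map[rr][cc])
    return best


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and its degrees of freedom (n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy feature map (hypothetical): a high value far from the fixation at (2, 2).
grid = [
    [9, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 5, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(max_in_radius(grid, 2, 2, 1.0))  # prints 5
print(max_in_radius(grid, 2, 2, 3.0))  # prints 9
```

With one value per subject and condition, `paired_t` returns n − 1 degrees of freedom, so 34 paired observations would give the df of 33 reported in the t-tests above.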
The multivariate significant interaction between factors image complexity and saccade length of Study 1 was replicated, as well as its univariate counterparts regarding 19 features [all p < 0.001; all ηp² > 0.214 (range from 0.214 to 0.470)]. Again, a clear structure of interactions was found (see Figure 21). First, for contrasts of red–green (rgc 1°, 2°, and 3°), blue–yellow (byc 1°, 2°, and 3°), and luminance (lumc 1°, 2°, and 3°), as well as for saturation (sat 5 and 25) and edge values (sob 0.5 and 1), the effect of factor saccade length became larger the more complex urban scenes were. Second, the effect of saccade length decreased with higher scene complexity with respect to second-order texture contrasts. Consequently, the effect that basic image features at fixated locations were higher following shorter saccades than those following longer saccades depends on spatial resolution of image features, on the one hand, and scene complexity, on the other hand, whereas both have to be reciprocally related to maximize the effect. 
Figure 21
 
The inter-relation between AUC values at fixated locations and the length of the preceding saccade with respect to image category. The left figure exemplarily depicts the interaction between saccade length and image complexity for red–green contrast at mid spatial resolution (rgc 2°). The interaction pattern shown is identical for luminance, red–green, and blue–yellow contrasts, edge values, and saturation at all spatial resolutions. The right figure exemplarily depicts the interaction that is similar for all second-order texture contrasts at all spatial resolutions (bytc, rgtc, and lumtc at 2°, 4°, and 6°). Vertical lines indicate standard error of measurement. The AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
In contrast to Study 1, we found a significant multivariate interaction between the actual presentation run and the between-subject factor “saccade group” and 12 univariate counterparts that narrowly missed the adjusted alpha level [all p < 0.05; all ηp² > 0.073 (range from 0.073 to 0.132)]. The structure of interaction was always the same: Subjects who, on average, made longer saccades started with lower feature–fixation correlations at the first presentation run than subjects who made shorter saccades on average. At the fifth presentation, the direction of difference was reversed (Figure 22, left side). Importantly, this saccade group dependency of feature–fixation correlations was not paralleled by an interaction between subjects' mean saccade length and the actual presentation run with respect to saccade length [F(3.232, 103.416) = 0.459; p = 0.726; ηp² = 0.014], as depicted in Figure 22 (right side). All other possible effects were non-significant either at multivariate level or at univariate level. 
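As a plausibility check on the reported effect size, partial eta squared can be recovered from an F value and its degrees of freedom using the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies this identity to the non-significant interaction reported above; the function name is hypothetical, and the identity is a textbook relation rather than a description of how the authors computed their values.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared derived from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the reported test: F(3.232, 103.416) = 0.459.
eta = partial_eta_squared(0.459, 3.232, 103.416)
print(round(eta, 3))  # prints 0.014
```

The result agrees with the reported value of 0.014.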
Figure 22
 
AUC values' dependency on actual presentation run and between-subject saccade length (left side) illustrated by luminance texture contrast at high spatial resolution (lumtc 1°). Subjects who showed an average saccade length above the median (saccade group = long) are opposed to subjects whose mean saccade length was below the median (saccade group = short). The shown reversion of the between-group differences in AUC values across repeated presentations is representative for all image features. Right side: Mean saccade length of saccade groups separated for presentation runs.
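The median split that defines the two saccade groups can be sketched as follows. This is an illustration under assumptions: each subject is represented by one mean saccade length, and a subject lying exactly on the median is assigned to the "short" group, a tie rule that the text does not specify. The function name, subject identifiers, and values are hypothetical.

```python
import statistics
from typing import Dict


def median_split(mean_saccade_length: Dict[str, float]) -> Dict[str, str]:
    """Assign each subject to 'long' (above the median) or 'short' (otherwise).
    Placing ties in the 'short' group is an assumption of this sketch."""
    median = statistics.median(mean_saccade_length.values())
    return {
        subject: ("long" if length > median else "short")
        for subject, length in mean_saccade_length.items()
    }


# Toy data (hypothetical mean saccade lengths in degrees).
toy_lengths = {"s1": 4.0, "s2": 6.0, "s3": 5.0, "s4": 7.0}
groups = median_split(toy_lengths)
print(groups)
```

Feature–fixation correlations can then be averaged per group and presentation run to obtain the kind of comparison shown on the left side of Figure 22.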
Discussion of Study 2
The results of Study 2 confirm the results from Study 1 with one noteworthy exception. First, fixated locations at later presentation runs depended on previously observed image regions in urban scenes of different complexity. For mid and high complex scenes, we additionally found the congruency of FDMs to decrease across presentation runs as predicted. Thus, subjects observed partially new but also overlapping image regions across repeated presentations. Again, this was paralleled by a decrease of fixated image regions, fewer fixations on images, and a significant left bias of fixations that decreased across repeated presentations. The frequency of short saccades used to scan the images successively increased with increasing image familiarity. Consequently, this result pattern is in accordance with the results found on urban images in Study 1. It indicates a shift in viewing behavior across multiple image observations and a more centered attention focus at later presentations. 
Second, this change in fixated image regions was once more not paralleled by a decrease in feature–fixation correlations as measured by AUC values, contradicting Hypothesis 2. Regarding certain image features, even a slight increase in correlation across repeated exposure to stimuli was found. In agreement with Hypothesis 2, the correlation between fixation probability and basic image features does not only differ between semantically different scene types as realized in Study 1. The size of AUC values also differed significantly between urban scenes of different complexity. 
Independent of repeated presentations, image features at fixated locations were higher following shorter saccades than those following longer saccades. However, this general effect was moderated by scene complexity, whereas the structure of interaction depended on the actual image feature. For first-order contrasts, saturation, and edge values, the effect of factor saccade length became larger the more complex the urban scenes were, but regarding second-order texture contrasts, it decreased with higher scene complexity. This result clearly shows that the effects of basic image features depend on two related parameters, namely, scene complexity and the spatial resolution of investigated image features. 
Finally, although we again found no main effect of subjects' mean saccade length on feature–fixation correlation, the interplay of saccade length and feature value at fixated location appears to be not an exclusive within-subject phenomenon as suggested by Study 1. A significant interaction of subjects' mean saccade length and presentation run (not adjusted alpha level) showed that subjects who, on average, made longer saccades tended to start with lower feature–fixation correlations at first presentation run than did subjects who made shorter saccades, on average. At the fifth presentation, the direction of difference was reversed, however. This difference between saccade groups did not derive from subjects' saccade lengths. Both groups showed the same decrease in saccade length across repeated scene observations. Consequently, by trend, the results suggest that humans making longer saccades in general showed an incremental link between low-level image features and fixation selection across multiple image observations. In contrast, subjects showing a generally shorter visual step size showed a decreased feature–fixation correlation at later presentation runs. These interindividual differences were not found in Study 1 and suggested a scene-type dependency. Against this background, the results of Study 1 have to be somewhat reconsidered. With the increased power of the second study, we demonstrate that the correlation of saccade length and feature value at fixated location is not an exclusive within-subject phenomenon. Further research should deepen the understanding of interindividual differences in viewing behavior. 
General discussion
The present study was conducted to deepen the current knowledge about the impact of multiple exposures to identical stimuli on human overt attention on complex visual scenes. The present results showed that subjects observed different but also partially overlapping image regions across repeated presentations and hence support corresponding verbal reports of subjects (cf. Kaspar & König, 2011). For mid and high complex urban scenes as well as for nature and fractal scenes, we found the congruency of FDMs to slightly decrease across presentation runs, whereby the congruency of fixation distributions of two presentation runs was maximal when comparing the actual presentation run with the following one. Interestingly, in a recent study on web pages (Kaspar, Ollermann, & Hamborg, submitted for publication), this effect was not found, but an increase of fixation distribution congruency across structurally identical web pages was found with respect to the ten initial fixations on web pages. In the present study, upon repeated presentations, subjects neither performed independent scanning eye movements nor scrutinized complementary image regions but tended to view largely overlapping image regions. This significant congruency of fixated regions was also paralleled by a decrease of fixated image regions and fewer fixations on images across repeated presentations, as well as an image-type-dependent, left–right side bias that decreased across repeated presentations only on urban scenes in Study 2. Subjects also used more saccades of short length at later presentations, whereas images were scanned with larger visual steps during the first viewing. All in all, the present results indicate that the observers change their viewing behavior across repeated presentations while their attention focus becomes more centered with increasing image familiarity. 
However, perhaps the magnitude of the present effects is specific for free-viewing tasks and is weaker than they would be if the instruction was to memorize scene content or to recognize scenes. Further research should address this issue, but the results of Betz et al. (2010) suggest no qualitative change of the present results. In the study of Betz et al., the course of overt attention emerged as task-dependent, but the pattern of feature–fixation correlations remained constant across different tasks. The authors concluded that task-dependent differences in fixation behavior are not mediated by a reweighting of features in the bottom-up hierarchy. 
Furthermore, we cannot completely exclude the possibility that some of the present effects are partially a signature of fatigue and not only of repeated presentations of images. However, to minimize this possibility, subjects had a break after the third block in Study 1 but no break in Study 2 due to the smaller set of images. As shown in Kaspar and König (2011), changes in viewing behavior across repeated image observations can also be affected by interindividual motivation. 
However, the present results clearly contradict the hypothesis that these changes in eye movements and attended image regions coincide with a smaller correlation between basic image features and fixation behavior. That is, scanning pathways were not uncoupled from basic image features with increasing image familiarity across repeated presentations. Although subjects retrospectively reported that they applied a strategy to focus on certain image regions of interest during later presentations and to switch from a stimulus-driven exploratory behavior to an internal guidance of eye movements over time, feature–fixation correlations remained surprisingly constant. Some researchers pointed out that the area under the receiver–operator curve in the region of 0.55 to 0.65 suggests only a modest relationship between low-level image features and fixation locations (e.g., Einhäuser, Spain et al., 2008; Nyström & Holmqvist, 2008; Tatler et al., 2005). However, independent of the discussion about the magnitude of such correlation, no significant changes were found across repeated presentations in the present study. This finding is comparable with the results of Betz et al. (2010) showing that task-dependent differences in fixation behavior are obviously not mediated by a reweighting of features in the bottom-up hierarchy. In the present study, increasing familiarity of images did not lead to a reweighting of features. Moreover, the present results match with the findings of Foulsham and Underwood (2008), showing that saliency at fixated locations only decreased slightly over multiple fixations but that familiarity of images did not influence this effect. In addition to that, Tatler et al. (2005) did not find a time-dependent change in feature–fixation correlations during individual presentations. The present results are in contrast to those of Parkhurst et al. (2002), who found changes in feature values across multiple fixations but also used a different analysis technique (for a discussion, see Tatler et al., 2005). 
Overall, the stable correlation between low-level features and fixation selection found in the present study has important implications for research on human overt attention: As far as studies focus on the effect of low-level stimulus properties on attentional processes, potential differences in the image familiarity are negligible. Results of previous studies on feature–fixation correlations (Acik et al., 2010, 2009; Baddeley & Tatler, 2006; Betz et al., 2010; Einhäuser & König, 2003; Einhäuser, Rutishauser et al., 2008; Einhäuser, Spain et al., 2008; Frey et al., 2007; Frey & König, 2008; Mannan et al., 1996; Parkhurst & Niebur, 2003; Reinagel & Zador, 1999; Tatler et al., 2006) are valid against this background, although familiarity of images perhaps was partly uncontrolled. Nonetheless, the stability of feature–fixation correlations over time should be considered in all further studies on memory effects on overt attention. It also could be fruitful for mathematical models describing and predicting human overt attention since low-level image properties play a central role in such models (e.g., Itti & Koch, 2000; Parkhurst et al., 2002; Torralba, Oliva, Castelhano, & Henderson, 2006). 
The well-known effect of image type on feature–fixation correlations (Acik et al., 2010, 2009; Betz et al., 2010; Frey & König, 2008; Krieger et al., 2000) is much bigger than the effect of memory-mediated scene familiarity. Moreover, the present results suggested that differences between image types appear to be an effect of the image complexity. Further research has to resolve the question of whether these differences in feature–fixation correlations are derived from different image complexity alone or if the varying gist or semantics of images can also explain variance, at least partially. Furthermore, in Study 2, the complexity of images was operationalized by raters' subjective evaluation that could be accompanied by a zoom-in/out in some (not all) images. 
However, the stable (but relatively small) impact of basic image features on fixation behavior across multiple exposures to identical images is not necessarily paralleled by a constant viewing behavior in general. The present results as well as recent studies provided clear evidence that changes in viewing behavior over time are traceable but motivation dependent on complex scenes (Kaspar & König, 2011) as well as on web pages (Kaspar et al., submitted for publication). 
Furthermore, the correlation between basic image features and fixation behavior is moderated by actual saccade length, whereas this phenomenon remains constant across multiple image observations. Independent of the repeated presentation of images, basic image features at fixated locations were higher following shorter saccades than those following longer saccades as already shown by Acik et al. (2010) and Tatler et al. (2006). This clearly argues for an effect in fixation selection that is not weakened by higher familiarity of visual stimuli. Importantly, this effect cannot be explained by low target accuracy as shown. 
In addition to that, the moderating effect of scene complexity on this saccade length effect suggests that effects of basic image features depend on two related parameters, namely, scene complexity and the spatial resolution of investigated image features. Consequently, further research has to take into account scene complexity when selecting appropriate spatial resolutions of investigated image features. 
Finally, the interplay of saccade length and feature–fixation correlations was also found on the interindividual level. Depending on actual presentation run, the strength of the link between eye movements and basic image features was different for subjects who differ in their mean saccade length. Hence, we demonstrated that the correlation of saccade length and feature value at fixated location is not an exclusive within-subject phenomenon. We advise considering interindividual differences in viewing behavior in future research on human overt attention. 
Acknowledgments
We gratefully acknowledge the support by proposals ERC-2010-AdG - Advanced Investigator Grant - Proposal n°269716 MULTISENSE (PK). 
We would like to thank Sabine König for help in data acquisition. 
Commercial relationships: none. 
Corresponding author: Kai Kaspar. 
Email: kkaspar@uos.de. 
Address: Institute of Cognitive Science and Institute of Psychology, University of Osnabrück, Albrechtstr. 28, 49069 Osnabrück, Germany. 
References
Acik A. Sarwary A. Schultze-Kraft R. Onat S. König P. (2010). Developmental changes in natural viewing behavior: Bottom-up and top-down differences between children, young adults and older adults. Frontiers in Perception Science, 1, 207.
Acik A. Selim O. Schumann F. Einhäuser W. König P. (2009). Effects of luminance contrast and its modifications on fixation behaviour during free viewing of images from different categories. Vision Research, 49, 1541–1553.
Althoff R. R. Cohen N. J. (1999). Eye-movement-based memory effect: A reprocessing effect in face perception. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 997–1010.
Baddeley R. J. Tatler B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.
Betz T. Kietzmann T. C. Wilming N. König P. (2010). Investigating task-dependent top-down effects on overt visual attention. Journal of Vision, 10(3):15, 1–14, http://www.journalofvision.org/content/10/3/15, doi:10.1167/10.3.15.
Brockmole J. R. Henderson J. M. (2006). Recognition and attention guidance during contextual cueing in real-world scenes: Evidence from eye movements. Quarterly Journal of Experimental Psychology, 59, 1177–1187.
Chun M. M. Jiang Y. (1998). Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cognitive Psychology, 36, 28–71.
Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: L. Erlbaum.
Derrington A. M. Krauskopf J. Lennie P. (1984). Chromatic mechanisms in lateral geniculate nucleus of macaque. The Journal of Physiology, 357, 241–265.
Einhäuser W. König P. (2003). Does luminance-contrast contribute to a saliency map of overt visual attention? European Journal of Neuroscience, 17, 1089–1097.
Einhäuser W. König P. (2010). Getting real—Sensory processing of natural stimuli. Current Opinion in Neurobiology, 20, 389–395.
Einhäuser W. Rutishauser U. Frady E. P. Nadler S. König P. Koch C. (2006). The relation of phase noise and luminance contrast to overt attention in complex visual stimuli. Journal of Vision, 6(11):1, 1148–1158, http://www.journalofvision.org/content/6/11/1, doi:10.1167/6.11.1.
Einhäuser W. Rutishauser U. Koch C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2.
Einhäuser W. Schuhmann F. Bardins S. Bartl K. Böning G. Schneider E. et al. (2007). Human eye–head co-ordination in natural exploration. Network: Computation in Neural Systems, 18, 267–297.
Einhäuser W. Spain M. Perona P. (2008). Objects predict fixations better than early saliency. Journal of Vision, 8(14):18, 1–26, http://www.journalofvision.org/content/8/14/18, doi:10.1167/8.14.18.
Foulsham T. Underwood G. (2008). What can saliency models predict about eye movements? Spatial and sequential aspects of fixations during encoding and recognition. Journal of Vision, 8(2):6, 1–17, http://www.journalofvision.org/content/8/2/6, doi:10.1167/8.2.6.
Frey H. König P. Einhäuser W. (2007). The role of first- and second-order stimulus features for human overt attention. Perception & Psychophysics, 69, 153–161.
Frey H.-P. König P. (2008). What's color got to do with it? The influence of color on visual attention in different categories. Journal of Vision, 8(14):6, 1–17, http://www.journalofvision.org/content/8/14/6, doi:10.1167/8.14.6.
Glass G. V. Stanley J. C. (1970). Statistical methods in education and psychology. Englewood Cliffs, NJ: Prentice-Hall.
Hakstian A. R. Roed J. C. Lind J. C. (1979). Two-sample T-2 procedure and the assumption of homogeneous covariance matrices. Psychological Bulletin, 86, 1255–1263.
Harding G. Bloj M. (2010). Real and predicted influence of image manipulations on eye movements during scene recognition. Journal of Vision, 10(2):8, 1–17, http://www.journalofvision.org/content/10/2/8, doi:10.1167/10.2.8.
Henderson J. M. (2003). Human gaze control during real-world scene perception. Trends in Cognitive Science, 7, 498–504.
Henderson J. M. Brockmole J. R. Castelhano M. S. Mack M. (2007). Visual saliency does not account for eye movements during search in real-world scenes. In van Gompel R. Fischer M. Murray W. Hill R. (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Oxford, UK: Elsevier.
Hoffman J. E. Subramaniam B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics, 57, 787–795.
Hollingworth A. Henderson J. M. (2000). Semantic informativeness mediates the detection of changes in natural scenes. Visual Cognition, 7, 213–235.
Hunter J. E. Schmidt F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Thousand Oaks, CA: Sage Publications.
Ishihara S. (2005). Ishihara's tests for colour deficiency. Tokyo: Kanehara Trading.
Itti L. Koch C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489–1506.
Jovancevic J. Sullivan B. Hayhoe M. (2006). Control of attention and gaze in complex environments. Journal of Vision, 6(12):9, 1431–1450, http://www.journalofvision.org/content/6/12/9, doi:10.1167/6.12.9.
Kaspar K. König P. (2011). Overt attention and context factors: The impact of repeated presentations, image type, and individual motivation. PLoS ONE 6, e21719.
Kaspar K. Ollermann F. Hamborg K.-C. (submitted for publication). Time-dependent changes in viewing behavior on similarly structured web pages.
Kayser C. Nielsen K. J. Logothetis N. K. (2006). Fixations in natural scenes: Interaction of image structure and image content. Vision Research, 46, 2535–2545.
Koch C. Ullman S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219–227.
Kollmorgen S. Nortmann N. Schröder S. König P. (2010). Influence of low-level stimulus features, task dependent factors, and spatial biases on overt visual attention. PLoS Computational Biology, 6, e1000791.
Krieger G. Rentschler I. Hauske G. Schill K. Zetsche C. (2000). Object and scene analysis by saccadic eye movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.
Land M. F. Lee D. (1994). Where we look when we steer. Nature, 369, 742–744.
Mannan S. K. Ruddock K. H. Wooding D. S. (1996). The relationship between the locations of spatial features and those of fixations made during visual examination of briefly presented images. Spatial Vision, 10, 165–188.
Mannan S. K. Ruddock K. H. Wooding D. S. (1997). Fixation patterns made during brief examination of two-dimensional images. Perception, 26, 1059–1072.
McPeek R. M. Maljkovich V. Nakayama K. (1999). Saccades require focal attention and are facilitated by a short-term memory system. Vision Research, 39, 1555–1566.
Navalpakkam V. Itti L. (2006). Top-down attention selection is fine grained. Journal of Vision, 6(11):4, 1180–1193, http://www.journalofvision.org/content/6/11/4, doi:10.1167/6.11.4.
Nelson J. D. Cottrell G. W. Movellan J. R. Sereno M. I. (2004). Yarbus lives: A foveated exploration of how task influences saccadic eye movement [Abstract]. Journal of Vision, 4(8):741, 741a, http://www.journalofvision.org/content/4/8/741, doi:10.1167/4.8.741.
Nyström M. Holmqvist K. (2008). Semantic override of low-level features in image viewing—Both initially and overall. Journal of Eye Movement Research, 2, 1–11.
Olmos A. Kingdom F. A. A. (2004). McGill Calibrated Colour Image Database. Available at http://tabby.vision.mcgill.ca.
Parkhurst D. Law K. Niebur E. (2002). Modeling the role of salience in the allocation of overt attention. Vision Research, 42, 107–123.
Parkhurst D. Niebur E. (2003). Scene content selected by active vision. Spatial Vision, 16, 125–154.
Pieters R. Warlop L. (1999). Visual attention during brand choice: The impact of time pressure and task motivation. International Journal of Research in Marketing, 16, 1–16.
Posner M. I. Snyder C. R. R. (1975). Attention and cognitive control. In Solso R. L. (Ed.), Information processing and cognition: The Loyola Symposium (pp. 55–85). Hillsdale, NJ: Erlbaum.
Reinagel P. Zador A. M. (1999). Natural scene statistics at the centre of gaze. Network: Computation in Neural Systems, 10, 1–10.
Rothkopf C. A. Ballard D. H. Hayhoe M. M. (2007). Task and context determine where you look. Journal of Vision, 7(14):16, 1–20, http://www.journalofvision.org/content/7/14/16, doi:10.1167/7.14.16.
Rutishauser U. Koch C. (2007). Probabilistic modeling of eye movement data during conjunction search via feature-based attention. Journal of Vision, 7(6):5, 1–20, http://www.journalofvision.org/content/7/6/5, doi:10.1167/7.6.5.
Schneider E. Bartl K. Dera T. Böning G. Wagner P. Brandt T. (2006). Documentation and teaching of surgery with an eye movement driven head-mounted camera: See what the surgeon sees and does. Studies in Health Technology and Informatics, 119, 486–490.
Schuhmann F. Einhäuser-Treyer W. Vockeroth J. Bartl K. Schneider E. König P. (2008). Salient features in gaze-aligned recordings of human visual input during free exploration of natural environments. Journal of Vision, 8(14):12, 1–17, http://www.journalofvision.org/content/8/14/12, doi:10.1167/8.14.12.
Seagull F. J. Xiao Y. (2001). Using eye-tracking video data to augment knowledge elicitation in cognitive task analysis. Human Factors and Ergonomics Society Annual Meeting Proceedings, Cognitive Engineering and Decision Making, 45, 400–403.
Silver N. C. Dunlap W. P. (1987). Averaging correlation coefficients: Should Fisher's Z transformation be used? Journal of Applied Psychology, 72, 146–148.
Smith C. N. Hopkins R. O. Squire L. R. (2006). Experience-dependent eye movements, awareness, and hippocampus-dependent memory. The Journal of Neuroscience, 26, 11304–11312.
Soto D. Heinke D. Humphreys G. W. Blanco M. J. (2005). Early, involuntary top-down guidance of attention from working memory. Journal of Experimental Psychology: Human Perception and Performance, 31, 248–261.
Strube M. J. (1988). Averaging correlation coefficients: Influence of heterogeneity and set size. Journal of Applied Psychology, 73, 559–568.
Tabachnick B. G. Fidell L. S. (2007). Using multivariate statistics. Boston: Pearson.
Tatler B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, 7(14):4, 1–17, http://www.journalofvision.org/content/7/14/4, doi:10.1167/7.14.4. [PubMed] [Article] [CrossRef] [PubMed]
Tatler B. W. (Ed.) (2009). Eye guidance and natural scenes. Hove, UK: Psychology Press.
Tatler B. W. Baddeley R. J. Gilchrist I. D. (2005). Visual correlates of fixation selection: Effects of scale and time. Vision Research, 45, 643–659.
Tatler B. W. Baddeley R. J. Vincent B. T. (2006). The long and the short of it: Spatial statistics at fixation vary with saccade amplitude and task. Vision Research, 46, 1857–1862.
Tatler B. W. Hayhoe M. M. Land M. F. Ballard D. H. (2011). Eye guidance in natural vision: Reinterpreting salience. Journal of Vision, 11(5):5, 1–23, http://www.journalofvision.org/content/11/5/5, doi:10.1167/11.5.5.
Thorndike R. M. (2007). Fisher's Z transformation. In Salkind N. J. (Ed.), Encyclopedia of measurement and statistics (Vol. 2). Thousand Oaks, CA: Sage Publications.
Torralba A. (2003). Modeling global scene factors in attention. Journal of the Optical Society of America, 20, 1407–1418.
Torralba A. Oliva A. Castelhano M. S. Henderson J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113, 766–786.
Treisman A. Gelade G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Triesch J. Ballard D. H. Hayhoe M. M. Sullivan B. T. (2003). What you see is what you need. Journal of Vision, 3(1):9, 86–94, http://www.journalofvision.org/content/3/1/9, doi:10.1167/3.1.9.
Underwood G. Foulsham T. Humphrey K. (2009). Saliency and scan patterns in the inspection of real-world scenes: Eye movements during encoding and recognition. Visual Cognition, 17, 812–834.
Wallis G. Bülthoff H. (2000). What's scene and not seen: Influence of movement and task upon what we see. Visual Cognition, 7, 175–190.
Yantis S. Jonides J. (1984). Abrupt visual onset and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601–621.
Yarbus A. L. (1967). Eye movements and vision. New York: Plenum.
Zhang L. Tong M. H. Marks T. K. Shan H. Cottrell G. W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7):32, 1–20, http://www.journalofvision.org/content/8/7/32, doi:10.1167/8.7.32.
Figure 1
 
Examples of image categories: (A) nature, (B) urban, (C) fractal, (D) pink noise.
Figure 2
 
Examples of fixation distributions and corresponding saccades of single subjects depending on image type and repeated presentation.
Figure 3
 
Congruency of FDMs. Fisher's z-transformed correlations between FDMs of two consecutive presentation runs were averaged across subjects. Mean z-transformed correlations differing significantly from zero are indicated by asterisks [***p < 0.003 (adjusted alpha level); *p < 0.05]. Vertical lines above bars indicate standard error of measurement. Red horizontal lines within bars indicate the reference value; this is the mean correlation between two FDMs that can be expected when both FDMs are based on fixations being randomly sampled from all fixations made on images of a certain category.
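The averaging step described in this caption can be illustrated with a minimal sketch. It assumes that each subject contributes one Pearson correlation between the FDMs of two consecutive runs; the function names and the example values are hypothetical and are not taken from the study.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z-transform of a correlation coefficient (requires |r| < 1)."""
    return math.atanh(r)


def mean_correlation_via_z(correlations: list[float]) -> tuple[float, float]:
    """Average correlations in z-space.

    Returns (mean z, mean z back-transformed to the r scale).
    """
    zs = [fisher_z(r) for r in correlations]
    mean_z = sum(zs) / len(zs)
    return mean_z, math.tanh(mean_z)


# Hypothetical per-subject correlations between FDMs of two consecutive runs.
subject_rs = [0.45, 0.52, 0.38, 0.60]
mean_z, mean_r = mean_correlation_via_z(subject_rs)
print(f"mean z = {mean_z:.3f}, back-transformed r = {mean_r:.3f}")
```

Averaging in z-space rather than on raw r values follows the rationale discussed by Silver and Dunlap (1987) in the reference list.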
Figure 4
 
Number of fixated image regions for each image category. Images were divided into a 4 × 4 grid of equal-sized regions and number of fixated regions was counted separately for each image and presentation run. The mean across images is depicted. Vertical lines indicate standard error of the mean. Significant differences between presentation runs (Bonferroni-adjusted t-tests) are marked.
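As a rough sketch of the region count described in this caption, the following divides an image into a 4 × 4 grid and counts the cells that contain at least one fixation. The pixel coordinate convention, the image size, and the function name are assumptions made for illustration only.

```python
def count_fixated_regions(fixations, width, height, grid=4):
    """Count grid cells containing at least one fixation.

    fixations: iterable of (x, y) pixel coordinates, origin at the top left.
    width, height: image size in pixels (hypothetical values in the example).
    """
    cells = set()
    for x, y in fixations:
        if not (0 <= x < width and 0 <= y < height):
            continue  # skip fixations that fall outside the image
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cells.add((row, col))
    return len(cells)


# Hypothetical fixations on a 1280 x 960 image.
example_fixations = [(10, 10), (20, 20), (700, 500), (1279, 959)]
n_regions = count_fixated_regions(example_fixations, 1280, 960)
print(n_regions)
```

The count would be computed once per image and presentation run and then averaged across images, as the caption states.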
Figure 5
 
Mean number of fixations on images of a certain category depending on presentation run (left side) and mean number of fixations separated for left and right image sides (right side). Images were divided into a 4 × 4 grid of equal-sized regions. Vertical lines indicate standard error of the mean.
Figure 6
 
Mean frequency of saccades on fractal images depending on saccade length. Saccades were assigned to one of 20 categories (category 1: saccade length ≤ 1° visual angle; category 2: saccade length > 1° and ≤2°; …; category 20: saccade length > 19°). Vertical lines indicate standard error of the mean.
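The binning rule given in this caption can be written down directly. The sketch below assumes saccade lengths are already expressed in degrees of visual angle; the function names and example values are hypothetical.

```python
import math
from collections import Counter


def saccade_category(length_deg: float, n_categories: int = 20) -> int:
    """Map a saccade length to its category.

    Category 1 holds lengths <= 1 deg, category k holds lengths in (k-1, k],
    and the last category holds every length above n_categories - 1 deg.
    """
    if length_deg < 0:
        raise ValueError("saccade length must be non-negative")
    return min(max(math.ceil(length_deg), 1), n_categories)


def saccade_histogram(lengths_deg, n_categories: int = 20) -> list[int]:
    """Return the number of saccades per category (index 0 is category 1)."""
    counts = Counter(saccade_category(l, n_categories) for l in lengths_deg)
    return [counts.get(k, 0) for k in range(1, n_categories + 1)]


# Hypothetical saccade lengths in degrees of visual angle.
hist = saccade_histogram([0.5, 1.0, 1.2, 25.0])
print(hist)
```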
Figure 7
 
Area under the ROC curve (AUC) values indicating the correlation between image feature and fixation likelihood, separated by presentation run. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Vertical lines above bars indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation selection (dashed line).
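To make the AUC measure in this caption concrete, here is a minimal sketch that treats AUC as the probability that the feature value at a fixated location exceeds the value at a control location, with ties counted as one half. How control locations are chosen is not stated in this caption, so the two input lists below are purely hypothetical.

```python
def auc(fixated_values, control_values) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly drawn fixated value is larger
    than a randomly drawn control value (ties count as 0.5).
    """
    if not fixated_values or not control_values:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for f in fixated_values:
        for c in control_values:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(fixated_values) * len(control_values))


# Hypothetical feature values (e.g., luminance contrast) at fixated and control locations.
fixated = [0.62, 0.71, 0.55, 0.80]
control = [0.40, 0.58, 0.66, 0.35]
print(round(auc(fixated, control), 3))
```

A value of 0.5 corresponds to the dashed line in the figure: the feature does not discriminate fixated from control locations.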
Figure 8
 
Area under ROC values indicating the correlation between image feature and fixation likelihood separated for image categories. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Vertical lines indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Figure 9
 
The correlation between AUC values at fixated locations and the length of the preceding saccade. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Significant differences are marked by asterisks [***p < 0.002 (adjusted alpha level); **p < 0.01; *p < 0.05]. Vertical lines above bars indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Figure 10
 
The inter-relation between AUC values at fixated locations and the length of the preceding saccade with respect to image category. As an example, the figure depicts the interaction between saccade length and image category for luminance contrast at high spatial resolution (lumc 1°). The interaction pattern shown is identical for luminance, red–green, and blue–yellow contrasts, as well as for edge values at all spatial resolutions. Vertical lines indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Figure 11
 
The correlation between AUC values at fixated locations and the length of the preceding saccade based on the original feature values at the exact fixation location, on the one hand, and the maximal feature value found in a circular area with a radius of 1° visual angle around the fixation location, on the other hand. On the left side, the feature–fixation correlations for luminance contrast at 1° visual angle are depicted, while on the right side AUC values for red–green contrast at 1° are shown. Vertical lines indicate standard error of the mean. Results of pairwise comparisons are marked.
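The second variant in this caption, taking the maximal feature value within a circular area around the fixation, might be sketched as follows. The feature map is assumed to be a 2D grid indexed as `feature_map[row][column]`, and the conversion of 1° visual angle into pixels (`radius_px`) depends on the display setup, which is not given here; both are assumptions for illustration.

```python
import math


def max_feature_in_disc(feature_map, x, y, radius_px):
    """Return the largest feature value within radius_px pixels of (x, y).

    feature_map: list of rows, each a list of feature values.
    x, y: fixation location as (column, row) pixel indices.
    Returns None if no pixel of the map lies inside the disc.
    """
    height, width = len(feature_map), len(feature_map[0])
    r = math.floor(radius_px)
    best = None
    for yy in range(max(0, y - r), min(height, y + r + 1)):
        for xx in range(max(0, x - r), min(width, x + r + 1)):
            if (xx - x) ** 2 + (yy - y) ** 2 <= radius_px ** 2:
                value = feature_map[yy][xx]
                if best is None or value > best:
                    best = value
    return best


# Hypothetical 5 x 5 feature map with a few non-zero entries.
demo_map = [[0] * 5 for _ in range(5)]
demo_map[2][3] = 5  # one pixel to the right of the fixation
demo_map[2][4] = 9  # two pixels to the right of the fixation
demo_map[0][0] = 7  # corner, farther than two pixels away
print(max_feature_in_disc(demo_map, 2, 2, 1), max_feature_in_disc(demo_map, 2, 2, 2))
```

The "original" variant in the caption corresponds to reading `feature_map[y][x]` directly at the fixation location.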
Figure 12
 
Examples of image categories: (A) low complex, (B) mid complex, and (C) high complex urban scenes.
Figure 13
 
Examples of fixations and corresponding saccades of single subjects depending on image complexity and repeated presentation.
Figure 14
 
Congruency of FDMs. Fisher's z-transformed correlations between FDMs of two consecutive presentation runs were averaged across subjects. Mean z-transformed correlations differing significantly from zero are indicated by asterisks [***p < 0.003 (adjusted alpha level); *p < 0.05]. Vertical lines above bars indicate standard error of measurement. Red horizontal lines within bars indicate the reference value. This is the mean correlation between two FDMs that can be expected when both FDMs are based on fixations being randomly sampled from all fixations made on images of a certain category.
Figure 15
 
Number of fixated image regions for each image category. Images were divided into a 4 × 4 grid of equal-sized regions and the number of fixated regions was counted separately for each image and presentation run. The mean across images is depicted. Vertical lines indicate standard error of the mean. Significant differences between presentation runs (Bonferroni-adjusted t-tests) are marked.
Figure 16
 
Mean number of fixations on the left and right sides of images as a function of presentation run (left side) and left bias of fixations depending on image complexity (right side). Vertical lines indicate standard error of the mean.
Figure 17
 
Mean frequency of saccades on mid complex images depending on saccade length. Saccades were assigned to one of 20 categories (category 1: saccade length ≤ 1° visual angle; category 2: saccade length > 1° and ≤2°; …; category 20: saccade length > 19°). Vertical lines indicate standard error of the mean.
Figure 18
 
Area under ROC values indicating the correlation between image feature and fixation likelihood separated for urban scenes of different complexity. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Vertical lines indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Figure 19
 
The correlation between AUC values at fixated locations and the length of the preceding saccade. Features are contrasts of blue–yellow (byc), red–green (rgc), and luminance (lumc), as well as second-order texture contrasts (bytc, rgtc, lumtc), edge values (sob), and saturation (sat) at different spatial resolutions. Significant differences are marked by asterisks [***p < 0.002 (adjusted alpha level); **p < 0.01; *p < 0.05]. Vertical lines above bars indicate standard error of measurement. AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Figure 20
 
The correlation between AUC values at fixated locations and the length of the preceding saccade based on the original feature values at the exact fixation location, on the one hand, and the maximal feature value found in a circular area with a radius of 1° visual angle around the fixation location, on the other hand. On the left side, the feature–fixation correlations for luminance contrast at 1° visual angle are depicted, and on the right side, the AUC values for red–green contrast at 1° are shown. Vertical lines indicate standard error of the mean. Results of pairwise comparisons are marked.
Figure 21
 
The inter-relation between AUC values at fixated locations and the length of the preceding saccade with respect to image category. As an example, the left figure depicts the interaction between saccade length and image complexity for red–green contrast at mid spatial resolution (rgc 2°). The interaction pattern shown is identical for luminance, red–green, and blue–yellow contrasts, edge values, and saturation at all spatial resolutions. The right figure depicts, again as an example, the interaction that is similar for all second-order texture contrasts at all spatial resolutions (bytc, rgtc, and lumtc at 2°, 4°, and 6°). Vertical lines indicate standard error of measurement. The AUC value of 0.5 indicates that a feature is uncorrelated with fixation behavior (dashed line).
Figure 22
 
Dependency of AUC values on presentation run and between-subject saccade length (left side), illustrated by luminance texture contrast at high spatial resolution (lumtc 1°). Subjects whose average saccade length was above the median (saccade group = long) are contrasted with subjects whose mean saccade length was below the median (saccade group = short). The reversal of the between-group differences in AUC values across repeated presentations shown here is representative of all image features. Right side: Mean saccade length of the saccade groups, separated by presentation run.
Table 1
 
Matrix of mean Fisher's z-transformed intercorrelations (averaged across subjects) between FDMs of all presentation runs. Fisher's z-transformed correlations are signatures of the congruency of two FDMs. Correlations between two consecutive presentations are marked in bold.
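| Image category | Presentation | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Nature scenes | 1 | 1.0 | | | | |
| Nature scenes | 2 | **0.53** | 1.0 | | | |
| Nature scenes | 3 | 0.46 | **0.47** | 1.0 | | |
| Nature scenes | 4 | 0.40 | 0.41 | **0.41** | 1.0 | |
| Nature scenes | 5 | 0.36 | 0.38 | 0.37 | **0.42** | 1.0 |
| Urban scenes | 1 | 1.0 | | | | |
| Urban scenes | 2 | **0.55** | 1.0 | | | |
| Urban scenes | 3 | 0.41 | **0.43** | 1.0 | | |
| Urban scenes | 4 | 0.34 | 0.34 | **0.35** | 1.0 | |
| Urban scenes | 5 | 0.32 | 0.31 | 0.33 | **0.34** | 1.0 |
| Fractal scenes | 1 | 1.0 | | | | |
| Fractal scenes | 2 | **0.58** | 1.0 | | | |
| Fractal scenes | 3 | 0.48 | **0.48** | 1.0 | | |
| Fractal scenes | 4 | 0.43 | 0.43 | **0.42** | 1.0 | |
| Fractal scenes | 5 | 0.40 | 0.45 | 0.41 | **0.45** | 1.0 |
| Pink noise scenes | 1 | 1.0 | | | | |
| Pink noise scenes | 2 | **0.46** | 1.0 | | | |
| Pink noise scenes | 3 | 0.37 | **0.62** | 1.0 | | |
| Pink noise scenes | 4 | 0.33 | 0.47 | **0.61** | 1.0 | |
| Pink noise scenes | 5 | 0.34 | 0.77 | 0.53 | **0.49** | 1.0 |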
Table 2
 
Matrix of mean Fisher's z-transformed intercorrelations (averaged across subjects) between FDMs of all presentation runs. Fisher's z-transformed correlations are signatures of the congruency of two FDMs. Correlations between two consecutive presentations are marked in bold.
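| Image category | Presentation | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Low complex scenes | 1 | 1.0 | | | | |
| Low complex scenes | 2 | **0.26** | 1.0 | | | |
| Low complex scenes | 3 | 0.23 | **0.26** | 1.0 | | |
| Low complex scenes | 4 | 0.21 | 0.25 | **0.27** | 1.0 | |
| Low complex scenes | 5 | 0.19 | 0.23 | 0.23 | **0.27** | 1.0 |
| Mid complex scenes | 1 | 1.0 | | | | |
| Mid complex scenes | 2 | **0.32** | 1.0 | | | |
| Mid complex scenes | 3 | 0.26 | **0.25** | 1.0 | | |
| Mid complex scenes | 4 | 0.25 | 0.25 | **0.24** | 1.0 | |
| Mid complex scenes | 5 | 0.21 | 0.21 | 0.22 | **0.23** | 1.0 |
| High complex scenes | 1 | 1.0 | | | | |
| High complex scenes | 2 | **0.26** | 1.0 | | | |
| High complex scenes | 3 | 0.21 | **0.22** | 1.0 | | |
| High complex scenes | 4 | 0.17 | 0.17 | **0.18** | 1.0 | |
| High complex scenes | 5 | 0.17 | 0.17 | 0.17 | **0.17** | 1.0 |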