Analysis of the statistics of natural scene features at observers' fixations can help us understand the mechanisms of fixation selection and visual attention in the human visual system. Previous studies revealed that several low-level luminance features at fixations differ statistically from those at randomly selected locations. In our study, we conducted eye tracking experiments on naturalistic stereo images presented through a haploscope and found that luminance contrast and luminance gradient at fixations are generally higher than at randomly selected locations, in agreement with the previous literature, but that disparity contrast and disparity gradient at fixations are generally lower than at randomly selected locations. We discuss the relevance of our findings in the context of the complexity of disparity calculations and the metabolic needs of disparity processing.

*f* noise (Rajashekar, Bovik, & Cormack, 2006), observers were reported to fixate on regions with structures similar to the target's shape. Along similar lines, Najemnik and Geisler (2005), and also Raj, Geisler, Frazor, and Bovik (2005), showed that low-level visual fixations can be viewed as an information-gathering process.

*lower* than that at random patches on natural images.

*S*_{AB} denotes the inner products between the left/right eye images and the even/odd receptive fields of simple cells, where *A* ∈ {*L, R*} indicates the left or right eye and *B* ∈ {*E, O*} indicates even or odd symmetry. The responses of the even-symmetric and odd-symmetric simple binocular cells (a quadrature pair) are denoted by *r*_{1} and *r*_{2}, respectively. In Equation 3, the last two terms, *S*_{LE}*S*_{RE} and *S*_{LO}*S*_{RO}, are the cross-correlations between the band-pass-filtered left and right images. In the model, the output of a binocular complex neuron largely depends on these terms. Aside from neurophysiological evidence, psychophysical studies (Cormack, Stevenson, & Schor, 1991) also showed that interocular correlation is a decisive factor in stereopsis. Furthermore, local correlations have been used extensively to resolve correspondences in numerous computational stereo algorithms (for good reviews, see Brown, Burschka, & Hager, 2003; Scharstein & Szeliski, 2002). Inspired by this neurobiological basis of stereopsis, Filippini and Banks (2009) built a local-correlation model of stereopsis. They conducted psychophysical experiments with human observers and compared the human results with those of their model under the same experimental setup. Using this model, they explained two well-known constraints of human stereopsis: the disparity-gradient limit, which is the inability to perceive depth when the change in disparity within a region is too large, and the limit of stereoresolution, which is the inability to perceive spatial variations in disparity that occur at too fine a spatial scale.
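The role of the cross terms can be illustrated numerically. The following is a minimal sketch (not the authors' implementation), assuming Gabor-shaped receptive fields: the complex-cell energy *r*_{1}² + *r*_{2}² expands into monocular terms plus twice the interocular terms *S*_{LE}*S*_{RE} + *S*_{LO}*S*_{RO}, which is why the output tracks interocular correlation.

```python
import numpy as np

def gabor_pair(size=32, freq=0.1, sigma=6.0):
    """Even/odd (quadrature-pair) Gabor receptive fields."""
    x = np.arange(size) - size / 2
    xx, yy = np.meshgrid(x, x)
    env = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    even = env * np.cos(2 * np.pi * freq * xx)
    odd = env * np.sin(2 * np.pi * freq * xx)
    return even, odd

def binocular_energy(patch_L, patch_R):
    """Energy-model response of a binocular complex cell.

    r1, r2 are the quadrature simple-cell responses; r1**2 + r2**2
    decomposes into monocular energies plus the interocular terms
    S_LE*S_RE + S_LO*S_RO.
    """
    even, odd = gabor_pair(size=patch_L.shape[0])
    S_LE, S_LO = np.sum(patch_L * even), np.sum(patch_L * odd)
    S_RE, S_RO = np.sum(patch_R * even), np.sum(patch_R * odd)
    r1 = S_LE + S_RE                      # even-symmetric simple cell
    r2 = S_LO + S_RO                      # odd-symmetric simple cell
    energy = r1**2 + r2**2
    monocular = S_LE**2 + S_LO**2 + S_RE**2 + S_RO**2
    cross = S_LE * S_RE + S_LO * S_RO     # interocular correlation terms
    return energy, monocular, cross
```

For any pair of patches, `energy == monocular + 2 * cross` holds exactly, mirroring the algebraic decomposition in Equation 3.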

(*x*_{r}, *y*_{r}) in the right image, we defined a search window centered on the same pixel location (preferring zero disparity) in the left image (Figure 2). The width of the search window in the left image was 161 pixels (3.2°), and its height was 5 pixels (0.1°). Given a 1° × 1° patch in the right image centered on the pixel (*x*_{r}, *y*_{r}), the algorithm computes the normalized cross-correlation between the right patch and a candidate 1° × 1° left patch centered on each pixel in the 161 × 5 search window, itself centered at (*x*_{r}, *y*_{r}) in the left image, using the following equation:

*I*_{l} and *I*_{r} are the left and right images, while *μ*_{l} and *μ*_{r} are the mean luminance values of the left and right patches. The normalized interocular correlation *C*(*d, s*) always takes a value between −1 and 1, where 1 means the two patches are identical up to a multiplicative scaling, and −1 means the two patches have reversed luminance profiles (bright locations in the left are dark in the right).

(*x*_{l}, *y*_{l}) that it was centered on was then the matched pixel for the right pixel (*x*_{r}, *y*_{r}), and the horizontal disparity at pixel (*x*_{r}, *y*_{r}) was taken to be *D*(*x*_{r}, *y*_{r}) = *x*_{r} − *x*_{l}. While this disparity is not the conventional angular disparity defined in units of degrees, it represents an angular difference given an assumed viewing geometry. The algorithm computed a disparity for every pixel in the right image, yielding a dense disparity map *D*. Naturally, we do not claim that this simple algorithm duplicates the disparity processing of the large population of neurons dedicated to the task. Nevertheless, it is an effective method and appears to be compatible with significant aspects of human stereopsis (Filippini & Banks, 2009).
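A toy version of this correlation search can be sketched as follows. The patch and window sizes here are hypothetical small pixel values (the actual algorithm used 1° × 1° patches and a 161 × 5 window, and also searched over a small vertical range); only the horizontal search is shown.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a**2).sum() * (b**2).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def disparity_at(img_l, img_r, xr, yr, half=8, search=20):
    """Horizontal disparity D(xr, yr) = xr - xl for one right-image pixel,
    found by maximizing NCC over a horizontal window in the left image."""
    ref = img_r[yr - half:yr + half + 1, xr - half:xr + half + 1]
    best_c, best_xl = -2.0, xr
    lo = max(half, xr - search)
    hi = min(img_l.shape[1] - half, xr + search + 1)
    for xl in range(lo, hi):
        cand = img_l[yr - half:yr + half + 1, xl - half:xl + half + 1]
        c = ncc(ref, cand)
        if c > best_c:
            best_c, best_xl = c, xl
    return xr - best_xl
```

Applying `disparity_at` to every right-image pixel yields the dense map *D*; for a left image that is a pure horizontal shift of the right image, the recovered disparity equals the (negated) shift.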

*I*, and the dense disparity map as *D*. We compute the luminance gradient map (*X*). Here we define luminance contrast as the RMS contrast of a luminance patch.
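Under standard definitions (assumed here, since the excerpt does not give the exact formulas), the RMS contrast of a patch and a gradient-magnitude map can be computed as:

```python
import numpy as np

def rms_contrast(patch):
    """RMS contrast: standard deviation of luminance over mean luminance
    (one common convention; assumed, not taken from the paper)."""
    return patch.std() / patch.mean()

def gradient_magnitude(img):
    """Gradient-magnitude map via central finite differences."""
    gy, gx = np.gradient(img.astype(float))
    return np.sqrt(gx**2 + gy**2)
```

The same two functions apply unchanged to the disparity map *D*, giving disparity contrast and disparity gradient for a patch.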

Suppose an observer made *f*_{i} fixations on the *i*th image. Then the total number of fixations that the observer made during a session is Σ_{i=1}^{48} *f*_{i}. We assume that a random observer makes the same number of fixations as the subject did: for the *i*th image, the random observer also selects *f*_{i} fixations uniformly distributed on the image plane. For each human observer, we generate 100 random observers, each making the same number of fixations on each image as the human observer. For example, if subject LKC made 486 fixations overall on the 48 images, then each random observer also selected 486 random locations, giving 48,600 random locations in total. We want to know whether there is a statistically significant difference between image features at fixations and those at randomly selected locations by comparing the human observer's data with the 100 random observers' data.
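The random-observer baseline can be sketched as follows; `random_observers` is a hypothetical helper that draws, for each of 100 synthetic observers, the same per-image fixation counts as the human observer, uniformly over the image plane.

```python
import numpy as np

def random_observers(fix_counts, img_shape, n_observers=100, seed=0):
    """Generate uniform 'fixations' matched in count to a human observer.

    fix_counts: list of f_i, the human observer's fixations per image
    img_shape:  (height, width) of the image plane in pixels
    Returns a list (per observer) of lists (per image) of (x, y) arrays.
    """
    rng = np.random.default_rng(seed)
    h, w = img_shape
    observers = []
    for _ in range(n_observers):
        per_image = []
        for f_i in fix_counts:
            xs = rng.integers(0, w, size=f_i)   # uniform over columns
            ys = rng.integers(0, h, size=f_i)   # uniform over rows
            per_image.append(np.column_stack([xs, ys]))
        observers.append(per_image)
    return observers
```

Each synthetic observer then contributes exactly as many random patches per image as the human observer contributed fixated patches, so the two feature distributions are directly comparable.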

The fixation-to-random luminance contrast ratio is defined as *RC*_{l} = *C*_{l}^{ i }/*C*_{rl}^{ i }, and the fixation-to-random luminance gradient ratio as *RG*_{l} = *G*_{l}^{ i }/*G*_{rl}^{ i }. If *RC*_{l} > 1, the fixated patches generally have a larger luminance contrast than randomly selected patches on the image being considered; if *RC*_{l} < 1, the meaning is reversed. The same interpretation applies to the luminance gradient ratio.

We computed the disparity contrast *C*_{d}^{ i } on the fixated patches, and the same quantity *C*_{rd}^{ i } on the randomly selected patches. The ratio of disparity contrast between the fixated patches and the randomly selected patches is defined as *RC*_{d} = *C*_{d}^{ i }/*C*_{rd}^{ i }.

The disparity gradient on the fixated patches (*G*_{d}^{ i }) and on the randomly selected patches (*G*_{rd}^{ i }) is also calculated. The ratio of disparity gradient between the fixated patches and the random patches is defined as *RG*_{d} = *G*_{d}^{ i }/*G*_{rd}^{ i }. As before, if the ratios are significantly greater than 1, then fixated patches tend to have a larger disparity contrast and gradient than randomly picked locations.

*all* patch sizes. This means that the fixated patches generally had a smaller disparity contrast and gradient than the random locations. The results for the other observers were all similar to those for LKC, as shown in Figures 8b (CHY) and 8c (JSL).

We plot the mean luminance gradient ratio *RG*_{l} with 95% CIs and the mean disparity gradient ratio *RG*_{d} with 95% CIs for all subjects in Figure 9a. For easier comparison, we plot a horizontal line at 1 across all patch sizes. The red curves show the luminance gradient ratios, and the blue curves show the disparity gradient ratios. Different markers represent the observers: LKC (*), CHY (○), JSL (Δ). We made a similar ensemble comparison plot for the mean luminance contrast ratio and the mean disparity contrast ratio, displayed in Figure 9b.

*i*, we randomly selected one image from the whole database. Overlapping the fixations onto the randomly picked image, we computed the luminance contrast, luminance gradient, disparity contrast, and disparity gradient at the fixations and at the “fixations” on the random image. We ran the random picking from the image-shuffled database 100 times.

We denote the four features computed at the fixations on image *i* as *C*_{l}^{ i }, *G*_{l}^{ i }, *C*_{d}^{ i }, and *G*_{d}^{ i }, and the four features from a randomly picked image *j* as *C*_{rl}^{ j }, *G*_{rl}^{ j }, *C*_{rd}^{ j }, and *G*_{rd}^{ j }, where all four scene features were computed by selecting image patches at the overlapped fixations on image *j*.
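The image-shuffling control can be sketched as follows; `shuffled_feature_samples` is a hypothetical helper, and the feature maps stand for precomputed per-pixel contrast or gradient maps of each image.

```python
import numpy as np

def shuffled_feature_samples(fixations, feature_maps, n_shuffles=100, seed=0):
    """Image-shuffling control: sample each image's fixation locations
    on a randomly chosen image's feature map instead of its own.

    fixations:    list of (N_i, 2) integer arrays of (x, y) per image i
    feature_maps: list of 2-D arrays (e.g. disparity contrast maps)
    Returns a list over shuffles of per-image mean 'random' feature values.
    """
    rng = np.random.default_rng(seed)
    n = len(feature_maps)
    runs = []
    for _ in range(n_shuffles):
        per_image = []
        for fix in fixations:
            j = rng.integers(n)                      # random image j
            fmap = feature_maps[j]
            xs = np.clip(fix[:, 0], 0, fmap.shape[1] - 1)
            ys = np.clip(fix[:, 1], 0, fmap.shape[0] - 1)
            per_image.append(fmap[ys, xs].mean())    # overlapped fixations
        runs.append(per_image)
    return runs
```

Comparing the true-fixation feature values against these 100 shuffled runs tests whether the fixation/random differences are driven by where observers look rather than by global image statistics.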