Article  |   July 2014
Local masking in natural images: A database and analysis
Journal of Vision July 2014, Vol.14, 22. doi:10.1167/14.8.22
      Md Mushfiqul Alam, Kedarnath P. Vilankar, David J. Field, Damon M. Chandler; Local masking in natural images: A database and analysis. Journal of Vision 2014;14(8):22. doi: 10.1167/14.8.22.

      © ARVO (1962-2015); The Authors (2016-present)

      ×
  • Supplements
Abstract

Studies of visual masking have provided a wide range of important insights into the processes involved in visual coding. However, very few of these studies have employed natural scenes as masks. Little is known about how the particular features found in natural scenes affect visual detection thresholds and how the results obtained using unnatural masks relate to the results obtained using natural masks. To address this issue, this paper describes a psychophysical study designed to obtain local contrast detection thresholds for a database of natural images. Via a three-alternative forced-choice experiment, we measured thresholds for detecting 3.7 cycles/° vertically oriented log-Gabor noise targets placed within an 85 × 85-pixel patch (1.9° patch) drawn from 30 natural images from the CSIQ image database (Larson & Chandler, Journal of Electronic Imaging, 2010). Thus, for each image, we obtained a masking map in which each entry in the map denotes the root mean squared contrast threshold for detecting the log-Gabor noise target at the corresponding spatial location in the image. From qualitative observations we found that detection thresholds were affected by several patch properties such as visual complexity, fineness of textures, sharpness, and overall luminance. Our quantitative analysis shows that except for the sharpness measure (correlation coefficient of 0.7), the other tested low-level mask features showed a weak correlation (correlation coefficients less than or equal to 0.52) with the detection thresholds. Furthermore, we evaluated the performance of a computational contrast gain control model that performed fairly well with an average correlation coefficient of 0.79 in predicting the local contrast detection thresholds. We also describe specific choices of parameters for the gain control model. 
The objective of this database is to provide researchers with a large ground-truth dataset in order to further investigate the properties of the human visual system using natural masks.

Introduction
Considerable insights into human visual perception have been gained by investigating how performance on various visual tasks (e.g., visual detection and discrimination) changes in the presence of masking signals. Over the last several decades, such studies on visual masking have revealed numerous important aspects of visual processing. For example, masking has been used to estimate visual channel bandwidths (e.g., Sachs, Nachmias, & Robson, 1971; Stromeyer & Julesz, 1972; Pantle, 1974; Legge & Foley, 1980; Watson, 1982; Wilson, McFarlane, & Phillips, 1983), channel interactions (e.g., Carter & Henning, 1971; Stromeyer & Julesz, 1972; Henning, Hertz, & Hinton, 1981), neuronal orientation-selectivity (e.g., Campbell & Kulikowski, 1966; Wilson et al., 1983; Phillips & Wilson, 1984), nonlinear responses of visual neurons (e.g., Legge & Foley, 1980; Foley, 1994; Watson & Solomon, 1997), and a number of others (see DeValois & DeValois, 1990; Breitmeyer & Ogmen, 2006, for reviews). The results of such studies have been particularly useful for image-processing applications such as image compression (e.g., Nadenau & Reichel, 2000; Chandler, Dykes, & Hemami, 2005), watermarking (e.g., Kutter & Winkler, 2002; Koz & Alatan, 2008), image quality assessment (e.g., S. J. Daly, 1993; Watson, Borthwick, & Taylor, 1997; Chandler & Hemami, 2007; Bovik, 2013), and texture synthesis (e.g., Heeger & Bergen, 1995; Walter, Pattanaik, & Greenberg, 2002), in which the image serves as the mask, and the processing artifacts serve as the target of detection. 
Traditionally, masking studies have employed unnatural masks such as sine-wave gratings (e.g., Legge & Foley, 1980; Foley & Legge, 1981; Wilson et al., 1983; Foley, 1994; Foley & Boynton, 1994), visual noise (e.g., Carter & Henning, 1971; Stromeyer & Julesz, 1972; Henning et al., 1981), checkerboards (e.g., Pashler, 1988), and Gabor patterns (e.g., Foley, 1994). The main advantage of these artificial masks is that they have well-defined features and parameters, which allows one to investigate the effects of specific mask properties on the detection thresholds. However, due to the inherent nonlinear properties of visual processing, it remains unclear whether the results obtained using such highly controlled masks can be used to predict the results obtained using natural scenes as masks. 
Contrast gain control models have been widely used in predicting and explaining the effects of masking by unnatural masks (e.g., Legge & Foley, 1980; Geisler & Albrecht, 1992; Heeger, 1992; Wilson & Humanski, 1993; Carandini & Heeger, 1994; Foley, 1994; Teo & Heeger, 1994; Lubin, 1995; Lu & Sperling, 1996; Eckstein & Watson, 1997; Watson & Solomon, 1997; Holmes & Meese, 2004; Meese & Holmes, 2007; Meese, Challinor, & Summers, 2008; Meese & Holmes, 2010). These models have been very successful in predicting contrast detection thresholds for pattern masks, such as sine-wave gratings. However, recent studies have suggested the need for various modifications to these models, specifically in the formation and combination of the suppressive gain control pool (Holmes & Meese, 2004; Meese & Holmes, 2007; Meese et al., 2008; Meese & Holmes, 2010). We will return to these issues in the Discussion section. 
Psychophysical studies with natural scenes have demonstrated that the human visual system is highly efficient at representing the statistical structure of the natural environment. Researchers have explored the relative efficiency in coding natural contrast distributions (e.g., Webster & Miyahara, 1997; Brady & Field, 2000; Schwartz & Simoncelli, 2001; Mante, Frazor, Bonin, Geisler, & Carandini, 2005), natural forms of curvature (e.g., Field, Hayes, & Hess, 1993; Geisler, Perry, Super, & Gallogly, 2001; Gilbert, Sigman, & Crist, 2001), the 1/f² power spectrum (e.g., Knill, Field, & Kersten, 1990; Tolhurst & Tadmor, 1997, 2000; Párraga, Troscianko, & Tolhurst, 2000), natural depth cues (e.g., Howe & Purves, 2002; Burge, Fowlkes, & Banks, 2010), and a variety of other spatial, temporal, and chromatic dimensions of the stimulus. We emphasize these efficiencies because we believe they are likely to be relevant to our understanding of masking. By probing the visual system with natural masks, we allow the visual system to use its statistical understanding of natural scenes and thus to operate at peak performance. Studies with unnatural masks may never see these perceptual strategies in action because they are never tested. 
A wide variety of image-processing applications are dependent on the visual masking that occurs in the presence of natural scenes. Typically, these studies are interested in determining whether a particular alteration in a scene (e.g., a quantization error) is detectable to a human observer. To make accurate predictions on thresholds for these alterations, ideally one would like a map of how different regions of the image will affect these alterations. Some work has been directed along these lines. In studies on compression, masking maps have been proposed to guide the quantization scheme in which fewer bits are allocated to the regions with greater detection thresholds (e.g., Nadenau & Reichel, 2000; Zheng, Daly, & Lei, 2000; Chandler, Gaubatz, & Hemami, 2009). For watermarking, the masking maps have been used to determine the local regions in images where the watermarks can be embedded (e.g., Cox & Miller, 1997; Huang & Shi, 1998; Hannigan, Reed, & Bradley, 2001; Kutter & Winkler, 2002; M. Masry, Chandler, & Hemami, 2003; Karybali & Berberidis, 2006; Koz & Alatan, 2008). For image quality assessment, masking maps have been used to estimate the relative visibility of distortions at each spatial location of an image (e.g., S. J. Daly, 1993; Heeger & Teo, 1995; Watson et al., 1997; Chandler & Hemami, 2007; Ninassi et al., 2008; Aydin, Čadík, Myszkowski, & Seidel, 2010; Larson & Chandler, 2010). Similarly, in computer graphics and texture synthesis, masking maps have been used to guide where to hide visual artifacts in the synthesized images (e.g., Heeger & Bergen, 1995; Ferwerda, Shirley, Pattanaik, & Greenberg, 1997; Ashikhmin, 2001; Walter et al., 2002). 
Despite the potential usefulness of masking maps, there currently exists no large dataset of local contrast detection thresholds for natural scenes. Several studies have used natural scenes as masks (Caelli & Moraglia, 1986; Eckstein & Watson, 1997; Watson et al., 1997; Kutter & Winkler, 2002; Chandler & Hemami, 2003). However, these studies either employed only a limited number of images, or the thresholds were limited to select spatial locations within images (Nadenau, Reichel, & Kunt, 2002; Winkler & Süsstrunk, 2004; Chandler et al., 2009). 
Caelli and Moraglia (1986) for example, conducted a psychophysical detection experiment in which they calculated the detection and recognition probabilities of 2-D Gabor targets placed within a total of 24 spatial locations of two natural images. They concluded that structural-similarity-based cross correlation rather than matched-filter-based direct correlation better explained the observers' behavior in the detection and recognition tasks in natural images. However, it is not clear whether the results with these two images would generalize to a larger dataset. 
Eckstein, Ahumada, and Watson (1997) conducted a detection experiment in which subjects detected a medical-imaging object (lesion) as a target placed in three types of background conditions: (a) uniform background, (b) a repeated sample of a structured background, and (c) different samples of a structured background. The detection performance degraded from the uniform to the repeated-sample condition and degraded further in the different-sample condition. Thus, both contrast gain control and random variations of the background degraded human performance in detection of a signal in a complex, spatially varying background. 
Watson et al. (1997) used a Gabor target on different mask conditions: unmasked, cosine-grating, one natural image, and several variations of noise (in each of the noise conditions the noise sample was either fixed or random in intervals and trials and was sampled from band pass or white noise). For some of the conditions, Watson et al. argued that the threshold elevation occurred because the mask was difficult to learn (which they defined as “entropy masking”), rather than due to the traditional contrast or noise masking. 
One of the more ambitious efforts to model the responses of the human visual system under a variety of conditions comes from the studies using the Modelfest dataset (Carney et al., 2000). Using data from 10 different labs, contrast detection thresholds were collected for 43 stimuli and 16 observers (Carney et al., 2000). The Modelfest data provides an excellent source for calibrating and testing various spatial vision models. However, the database included only one natural image as a target among the 43 stimuli, and no spatial masking was tested. 
Kutter and Winkler (2002) conducted a detection experiment on two natural images to find the optimum parameters of a watermarking model. The optimum parameters based on the psychophysical experiment helped to guide the insertion of high-energy watermarks while maintaining low visual distortions in the images. 
Chandler and Hemami (2003) measured thresholds for detecting wavelet targets spread quasi-uniformly (via quantization of wavelet subbands) in an unmasked paradigm and against two natural image masks. Chandler and Hemami used both simple (quantization in a single wavelet subband) and compound distortions (quantization in multiple wavelet subbands) as targets. The unmasked detection thresholds for simple and compound distortions were similar to the traditional thresholds obtained for 1-octave Gabor and sine-wave gratings. However, the detection thresholds using natural image masks with simple distortions were elevated mostly in the low-frequency range. The detection thresholds using natural image masks with compound distortions revealed higher relative sensitivities compared to the unmasked condition, suggesting greater summation for the natural images. 
More recent studies have tested a larger collection of images, but the thresholds were measured only for select spatial locations within the images. For example, Winkler and Süsstrunk (2004) measured detection thresholds for Gaussian white noise and band-pass noise targets against 30 natural images. However, the noise was enveloped and perceptually visible only in the center region of the whole image. Winkler and Süsstrunk found that detection ability was higher in textureless regions and decreased with increasing image activity (texture in the images). Winkler and Süsstrunk emphasized that additional experiments are required with more natural image masks and variable target patterns. 
Nadenau et al. (2002) conducted a detection experiment in which the stimuli were generated using 12 levels of quantization error in 30 natural images. Nadenau et al. generated the target by quantizing the horizontal wavelet subbands of the images. Thus, the distortions were visible in some of the local regions of the images. In addition, they compared five masking models (Watson & Solomon, 1997; S. J. Daly, Zeng, Li, & Lei, 2000; Nadenau & Reichel, 2000; Nadenau et al., 2002) in terms of prediction errors and model parameter stability. Nadenau et al. showed that a plain mean-squared-error (MSE) model performed poorly compared to visual-channel-based masking models, and the model performance improved when local activity (active or homogenous neighborhood) was considered instead of using only a point-wise contrast masking effect. 
More recently, Chandler et al. (2009) measured detection thresholds for wavelet targets placed within 14 small natural image patches. For analysis, the authors divided the patches into three categories (texture, structures, and edges) depending on the visual complexity. The edge-category patches yielded the lowest thresholds, the structure-category patches yielded medium thresholds, and the texture-category patches yielded the highest thresholds. In addition, Chandler et al. demonstrated the utility of the categorization-based approach via an adaptive image compression algorithm that employed the patch-wise classification. 
However, despite this recent trend in using natural image masks, none of these studies has both used a large database of natural scenes and collected local thresholds across each image (i.e., to generate a masking map for each image). An extensive set of such local detection thresholds would serve as crucial ground-truth data for further investigating and better modeling how local image content affects visual masking thresholds. To address this issue, in this paper we present the results of a psychophysical spatial three-alternative forced-choice experiment in which we measured thresholds for detecting log-Gabor noise targets placed within each 85 × 85-pixel (1.9°) patch of 30 natural images from the CSIQ image database (Larson & Chandler, 2010). Thus, for each mask image, we obtained a masking map in which each entry in the map denotes the contrast detection threshold for detecting the log-Gabor noise target at the corresponding spatial location in the image. 
In this paper, we first describe the psychophysical procedures used to collect the thresholds. We then provide both qualitative and quantitative analyses of the results. Finally, we provide outcomes of predicting the thresholds via basic low-level image features and via a computational masking model. The goal here is not to review all of the existing models of masking. Instead, our goal is to provide researchers with a large ground-truth dataset on which to train and test current and forthcoming models of masking on natural images. 
We acknowledge that the targets of our study are limited to one orientation (vertical) and one center frequency (center radial frequency around 3.7 cycles/°). However, it is important to emphasize that an experiment of this type is extremely time-consuming. We used a total of 1,080 natural image patches (36 patches per image × 30 mask images) for which the contrast detection thresholds had to be measured with multiple estimates from multiple subjects. Thus, testing just one center frequency and one orientation of the target was quite prolonged. Despite these limitations, we believe our study represents an important first step towards a better understanding of masking in natural images. 
Methods
This section provides details of the experimental apparatus, stimuli, procedure, and subjects employed to measure the local detection thresholds in the natural images. Note that all of the masks, targets, and detection thresholds are freely available in our online database (http://vision.okstate.edu/masking/). 
Apparatus
Stimuli were displayed on a Dell Trinitron (P1130) CRT monitor. The monitor was driven by a Bits++ device (Cambridge Research Systems, Rochester, UK) that provided 14-bit luminance resolution. The Bits++ device was driven by a computer running Matlab (R2009a) and PsychToolbox 3.0. The computer was equipped with an AMD Athlon 64 Processor 3700+, 2.19 GHz, and 1.5 GB of RAM with an NVIDIA GeForce 8600 GT graphics card. The screen dimensions were 19.1 inches diagonally, 15.3 inches horizontally, and 11.5 inches vertically. The display resolution was 1600 × 1200 pixels or 41.23 pixels/cm at a frame rate of 85 Hz. The total subtended angle of the display was 36.84°. The maximum possible radial frequency was 22.32 cycles/° (c/°). The minimum, maximum, and mean luminances of the display were 1.42, 105, and 36.4 cd/m², respectively. The relationship between digital pixel value and displayed luminance was linearized in software by using a lookup table with luminance measurements made via a Konica Minolta Chroma Meter (CS-100A). The subjects viewed the stimuli binocularly through natural pupils in a darkened room at a distance of approximately 60 cm. 
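The relationship between the display geometry above and the angular quantities used throughout the paper can be sketched as follows. This is only an illustrative calculation, assuming the nominal 60 cm viewing distance and the standard visual-angle formula; the function names are hypothetical. Because the viewing distance is reported as approximate, the computed values come out close to, but not identical to, the reported 22.32 c/° limit and 1.9° patch size.

```python
import math

def pixels_per_degree(pixels_per_cm: float, viewing_distance_cm: float) -> float:
    """Pixels spanned by one degree of visual angle near the screen centre."""
    cm_per_degree = 2.0 * viewing_distance_cm * math.tan(math.radians(0.5))
    return pixels_per_cm * cm_per_degree

def nyquist_cycles_per_degree(ppd: float) -> float:
    """Highest displayable spatial frequency: one cycle needs two pixels."""
    return ppd / 2.0

# Values taken from the text; 60 cm is the nominal (approximate) distance.
ppd = pixels_per_degree(41.23, 60.0)
print(f"pixels per degree : {ppd:.2f}")
print(f"Nyquist limit     : {nyquist_cycles_per_degree(ppd):.2f} c/deg")
print(f"85-pixel patch    : {85 / ppd:.2f} deg")
```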
Stimuli
Stimuli were composed of a mask and a target. The masks were created from the 30 images in the CSIQ database (Larson & Chandler, 2010). This particular database was chosen because it has been used in previous image-quality assessment tasks (e.g., Winkler, 2012) and it represents the kind of images that are used for practical image processing applications. It should be noted that each of these images was normalized to span an 8-bit digital range of 0–255, which produced sharp, high-contrast images that are useful for image-quality assessment, but this also indicates that the contrasts were not necessarily identical to the original scenes. For the target, we used a vertically oriented log-Gabor noise pattern with a center radial frequency of 3.7 c/° and 1-octave bandwidth. (Note that the target presented here is just one random noise sample. However, for some of the mask images, we tested with different noise samples, and the results were nearly identical.) To generate the log-Gabor noise target, first a 510 × 510-pixel random noise image In was sampled from the standard uniform distribution [0, 1]. The noise image In was then filtered using a log-Gabor filter with 2-D frequency response G(r, θ) [display formula not available], where r [display formula not available] is the normalized radial frequency, θ = arctan(y/x) is the orientation, and x, y denote the spatial locations within the image. Here, M × N = 510 × 510, x = 1, …, 510, and y = 1, …, 510. The parameter rs denotes the normalized center radial frequency, and σs/rs determines the radial frequency bandwidth. The parameter μo is the center orientation of the filter and σo is the angular spread. The values of these parameters were chosen as rs = 0.16, σs/rs = 0.65, μo = π/2, and σo = π/9, which resulted in a vertically oriented log-Gabor filter with center radial frequency of 3.7 c/° and 1-octave bandwidth. 
The filtering operation was performed via a single discrete Fourier transform (DFT)/inverse DFT [display formula not available], in which the filter is evaluated at each location as G(r, arctan(y/x)), In is the noise image, 𝓕 denotes the DFT, and 𝓕⁻¹ denotes the inverse DFT. The inverse DFT generated a complex-valued image of dimension M × N. To generate the log-Gabor noise target, the sine-wave component (imaginary values) of this image was then normalized to have zero mean and pixel intensities in the range −1 to +1 [display formula not available], which involved subtracting its mean pixel intensity. 
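The target-generation steps described above can be sketched in NumPy. This is a sketch under stated assumptions rather than the authors' implementation: it assumes the commonly used log-Gabor form (a Gaussian on log radial frequency multiplied by a Gaussian on orientation), a frequency grid normalized so that the Nyquist frequency equals 1, and normalization by the maximum absolute value. The function names and the orientation convention are hypothetical; only the parameter values come from the text.

```python
import numpy as np

def log_gabor_filter(size, r_s=0.16, sigma_ratio=0.65,
                     mu_o=np.pi / 2, sigma_o=np.pi / 9):
    """Assumed log-Gabor frequency response on a size x size DFT grid."""
    f = np.fft.fftfreq(size) * 2.0           # assumption: Nyquist maps to 1
    fx, fy = np.meshgrid(f, f)
    r = np.hypot(fx, fy)
    r[0, 0] = 1.0                             # placeholder to avoid log(0) at DC
    theta = np.arctan2(fy, fx)
    radial = np.exp(-(np.log(r / r_s) ** 2) / (2.0 * np.log(sigma_ratio) ** 2))
    radial[0, 0] = 0.0                        # a log-Gabor has no DC response
    # Angular distance to the centre orientation, wrapped to [0, pi].
    d_theta = np.abs(np.arctan2(np.sin(theta - mu_o), np.cos(theta - mu_o)))
    angular = np.exp(-(d_theta ** 2) / (2.0 * sigma_o ** 2))
    return radial * angular

def log_gabor_noise_target(size=510, seed=0):
    """Uniform noise -> log-Gabor filtering -> imaginary part -> [-1, 1]."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0.0, 1.0, (size, size))
    filtered = np.fft.ifft2(np.fft.fft2(noise) * log_gabor_filter(size))
    odd = filtered.imag                       # sine-wave (odd-phase) component
    centred = odd - odd.mean()
    return centred / np.abs(centred).max()    # assumed normalization

target = log_gabor_noise_target()
print(target.shape, float(target.min()), float(target.max()))
```

Because the assumed filter covers only one orientation lobe, the inverse DFT is complex valued, which is consistent with the text's use of the imaginary part as the target.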
Each of the mask images and the log-Gabor noise target was of size 510 × 510 pixels. For measuring the local detection thresholds, each mask image was divided into 36 patches (indexed by k = 1, …, 36) of size 85 × 85 pixels (1.9°). The log-Gabor noise target was also divided into 36 corresponding patches of size 85 × 85 pixels (1.9°). The stimuli (Ŝk) were generated by adding each mask patch to the corresponding target patch [display formula not available], where k = 1, …, 36, x = 1, …, 85, and y = 1, …, 85. It should be noted that the log-Gabor noise target had a zero mean and a symmetric distribution around that zero mean (a skewness of approximately 0.005). Similarly, each of the 36 target patches had a fairly symmetric distribution around a zero mean (average skewness of 0.0058 ± 0.094). 
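The patch division and mask-plus-target addition described above can be illustrated with a short sketch. The row-major ordering of the 36 patches and the `amplitude` scaling of the target are assumptions made for illustration; the source does not state either, and the function names are hypothetical.

```python
import numpy as np

def split_into_patches(image, patch=85):
    """Split a 2-D image into non-overlapping patch x patch tiles.

    Tiles are returned in row-major order (an assumption), so for a
    510 x 510 image the result has shape (36, 85, 85).
    """
    rows, cols = image.shape
    n_r, n_c = rows // patch, cols // patch
    tiles = image[:n_r * patch, :n_c * patch].reshape(n_r, patch, n_c, patch)
    return tiles.swapaxes(1, 2).reshape(n_r * n_c, patch, patch)

def make_stimulus(mask_patch, target_patch, amplitude=1.0):
    """Add a target patch to a mask patch.

    `amplitude` is a hypothetical scale factor standing in for the
    target contrast that the staircase adjusts from trial to trial.
    """
    return mask_patch + amplitude * target_patch

mask = np.random.default_rng(0).uniform(0.0, 255.0, (510, 510))
target = np.random.default_rng(1).uniform(-1.0, 1.0, (510, 510))
mask_patches = split_into_patches(mask)
target_patches = split_into_patches(target)
stimulus = make_stimulus(mask_patches[0], target_patches[0], amplitude=5.0)
print(mask_patches.shape, stimulus.shape)
```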
To better simulate the spatially localized condition, the stimuli were additionally padded with 64 pixels (1.43°) of context from the mask image. The angle subtended by the stimuli was 4.76° with the context and 1.9° without the context. Figure 1a and Figure 1b show a mask image and the log-Gabor noise target, respectively. The left two stimuli of Figure 1c show the mask alone and the right stimulus shows the mask plus target. These three images of Figure 1c are examples of the three choices of the three-alternative forced-choice procedure. To reduce edge effects, before being added to a mask patch, the target patch was multiplied by a circular window (1.9° or 85 × 85 pixels) given by Equation 5 [display formula not available], with β = 3, γ = 10, and each of the two window coordinates (û and its counterpart) taking the values −42.5, −41.5, …, 41.5. Similarly, the padded stimuli were gradually alpha-blended with the background luminance (14 cd/m²) [display formula not available], where S is the windowed padded stimulus and Γ is a digital value of 106.7, which yielded a background luminance of 14 cd/m². The circular window (4.76° or 149 × 149 pixels) was generated via Equation 5 with the parameters β = 3, γ = 30, and each window coordinate taking the values −74.5, −73.5, …, 73.5. 
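The windowing and alpha-blending steps can be sketched as below. The exact window of Equation 5 is not recoverable from this copy, so the sketch substitutes a hypothetical raised-cosine radial window; `flat_radius` and `ramp` are illustrative parameters, not the paper's β and γ. Only the background digital value of 106.7 and the 149-pixel window size are taken from the text.

```python
import numpy as np

def radial_window(size, flat_radius, ramp):
    """Hypothetical circular window: 1 inside flat_radius, then a
    raised-cosine fall-off to 0 over `ramp` pixels (not Equation 5)."""
    coords = np.arange(size) - (size - 1) / 2.0
    u, v = np.meshgrid(coords, coords)
    distance = np.hypot(u, v)
    t = np.clip((distance - flat_radius) / ramp, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))

def blend_with_background(stimulus, window, background=106.7):
    """Alpha-blend a stimulus into a uniform background value."""
    return window * stimulus + (1.0 - window) * background

window = radial_window(149, flat_radius=40.0, ramp=30.0)
stimulus = np.full((149, 149), 200.0)        # placeholder stimulus values
blended = blend_with_background(stimulus, window)
print(blended[74, 74], blended[0, 0])
```

Any smooth window that equals 1 at the centre and 0 at the border would serve the same purpose of removing a visible edge between stimulus and background.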
Figure 1
 
Illustration of the procedure used to generate the stimuli and the placement of the stimuli within our spatial three-alternative forced-choice setup: (a) original log_seaside image (mask), (b) vertically oriented log-Gabor noise target having a center radial frequency of 3.7 c/° with 1-octave bandwidth, (c) stimuli shown in the experiment. The left two stimuli in (c) show the mask alone, and the right stimulus shows the mask plus target for the patches shown in (a) and (b) as indicated by the boxes. The shown stimuli are with the context of width 1.43°. The total angle subtended by the stimuli was 4.76° with context. Note that the target patch shown within the red box in (b) is one of the 36 target patches. For a specific spatial location, the target patch is the same for all of the 30 mask images. However, the target patch is different in different spatial locations. The log-Gabor noise target along with the mask images and detection thresholds of the experiment are available in our online database (http://vision.okstate.edu/masking/).
Procedure
Thresholds were measured by using a spatial three-alternative forced-choice procedure. In each trial, subjects simultaneously viewed three adjacent images placed on a uniform 14 cd/m² background. Two of the images contained the mask alone, and one contained the mask plus target. Subjects indicated by means of keyboard input which of the three images contained the target. If the choice was incorrect (target undetectable), the contrast of the target was increased; if the choice was correct (target detectable), the contrast of the target was decreased. For each of the 36 patches, this process repeated for 40 trials. 
Target contrasts were controlled via a QUEST staircase procedure (Pelli, 1987) using software derived from the PsychToolbox. During each trial, an auditory tone indicated the stimulus onset. After the response of each trial, auditory feedback was provided to distinguish between correct and incorrect responses. Subjects were instructed to examine all three choices before responding. Contrast threshold was defined as the 75% correct point on a Weibull function, which was fitted to the data following each series of trials. Figure 2 shows the temporal parameters. The response time was limited to within 5 s of stimulus onset, during which all three images remained visible. The duration between two trials was 1 s, which was about six times longer than the temporal impulse response (Georgeson, 1987; Graham, 1989). Although our fairly long viewing time might have caused difficulties related to negative afterimages (Georgeson, 1984; Georgeson & Turner, 1985), the long duration better simulated the natural viewing condition. Furthermore, due to the mandatory eye movements during the long viewing time, the temporal onset and offset durations were irrelevant to the task of our experiment. 
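To make the threshold definition concrete, the sketch below evaluates a Weibull psychometric function for a three-alternative task and inverts it at 75% correct. It assumes a common parameterization with a guess rate of 1/3 and linear contrast units; the source does not give the fitted form, and the `alpha`, `beta`, and `lapse` values here are hypothetical. The fitting step itself is not shown.

```python
import math

def weibull_proportion_correct(contrast, alpha, beta, guess=1.0 / 3.0, lapse=0.0):
    """Assumed Weibull psychometric function for a 3AFC task."""
    return guess + (1.0 - guess - lapse) * (1.0 - math.exp(-(contrast / alpha) ** beta))

def contrast_at(p_target, alpha, beta, guess=1.0 / 3.0, lapse=0.0):
    """Invert the Weibull function: contrast giving p_target correct."""
    fraction = (p_target - guess) / (1.0 - guess - lapse)
    if not 0.0 < fraction < 1.0:
        raise ValueError("p_target must lie between the guess rate and the ceiling")
    return alpha * (-math.log(1.0 - fraction)) ** (1.0 / beta)

# Hypothetical fitted parameters, for illustration only.
alpha, beta = 0.05, 3.5
threshold = contrast_at(0.75, alpha, beta)
print(f"75%-correct contrast: {threshold:.4f}")
print(f"check: {weibull_proportion_correct(threshold, alpha, beta):.3f}")
```

With three alternatives, chance performance is 1/3, so the 75% point sits well above the floor of the function.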
Figure 2
 
Temporal parameters of the experiment. Response time was limited to within 5 s of stimulus onset, during which all three choices remained visible. The interval between the disappearance of one set of stimuli and the appearance of the next was 1 s, during which subjects viewed only the 14 cd/m² gray background.
Subjects
Nine adults (MA, JE, YZ, PS, PV, KV, TN, AR, and TP) including the first two authors (MA, KV) participated in the experiment. All subjects had normal or corrected-to-normal visual acuity. All subjects were experienced with detection experiments. The Oklahoma State University Institutional Review Board approved the protocol for this experiment. 
Each of the 30 mask images was observed by three subjects. For every mask image each subject was tested at 36 locations using the small patch stimuli as described previously. For each patch stimulus, the experiment was performed twice (this will hereafter be referred to as two runs of the experiment), which yielded two contrast detection threshold estimates per patch, and overall six contrast detection threshold estimates from three subjects per patch. 
For every mask image and for each of the 36 regions within that image we collected results from three different subjects. In the following sections of this paper, we denote the three subjects as Subject1, Subject2, and Subject3. Subject MA was common in all 30 mask images, and thus, detection thresholds of Subject1 refer to the detection thresholds of only MA. The detection thresholds of Subject2 and Subject3 refer to the thresholds of more than one subject. Table 1 shows the distribution of the individual subjects as Subject1, Subject2, and Subject3 for the 30 mask images. As Subject2, JE viewed 50% of the 30 mask images, YZ viewed 25% of the images, and the other five subjects viewed the remaining 25% of the images. As Subject3, each of the subjects PS, PV, KV, AR, YZ, and TN viewed approximately 16% of the images. 
Table 1
 
Distribution of the individual subjects as Subject1, Subject2, and Subject3 for the 30 mask images. The first row shows the names of the subjects. The second to fourth rows show the distribution of individual subjects. The last row shows the total number of images viewed by the individual subjects.
           MA   JE   YZ   PS   PV   KV   TN   AR   TP   Total
Subject1   30    -    -    -    -    -    -    -    -      30
Subject2    -   15    8    3    1    1    1    -    1      30
Subject3    -    -    4    6    5    5    4    5    1      30
Total      30   15   12    9    6    6    5    5    2
Contrast metric
The results are reported here in terms of root-mean-squared (RMS) contrast (Moulden, Kingdom, & Gatley, 1990). The RMS contrast (C) of the target in decibels (dB) was computed as follows [display formula not available], where the average luminance of the target patch is given by a second expression [display formula not available]. Here, M × N = 85 × 85 (1.9° × 1.9°) denotes the dimensions of each patch, Lt(x, y) is the luminance of the target patch at location (x, y), and Lm(x, y) is the luminance of the mask patch at location (x, y). Note that the RMS contrast of the target is measured with respect to the mean luminance of the natural image mask. The average detection threshold and the standard deviation of the thresholds (σC) were calculated from the six different runs of the three subjects [display formulas not available], where Cr is the threshold estimate from run r, and nR = 6 is the total number of runs. In the following sections, unless otherwise specified, the detection threshold refers to this average and the standard deviation refers to σC.
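One plausible reading of the contrast metric can be sketched as follows. The exact formulas are not recoverable from this copy, so the sketch assumes that C is 20·log10 of the RMS deviation of the target luminance about its own mean, divided by the mean luminance of the mask patch. The use of the sample standard deviation (ddof = 1) across runs is likewise an assumption, and the function names and run values are hypothetical.

```python
import numpy as np

def rms_contrast_db(target_luminance, mask_luminance):
    """Assumed RMS contrast of the target, in dB, relative to the
    mean luminance of the mask patch."""
    deviation = target_luminance - target_luminance.mean()
    rms = np.sqrt(np.mean(deviation ** 2))
    return 20.0 * np.log10(rms / mask_luminance.mean())

def summarize_runs(thresholds_db):
    """Mean and standard deviation of the per-run threshold estimates.

    ddof=1 (sample standard deviation) is an assumption; the source
    does not state which normalization was used.
    """
    c = np.asarray(thresholds_db, dtype=float)
    return c.mean(), c.std(ddof=1)

# Illustrative values only: six hypothetical run estimates in dB.
mean_db, sd_db = summarize_runs([-30.0, -32.0, -31.0, -29.0, -33.0, -31.0])
print(f"mean = {mean_db:.2f} dB, sd = {sd_db:.2f} dB")
```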
Results
As we will demonstrate, the local thresholds in natural scenes depend on the local structure. Accordingly, in the following section, we begin by presenting the detection thresholds for each image. We then present qualitative observations of the thresholds, and finally we analyze the consistency of the thresholds across subjects and across runs. 
Local contrast detection thresholds: Masking maps
The local detection thresholds are presented in the form of maps in Figure 3. Each map consists of 36 patches corresponding to the 36 patches of the mask image. The gray-scale value of each patch in the map indicates the threshold for the corresponding patch in the mask image. 
Figure 3
 
Masking maps and the standard deviation maps. The first, seventh, 13th, and 19th rows show the mask images. The three rows below the mask images show the average maps of the two runs of Subject1, Subject2, and Subject3, respectively; the fourth rows show the average masking maps of six runs; the fifth rows show the standard deviation maps across the six estimates. The thresholds corresponding to the gray-scale values are indicated by the “Threshold Colorbar,” and the standard deviations corresponding to the gray-scale values are indicated by the “Standard-Deviation Colorbar.” The total angle subtended by each of the mask images was 11.7°.
The first, seventh, 13th, and 19th rows of Figure 3 show the 30 mask images. Below the mask images, the first, second, and third images show the average maps of the two runs of Subject1, Subject2, and Subject3, respectively. The fourth images below the mask images show the average maps of the six threshold estimates of all three subjects. The fifth images below the mask images show the standard deviations across the six estimates of all three subjects. 
Note that the thresholds corresponding to the gray-scale values are indicated by the “Threshold Colorbar” shown at the left-most side of Figure 3. The standard deviations corresponding to the gray-scale values are indicated by the “Standard-Deviation Colorbar” shown at the right-most side of Figure 3.
For display purposes, all of the masking maps have been globally normalized in intensity. Brighter blocks in the maps denote higher thresholds and darker blocks denote lower thresholds. Similarly, all of the standard deviation maps have been globally normalized; darker denotes lower standard deviation and brighter denotes higher standard deviation. 
Qualitative observations
Several qualitative observations can be made via a visual examination of the masking maps. Here, we present such qualitative observations in terms of: within-image threshold variations, feature variations from low to high threshold patches, and feature variations from low to high standard deviation patches. 
Within-image observations
Within a natural image, the local content can vary across space, thus giving rise to varying thresholds. Here we present general observations on such within-image threshold variations. 
Smooth regions generally give rise to lower thresholds:
In a natural image, some subject matter, such as the sky, blurry backgrounds, and plain surfaces of objects, contains smooth regions. Here, we define a “smooth” region as a region that is devoid of visible texture. This lack of texture could be due to blurriness, low contrast, a combination of blurriness and low contrast, or an inherent absence of texture in the region. For example, sky is present in the images log_seaside, sunsetcolor, fisher, cactus, sunset_sparrow, family, bridge, 1600, rushmore, lady_liberty, monument, boston, aerial_city, and trolley. As expected, the masking maps of these images suggest that these sky regions generally give rise to lower thresholds compared to other regions in the mask images. 
Smooth regions can also come from blurry backgrounds. For instance, in the images swarm and turtle, the swarm of bugs and the turtle's face are in focus, whereas the background textures are blurred out. Observe that, in the masking maps, the blurry regions show lower thresholds. Similarly, lower thresholds appear in the far-away blurry grass area in elk, in the distant water-waves in fisher, and in the backgrounds of butter_flower, native_american, and shroom.
Similarly, plain surfaces of objects can cause smooth regions in natural images. For instance, the smooth street in boston, the plain dress of the lady in woman, and the wet sea-shore of fisher show lower thresholds in the corresponding masking maps. 
Textured regions generally give rise to higher thresholds:
In comparison to the smooth regions, the textured regions generally give rise to higher thresholds. For instance, in the image elk, the furry neckline of the elk shows higher thresholds compared to other regions in the image. Similarly, in the masking maps of the image child_swimming, the area corresponding to the textured grass region shows higher thresholds in the masking maps. 
In the image geckos, the skins of the two geckos contain coarse textures, whereas the granular soil shows fine noise-like texture. In the masking maps of geckos, the fine textured soil yields higher thresholds and the coarse textured geckos' skins yield lower thresholds. Similarly, the textured flower garden in 1600, the gravel rocks in rushmore, the textured dress of the statue in lady_liberty, and the wrinkled face of the woman in native_american show higher thresholds in the corresponding masking maps. 
In the Feature-based analysis section we describe our quantitative analysis of the relationship between the low-level patch features and the thresholds. Refer to the description of the features in the Feature-based analysis section for the quantitative analysis of the threshold variations with the measures of smoothness and texture (specifically, edge density, entropy, and sharpness). 
Very low luminance gives rise to higher thresholds:
Very dark patches give rise to higher thresholds. (It is important to emphasize that thresholds were measured in terms of the RMS contrast of the target with respect to the mean luminance of the natural image patch.) For instance, the dark hills in sunsetcolor, the dark background regions in roping and butter_flower, and the shadowed tree leaves in family show higher thresholds in the corresponding masking maps. Some images also contain small dark regions. For example, the base trunks of the trees in redwood, a dark hollow region near the top-left corner of snow_leaves, and regions along the vertical wood beam in veggies are very dark. As a result, in the masking maps of redwood, snow_leaves, and veggies, the few patches corresponding to these small dark regions show high thresholds. 
Very simple edges can give rise to lower thresholds:
By simple edges, we are referring to the edges in the images where there is a clearly visible transition in luminance. For instance, in the image sunsetcolor, the horizon line (the line where the dark hills merge with the smooth sky) is an example of a simple edge. The smooth sky regions of sunsetcolor show lower thresholds, and the dark hilly regions show higher thresholds. However, along the horizon line of sunsetcolor, thresholds are even lower than for the smooth sky region. 
Similarly, in the masking map of couple, there are some low-threshold regions. These low-threshold regions of couple correspond to the areas where there are clear transitions of luminance, such as above the man's dark hat, at the bottom region of the wooden seat on which the lady is sitting, and at the bottom line of the lady's dress. Likewise, observe the low thresholds in the image woman, along the boundary between the lady's dark backpack and white shirt. Low thresholds also occur in the image geckos, along the transition line between the geckos' dark shadow and the soil. 
Overall, from these within-image observations, we summarize that smooth regions give rise to lower thresholds, textures show higher thresholds, very dark regions give rise to higher thresholds, and simple edges show lower thresholds. 
Observations on the patches when sorted according to the detection thresholds
To further gauge the relationships between thresholds and patch content, we sorted the entire set of patches in ascending order of detection thresholds. After sorting, the low, medium, and high threshold patches can be visually examined to visualize how the patch features vary as the thresholds increase. 
In our database, there are a total of 1,080 natural image patches (30 images × 36 patches per image). After removing the outliers (described in the Statistical inference tests section) we have 1,075 patches. We arranged the 1,075 patches according to the lowest to the highest average detection threshold values. Figure 4a shows the 105 patches having the lowest thresholds. Figure 4b shows the 105 patches having medium thresholds. And Figure 4c shows the 105 patches having the highest thresholds. Please see the caption of Figure 4 for the ranges of lowest, medium, and highest thresholds. 
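The sorting and grouping step described above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and the choice of the "medium" group as the block of patches centered on the middle of the sorted order is an assumption, since the text does not state how the medium patches were selected.

```python
def threshold_groups(thresholds, k):
    """Return index lists for the k lowest, k middle, and k highest thresholds.

    thresholds: list of average detection thresholds (dB), one per patch.
    The middle group is ASSUMED to be centered on the median of the sorted order.
    """
    # Patch indices sorted from the lowest to the highest threshold.
    order = sorted(range(len(thresholds)), key=lambda i: thresholds[i])
    start = (len(order) - k) // 2
    return order[:k], order[start:start + k], order[-k:]


if __name__ == "__main__":
    # Toy example with 7 patches; the database has 1,075 patches and k = 105.
    toy = [-10.0, -50.0, -30.0, -40.0, -20.0, -60.0, -35.0]
    low, medium, high = threshold_groups(toy, k=2)
    print(low, medium, high)
```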
Figure 4
 
Examples of natural image patches having the (a) lowest, (b) medium, and (c) highest detection thresholds. The ranges of the average detection thresholds shown in this Figure are (a) −59.65 dB to −46.48 dB for the lowest threshold patches, (b) −35.80 dB to −33.87 dB for medium threshold patches, and (c) −24.88 dB to −9.41 dB for the highest threshold patches.
Observe that from low to high thresholds, the visual complexity of the patches increases. Most of the lowest-threshold patches in Figure 4a are quite smooth and textureless. Many of the lowest-threshold patches come from the smooth sky regions and the blurry background regions of the images. Furthermore, observe that some of the lowest-threshold patches contain very simple edges. 
The medium threshold patches shown in Figure 4b appear more visually complex compared to the lowest threshold patches. Several medium threshold patches contain coarse and fine textures, vertical structures, and detectable faces. Some of the medium threshold patches have blurry regions. However, the blurry medium threshold patches are not as smooth as the lowest threshold patches. 
The highest threshold patches shown in Figure 4c appear even more visually complex compared to the medium threshold patches. The textures in the highest threshold patches appear finer compared to the medium threshold patches. Observe that there are almost no blurry patches in these high threshold conditions. Furthermore, observe that some highest threshold patches contain distinguishable human or animal faces and body parts. 
Furthermore, in all the three types of patches shown in Figure 4, there are some very dark patches. However, observe that from the low to high threshold patches, the quantity of the dark patches increases. 
Overall, from low to high threshold, the natural image patches become more visually complex, the amount and fineness of the textures increase, the sharpness increases, and the likelihood of finding very dark patches increases. 
Observations on the patches when sorted according to the standard deviations of the thresholds
In our database, the average threshold for each patch was calculated from six threshold estimates obtained from three different subjects. Therefore, for each patch, along with the average detection threshold, we also consider the standard deviation of the six estimates. In Figure 3, along with the masking maps, the standard deviation maps are shown. Note that the colorbar for the standard deviation maps is different from the colorbar for the masking maps. 
We arranged the 1,075 natural mask patches according to the lowest to the highest standard deviations of the thresholds. Figure 5a shows the 105 patches having the lowest standard deviations. Figure 5b shows the 105 patches having medium standard deviations. Figure 5c shows the 105 patches having the highest standard deviations. See the caption for Figure 5 for the ranges of the lowest, medium, and highest standard deviations. 
Figure 5
 
Examples of natural image patches having the (a) lowest, (b) medium, and (c) highest standard deviations of the detection thresholds across subjects and runs. The ranges of the standard deviations are (a) 0.47 dB to 1.19 dB for the lowest standard deviation, (b) 2.31 dB to 2.65 dB for medium standard deviation, and (c) 4.68 dB to 9.27 dB for the highest standard deviation.
From Figure 5, observe that from low to high standard deviations, the number of smooth and textureless patches decreases. Note that very dark patches have low to medium standard deviations. Moreover, many highest standard deviation patches shown in Figure 5c contain vertically oriented textures. 
Furthermore, from the low to high standard deviation patches, the number of identifiable structures increases. The identifiable structures include building structures, human and animal faces, fruits and vegetables, human body parts, and animal body parts. 
Overall, from the low to high standard deviation patches, the amount of texture increases, the number of dark patches decreases, the amount of vertically oriented texture increases, and the number of identifiable structures increases. 
Facilitation
The detection threshold for the unmasked condition was also measured for Subject1 with two runs. The average threshold for the unmasked condition was approximately −52 dB (−51.83 ± 1.6 dB). Thus, the masks with detection thresholds near or below −52 dB might have facilitated detection rather than induced masking. Figure 6 shows examples of patches in different threshold ranges. Notice that the patches with thresholds below or near −52 dB contain primarily simple edges. Furthermore, a simple visual inspection reveals that the orientations of these simple edges are mostly horizontal. This facilitation could be explained by cross-orientation facilitation (Meese & Holmes, 2007; Meese, Holmes, & Challinor, 2007; Meese, Summers, Holmes, & Wallis, 2007), which is discussed further in the Cross-orientation facilitation section. 
Figure 6
 
Example mask patches in different threshold ranges. The range of thresholds for each row of patches is shown at the left by the two adjacent numbers. For instance, the threshold range for the topmost row of patches is 0 dB to −15 dB. Furthermore, within each row the threshold decreases from left to right. Note that not all ranges contain the same number of patches. For example, in the range 0 dB to −15 dB there are only seven patches. On the other hand, in the range −25 dB to −30 dB there are a total of 43 patches, of which 15 example patches are shown here. The unmasked detection threshold (∼ −52 dB) is indicated by the red arrow and the dotted line. All of the thresholds in this figure are from the two runs of Subject1.
Subject consistency
Before performing the quantitative analysis, we examined the consistency of the thresholds across runs and across subjects. In this section, we discuss the consistency in terms of three types of measures: (a) correlation coefficients and root-mean-square-error across runs and across subjects, (b) standard deviation maps, and (c) statistical inference tests. 
Intra- and intersubject consistency
The consistencies were evaluated in terms of three criteria: (a) Pearson correlation coefficient (CC), which measures how well the thresholds between two runs/subjects correlate; (b) Spearman rank-order correlation coefficient (SROCC), which measures the relative monotonicity between the thresholds of two runs/subjects; and (c) root mean square error (RMSE), which measures the absolute differences of the thresholds of two runs/subjects. Note that before calculating the CC, the thresholds were transformed through a logistic transform (VQEG, 2003; Larson & Chandler, 2010) to remove any nonlinearity between the thresholds. 
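The three consistency criteria can be sketched with the Python standard library as shown below. This is a simplified illustration under stated assumptions: the function names are hypothetical, ties in the Spearman ranking are assumed to receive average ranks, and the logistic transform applied before computing the CC is omitted because its parameters are not given here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (CC) between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson CC of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(x, y):
    """Root mean square error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


if __name__ == "__main__":
    # Toy thresholds (dB) for the same four patches in two runs.
    run1 = [-50.0, -40.0, -30.0, -20.0]
    run2 = [-48.0, -41.0, -29.0, -22.0]
    print(pearson(run1, run2), spearman(run1, run2), rmse(run1, run2))
```

Because the SROCC depends only on rank order, it is unaffected by any monotonic transform of the thresholds, whereas the CC and RMSE are not.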
Figure 7 shows the intrasubject (between runs of the same subject) and intersubject (between runs from different subjects) consistencies in terms of CC, SROCC, and RMSE. Figure 7a shows the CC and SROCC, and Figure 7b shows the RMSE. The average values of CC, SROCC, and RMSE were calculated by taking the average of the CCs, SROCCs, and RMSEs for all 30 mask images. The error bars indicate the standard deviations of the CCs, SROCCs, and RMSEs across different mask images. The intra- and intersubject consistency values for individual mask images are shown in Table 3 of Appendix A.
Figure 7
 
Intrasubject and intersubject consistency in terms of (a) CC and SROCC and (b) RMSE. The average consistency values of all 30 mask images are shown. The error bars indicate the standard deviations of the CC, SROCC, and RMSE across different mask images. The intra- and intersubject consistency values for individual mask images are shown in Table 3 of Appendix A.
The overall CC, SROCC, and RMSE were calculated by taking the thresholds from all 1,080 patches as collective datasets. For instance, to calculate the overall intrasubject consistency, all the thresholds of Run1 from all three subjects were arranged in one dataset. Similarly, all the thresholds of Run2 from all three subjects were arranged in another dataset. Then, from these two datasets the overall intrasubject CC, SROCC, and RMSE were calculated. Because the overall intrasubject values were calculated only between two datasets, there are no error bars for the overall intrasubject consistency values. 
For calculating the overall intersubject consistency, three collective datasets were created. The first dataset consisted of 1,080 average thresholds (average of Run1 and Run2) from Subject1. The second dataset consisted of 1,080 average thresholds (average of Run1 and Run2) from Subject2. The third dataset consisted of 1,080 average thresholds (average of Run1 and Run2) from Subject3. Next, we calculated the CC, SROCC, and RMSE between the first and second datasets, between the second and third datasets, and between the third and first datasets. Then, we calculated the average of the three values individually for CC, SROCC, and RMSE. Note that the error bars for the intersubject consistency values were calculated from the standard deviation across the three datasets. 
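The pairwise averaging described above can be sketched as follows. The sketch uses RMSE as the example metric; the function names are hypothetical, and the error bar is assumed to be the population standard deviation of the three pairwise values, which is one reading of the description rather than a confirmed detail.

```python
import math
from itertools import combinations


def rmse(x, y):
    """Root mean square error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def intersubject_summary(datasets, metric):
    """Average a pairwise metric over all pairs of subject datasets.

    datasets: one list of per-patch average thresholds per subject.
    metric: a function of two lists, such as rmse.
    Returns (mean, standard deviation) of the pairwise values; the standard
    deviation uses the population form, which is an assumption.
    """
    values = [metric(a, b) for a, b in combinations(datasets, 2)]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, sd


if __name__ == "__main__":
    # Toy data: three subjects, two patches each (the paper uses 1,080 patches).
    subject1 = [-40.0, -30.0]
    subject2 = [-42.0, -32.0]
    subject3 = [-44.0, -34.0]
    print(intersubject_summary([subject1, subject2, subject3], rmse))
```

With three subjects, `combinations` yields exactly the three pairs named in the text (first and second, first and third, second and third).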
As shown in Figure 7, the average intrasubject CC and SROCC were 0.93 and 0.90, respectively. These average intrasubject CC and SROCC values were close to the maximum possible value of one, indicating that the subjects were able to reproduce their own thresholds. The average intersubject CC and SROCC were 0.92 and 0.87, respectively. These high CC and SROCC values indicate good agreement of the thresholds across subjects. The average intrasubject RMSE was 2.81 dB, which was 4.7% of the whole threshold range (0 dB to −60 dB). The average intersubject RMSE was 4.08 dB, which was 6.8% of the threshold range. Furthermore, the overall intrasubject CC, SROCC, and RMSE were 0.95, 0.95, and 2.91 dB, respectively. The overall intersubject CC, SROCC, and RMSE were 0.92, 0.92, and 4.20 dB, respectively. The overall consistency values were close to the average consistency values. 
Standard deviation maps
The extent of agreement across subjects and runs can also be observed via the standard deviation maps shown in Figure 3. In the standard deviation maps, darker patches indicate smaller standard deviations, and brighter patches indicate higher standard deviations. Observe that most of the patches show low to medium standard deviations. Very few patches show high standard deviations. For instance, for the image log_seaside, only two patches show high standard deviations (observe the first to sixth rows of the first column in Figure 3). One patch is located at the center of the image, and another patch is located at the right-most side of the image. For those two patches of log_seaside, the thresholds may vary in the range −30 ± 9 dB. Overall, the maximum deviation was 9 dB, which was 15% of the threshold range (0 to −60 dB). 
Figure 8a shows the histogram of the thresholds. Observe that few patches had very high (above −15 dB) or very low (below −50 dB) thresholds. Figure 8b shows the histogram of the standard deviations of the thresholds. The average deviation was approximately 2 dB, which was roughly 3.3% of the threshold range. Furthermore, observe that few patches had standard deviations above 6 dB, suggesting that for most of the patches, there was good agreement across subjects and runs. 
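The percentages quoted in this section all express a deviation in dB as a fraction of the 60 dB threshold range (0 dB to −60 dB). A small sketch of that arithmetic is given below; the function and constant names are hypothetical.

```python
THRESHOLD_RANGE_DB = 60.0  # span of the 0 dB to -60 dB threshold range


def percent_of_range(value_db, range_db=THRESHOLD_RANGE_DB):
    """Express a deviation in dB as a percentage of the threshold range."""
    return 100.0 * value_db / range_db


if __name__ == "__main__":
    reported = [
        ("average intrasubject RMSE", 2.81),
        ("average intersubject RMSE", 4.08),
        ("maximum standard deviation", 9.0),
        ("average standard deviation", 2.0),
    ]
    for label, value in reported:
        print(f"{label}: {percent_of_range(value):.1f}% of the range")
```

These values reproduce the reported figures of 4.7%, 6.8%, 15%, and roughly 3.3%, respectively.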
Figure 8
 
Histograms of (a) detection thresholds and (b) standard deviations. In the legend of (a), μCT denotes the average of all thresholds, and μCT ± nσCT denotes n times the standard deviation of the thresholds (σCT) added to or subtracted from μCT; n = −3, −2, −1, 1, 2, 3. In the legend of (b), μCTσ denotes the average of all standard deviations, and μCTσ ± nσCTσ denotes n times the standard deviation of the standard deviations of the thresholds (σCTσ) added to or subtracted from μCTσ; n = −1, 1, 2, 3.
Statistical inference tests
As an additional consistency check, we performed statistical inference tests on the thresholds across runs and across subjects. 
Outlier patches:
Before performing the statistical tests, we removed the outliers identified by the box and whisker plots shown in Figure 9. The horizontal axis of Figure 9 shows the range of thresholds. The vertical axis describes different subjects and runs. We chose a maximum whisker length of w = 1.5 (Tukey, 1977). With this criterion, five of the 1,080 patches were classified as outliers. All the outlier patches were very dark and had thresholds above −4.3 dB. 
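A whisker-based outlier rule of this kind can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the whisker length w is interpreted as a multiple of the interquartile range (the usual reading of Tukey's rule), the quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, and the function name is hypothetical. Other quartile conventions can give slightly different fences.

```python
from statistics import quantiles


def tukey_outliers(values, w=1.5):
    """Return (lower fence, upper fence, outliers) for a whisker length w.

    A value is flagged when it lies more than w interquartile ranges
    below the first quartile or above the third quartile.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - w * iqr, q3 + w * iqr
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers


if __name__ == "__main__":
    # Toy thresholds (dB) with one unusually high value.
    toy = [-40, -38, -36, -35, -34, -33, -32, -30, -3]
    print(tukey_outliers(toy))
```

In the toy data the single value near 0 dB falls above the upper fence, mirroring the observation that the outlier patches had unusually high thresholds.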
Figure 9
 
Box and whisker plots of the detection thresholds. The horizontal axis shows the range of thresholds. The vertical axis describes different subjects and runs. The circles denote the outliers.
The box and whisker plots shown in Figure 9 describe rough distributions of the thresholds from different runs and different subjects. All of the thresholds of Subject1 came from subject MA, but more than one subject served as Subject2 and Subject3 (see Table 1 for the distribution of the subjects as Subject1, Subject2, and Subject3). However, observe that all of the medians and most parts of the interquartile ranges of all the box and whisker plots fall in the range of −40 dB to −30 dB, which indicates that the threshold distributions from different runs and subjects were reasonably similar. 
Statistical tests:
To quantify the variability of the thresholds due to the variability of the patches, variability of the subjects, or variability of the runs, we performed image-by-image analysis of variance (ANOVA) by using the Generalized Linear Model procedure in the SAS 9.3 software package (Littell, 2006) with a 95% confidence level. Table 4 of Appendix B shows the results of the statistical tests. 
When the patch was used as the independent variable, only 3 out of 30 mask images had p values greater than or close to 0.05, suggesting that, for most of the mask images, the average threshold of the image varied with the variation of patches. When the subject was used as the independent variable, 7 out of 30 mask images had p values greater than or close to 0.05, suggesting that for most of the mask images, the average threshold of the image varied with the variation of subjects. On the other hand, when the run was used as the independent variable, 18 out of 30 mask images had p values greater than or close to 0.05, suggesting that for most of the mask images, the average threshold of the image remained unchanged across runs. In addition, for the subject-run interaction, 12 out of 30 mask images had p values greater than or close to 0.05, suggesting that for many images, the average threshold of the image remained unchanged across subjects and across runs. 
Thus, from the statistical tests, we found that the subjects were quite consistent in reproducing their own thresholds. Furthermore, the subject-run interactions over the thresholds were reasonably significant. 
Analysis
In this section, we present a quantitative analysis of the results. We first analyze the relationships between low-level patch features and the thresholds. Next, we analyze the performance of a contrast gain control model in predicting the thresholds. 
Feature-based analysis
We examined the extent to which the thresholds correlate with basic low-level mask features. We considered the following features: average luminance, Michelson contrast, RMS contrast, standard deviation, skewness, kurtosis, edge density, sharpness, entropy, local entropy, slope of the magnitude spectrum, intercept of the magnitude spectrum, proportion of the mask patch's energy around the radial spatial frequency close to the target's center radial spatial frequency, and proportion of the mask patch's energy around the orientation close to the target's orientation. 
In extremely low-luminance conditions, visual sensitivity can be reduced due to spontaneous neural activity and other sources of internal noise (Frazor & Geisler, 2006). Thus, for the feature analysis, we chose the 988 natural image patches that had average luminances greater than 3 cd/m².
We quantified the association of the individual patch features with the thresholds in terms of two criteria: Pearson correlation coefficient (CC) and Spearman rank-order correlation coefficient (SROCC). Figure 10a shows the CC and SROCC values for the different patch features. Before calculating the CC the feature values were transformed through a logistic transform (VQEG, 2003; Larson & Chandler, 2010) to remove any nonlinearity between the thresholds and the features. 
Figure 10
 
CC (correlation coefficient) and SROCC (Spearman rank-order correlation coefficient) between patch features and thresholds. CC measures how well the patch features correlate with the thresholds, and SROCC measures the relative monotonicity between the patch features and the thresholds. (a) CC and SROCC of patch features without dividing the patches into groups. The features were calculated for the patches with average luminances greater than 3 cd/m². (b) SROCC of the features for low-, medium-, and high-threshold patches. For each group 200 patches were selected. The threshold ranges were: low threshold: −59.87 to −43.91 dB; medium threshold: −40.84 to −30.78 dB; and high threshold: −28.38 dB to −13.69 dB.
Furthermore, we investigated if the low-level features better correlate with any specific range of the thresholds. To perform this analysis, we selected 200 patches with low thresholds, 200 patches with medium thresholds, and 200 patches with high thresholds. Figure 10b shows the SROCC values between various low-level features and different ranges of thresholds. Although we could divide the whole threshold range (0 to −60 dB) into more than three ranges, for simplicity we examined patches only from these three ranges. 
Luminance:
From the qualitative observations we found that very dark patches yielded high thresholds (see Figure 4). For the natural image patches, we calculated the average luminance by using Equation 8 (replacing the subscript t with m in Equation 8). From Figure 10a, observe the CC and SROCC values of 0.3 and 0.28, respectively, for the average luminance feature. From Figure 10b notice that the SROCC values for the low, medium, and high threshold patches do not vary for the average luminance feature. These low values of the correlation coefficients indicate a poor relationship between average mask luminance and thresholds. 
Contrast:
With the exception of possible facilitation, thresholds generally increase with increasing mask contrast (Legge & Foley, 1980). To observe the dependency of our thresholds on the mask contrast, we used two well-known contrast measures: Michelson contrast and RMS contrast. Michelson contrast ($C_M$) was calculated via

$$C_M = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}$$

where $L_{\max}$ and $L_{\min}$ are, respectively, the maximum and minimum luminance of a mask patch. The RMS contrast was calculated using Equation 7 (replacing the subscript t with m in Equation 7). 
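As a small worked example of the Michelson contrast definition, the sketch below computes it for a patch given as a 2-D list of luminance values. The function name is hypothetical and the toy values are not taken from the database.

```python
def michelson_contrast(patch):
    """Michelson contrast of a patch given as a 2-D list of luminances."""
    values = [v for row in patch for v in row]
    l_max, l_min = max(values), min(values)
    return (l_max - l_min) / (l_max + l_min)


if __name__ == "__main__":
    # Maximum 40 and minimum 10 give (40 - 10) / (40 + 10) = 0.6.
    print(michelson_contrast([[10.0, 30.0], [20.0, 40.0]]))
    # A single pixel with zero luminance drives the measure to 1.
    print(michelson_contrast([[0.0, 120.0], [80.0, 255.0]]))
```

Because the measure depends only on the two extreme values in a patch, a single very dark or very bright pixel can dominate it, whereas RMS contrast reflects the deviations of all pixels.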
From Figure 10a, observe that both Michelson contrast and RMS contrast correlate better with the thresholds than the average luminance does. As shown in Figure 10b, Michelson contrast correlates better with the lower thresholds, whereas RMS contrast correlates better with the medium thresholds. Overall, however, the low correlation coefficients (0.5 and 0.52 for Michelson and RMS contrast, respectively) indicate a weak relationship between contrast features and detection thresholds. 
Scatter plots of thresholds versus Michelson and RMS contrasts are shown in Figure 11a and Figure 11b, respectively. From the scatter plots, notice that RMS contrast appears to be better correlated with the thresholds compared to Michelson contrast. One might notice that many of the patches have a high Michelson contrast (near 1.0). This is related to the normalization of the images: as mentioned in the Methods section, the images from which the patches were drawn were normalized to the pixel intensity range of 0 to 255. 
Figure 11
 
Scatter plots of detection thresholds versus patch features. (a) Michelson contrast versus thresholds, (b) RMS contrast versus thresholds, (c) edge density versus thresholds, (d) intercept of magnitude spectrum versus thresholds, (e) sharpness versus thresholds, and (f) band energy versus thresholds. Descriptions of these features are given in Feature-based analysis. The scatter plots shown here include thresholds and features of all 1,080 natural image patches of our database.
Common statistical features:
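We also investigated some common statistical features of the masks: standard deviation (σm), skewness (sm), and kurtosis (κm). These statistical features were calculated via
$$\sigma_m = \left[ \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( L_m(x, y) - \bar{L}_m \right)^2 \right]^{1/2},$$
$$s_m = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( \frac{L_m(x, y) - \bar{L}_m}{\sigma_m} \right)^3,$$
$$\kappa_m = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( \frac{L_m(x, y) - \bar{L}_m}{\sigma_m} \right)^4,$$
where Lm(x, y) is the luminance of the mask pixel at location (x, y), L̄m is the average luminance of the mask patch, and the dimensions of each patch are M × N = 85 × 85. 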
From Figure 10a notice that among the three statistical features, skewness shows the best correlation, whereas kurtosis shows the worst correlation with the thresholds. None of these three statistical features shows better correlation with the thresholds compared to the contrast features. Furthermore, Figure 10b suggests that standard deviation and skewness better correlate with the medium and high thresholds, whereas kurtosis better correlates with the low thresholds. 
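The three statistical features can be sketched as follows. This is a minimal example that assumes the population (1/MN) normalization and the standardized-moment definitions of skewness and kurtosis; the function name and the treatment of constant patches are hypothetical.

```python
import numpy as np

def patch_statistics(lum):
    """Return (standard deviation, skewness, kurtosis) of a luminance patch."""
    lum = np.asarray(lum, dtype=float)
    mean = lum.mean()
    sigma = np.sqrt(np.mean((lum - mean) ** 2))  # 1/(MN) normalization (assumption)
    if sigma == 0:
        return 0.0, 0.0, 0.0  # constant patch: higher moments set to 0 (assumption)
    z = (lum - mean) / sigma  # standardized luminance
    return float(sigma), float(np.mean(z ** 3)), float(np.mean(z ** 4))

# Toy symmetric patch: skewness should be 0
sigma_m, s_m, kappa_m = patch_statistics([[1.0, 3.0], [3.0, 1.0]])
```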
Edge density:
In natural images, edges can arise from object boundaries, from surface variations of objects, from shadows, or from reflectance variations. Edges at which a difference of luminance is visible are easy to detect with computational edge detectors. We detected the edges of our natural image masks by using the Canny edge detector (Canny, 1986) with the standard deviation of the Gaussian filter set to 4.5, and with the low and high thresholds of edge detection set to 0.08 and 0.2, respectively. Using the Canny edge detector, a binary edge map (BEdge) of the mask patch was created, and then the edge density (DEdge) was calculated by
$$D_{Edge} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} B_{Edge}(x, y),$$
where the dimensions of the patch are M × N = 85 × 85. From Figure 10a, observe that the mask edge density correlates with the thresholds to the same extent as the mask contrast. Figure 10b suggests that patch edge density correlates with the low, medium, and high threshold patches to similar extents. Furthermore, the scatter plot shown in Figure 11c demonstrates a larger spread compared to the RMS contrast and sharpness measures (shown in Figure 11b and 11e, respectively). 
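The edge-density computation itself is simply the fraction of edge pixels in the binary map. The sketch below illustrates it with NumPy only. Note that the edge map here comes from a plain gradient-magnitude threshold, which is a simplified stand-in and not the Canny detector used in the study; the function names and the threshold value are hypothetical.

```python
import numpy as np

def gradient_edge_map(lum, threshold):
    """Stand-in edge detector (NOT Canny): threshold the gradient magnitude."""
    gy, gx = np.gradient(np.asarray(lum, dtype=float))
    return np.hypot(gx, gy) > threshold

def edge_density(edge_map):
    """Fraction of pixels flagged as edges in a binary edge map."""
    return float(np.asarray(edge_map, dtype=bool).mean())

# Toy 85 x 85 patch containing a single vertical luminance step
patch = np.zeros((85, 85))
patch[:, 42:] = 1.0
density = edge_density(gradient_edge_map(patch, threshold=0.25))
```

With central differences, the step is flagged in the two columns adjacent to it, so the density of this toy patch is 2/85.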
Entropy:
From the qualitative observations we noticed that from low to high thresholds, patches became visually more complex. To attempt to partially quantify this visual complexity, we estimated the first-order entropy (Em) (Gonzalez, Woods, & Eddins, 2009) of each mask patch by using the histogram of luminance values via
$$E_m = -\sum_{bin} P_{bin} \log_2 P_{bin},$$
where Pbin is the number of counts in that bin divided by the total counts. We used a total of 256 bins in the histogram. We also calculated the local entropy. An entropy map was first generated by calculating the point-wise first-order entropy around a 7 × 7 neighborhood of the mask patch (Gonzalez et al., 2009). Then, the local entropy was calculated by averaging the mask entropy map. 
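A minimal sketch of the first-order entropy is given below. It assumes 8-bit values in the range 0 to 255, one histogram bin per gray level, and a base-2 logarithm (entropy in bits); the function name and the `value_range` parameter are hypothetical.

```python
import numpy as np

def first_order_entropy(patch, n_bins=256, value_range=(0, 256)):
    """Histogram-based first-order entropy in bits (log base 2 is an assumption)."""
    counts, _ = np.histogram(patch, bins=n_bins, range=value_range)
    p = counts / counts.sum()   # P_bin: counts in the bin divided by total counts
    p = p[p > 0]                # empty bins contribute nothing (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

flat = np.full((85, 85), 128)        # constant patch
two_level = np.zeros((84, 84))       # half black, half white
two_level[:, 42:] = 255
```

A constant patch has zero entropy, and a patch split evenly between two gray levels has an entropy of exactly 1 bit, which gives a quick sanity check for the implementation.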
According to the CCs and SROCCs of Figure 10, both the entropy and local entropy correlate with the thresholds almost equally. Furthermore, the correlation coefficients are not noteworthy. In Discussion, we explore entropy masking (Watson et al., 1997) and possible improvements in the entropy measures to predict the thresholds. 
Magnitude spectra:
It is well known that the magnitude spectra M(f) of natural images fall inversely with spatial frequency, i.e., M(f) ∝ f^(−α), where f is the radial frequency and −α is the slope of the line log M(f) = −α log f + constant; this slope is approximately −1 for natural images (Field, 1987). To compute the slope of the magnitude spectrum for each mask patch, the 2-D DFT of the mask patch was computed. To reduce edge effects, before computing the DFT, each mask patch was multiplied by a Kaiser window of size 85 × 85 with a side-lobe attenuation of −14.7 dB. Let yL(f, θ) denote the 2-D DFT of the windowed mask, where f is the radial frequency and θ is the orientation. The magnitude spectrum zL(f) was computed by summing across all orientations via
$$z_L(f) = \sum_{\theta} \left| y_L(f, \theta) \right|.$$
The slope (αL) and the amplitude intercept (log βL) of the magnitude spectrum were taken from the line −α log f + log β that best fit the log of the magnitude spectrum zL(f). Specifically, the best-fitting line was computed via the following optimization:
$$(\alpha_L, \beta_L) = \arg\min_{\alpha, \beta} \left\| \log z_L(f) - \left( -\alpha \log f + \log \beta \right) \right\|_2,$$
where the L2 norm was taken over all radial frequencies f > 0. 
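The slope and intercept estimation can be sketched as follows, under several assumptions that are not taken from the paper: the patch is square, the Kaiser window is parameterized by a hypothetical `kaiser_beta` shape value (NumPy's `np.kaiser` takes a shape parameter rather than a side-lobe attenuation), radial frequencies are binned to the nearest integer number of cycles per patch, each ring is averaged rather than summed (so that every radius is treated as if it had the same number of orientation samples), and the line is fitted by least squares in log10 coordinates.

```python
import numpy as np

def spectrum_slope_intercept(patch, kaiser_beta=0.5):
    """Fit log10 z(f) = -alpha * log10 f + log10 beta; return (alpha, log10 beta)."""
    patch = np.asarray(patch, dtype=float)
    n = patch.shape[0]                            # square patch assumed
    win1d = np.kaiser(n, kaiser_beta)             # beta value is an assumption
    window = np.outer(win1d, win1d)
    mag = np.abs(np.fft.fftshift(np.fft.fft2(patch * window)))

    # Radial frequency (cycles per patch) of every DFT coefficient
    freqs = np.fft.fftshift(np.fft.fftfreq(n)) * n
    fx, fy = np.meshgrid(freqs, freqs)
    radius = np.rint(np.hypot(fx, fy)).astype(int)

    # Average the magnitude over each ring of constant radial frequency
    max_r = n // 2                                # only complete rings
    sums = np.bincount(radius.ravel(), weights=mag.ravel())[: max_r + 1]
    counts = np.bincount(radius.ravel())[: max_r + 1]
    z = sums[1:] / counts[1:]                     # drop f = 0
    f = np.arange(1, max_r + 1)

    slope, log_beta = np.polyfit(np.log10(f), np.log10(z), 1)
    return float(-slope), float(log_beta)

# Synthetic patch whose amplitude spectrum falls as 1/f (zero phase)
n = 85
freqs = np.fft.fftshift(np.fft.fftfreq(n)) * n
fx, fy = np.meshgrid(freqs, freqs)
r = np.hypot(fx, fy)
amp = np.zeros_like(r)
amp[r > 0] = 1.0 / r[r > 0]
synthetic = np.real(np.fft.ifft2(np.fft.ifftshift(amp)))
alpha, log_beta = spectrum_slope_intercept(synthetic, kaiser_beta=0.0)
```

With `kaiser_beta=0.0` the window is rectangular, so the synthetic 1/f patch should yield an estimated α close to 1; the small deviation comes from binning the radial frequency.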
From Figure 10a, notice that the intercepts of the magnitude spectra correlate better with the thresholds than do the slopes of the spectra. Figure 10b suggests that the slopes of the magnitude spectra correlate better with the high thresholds, and the intercepts of the magnitude spectra correlate better with the medium thresholds. 
Sharpness:
From the qualitative observations on our masking maps, we noticed that blurry image regions showed lower thresholds and sharp regions showed relatively higher thresholds. It is well known that the slope of the magnitude spectrum is related to the perceived sharpness of images, and that the intercept of the magnitude spectrum is related to the image contrast. Contrast also affects the perceived sharpness of an image (Webster & Miyahara, 1997; Vu, Phan, & Chandler, 2012). For instance, a low-contrast image region may also be perceived as low in sharpness, and vice versa. Thus, any perceived sharpness measure would likely be confounded with the contrast measures. 
We measured the perceived sharpness of each mask patch by using the S3 sharpness measure (Vu et al., 2012). Given a mask image, the S3 measure generates a sharpness map. To compute the correlation between the mask sharpness and the thresholds, we first generated the sharpness map for each of the 30 full-sized natural images. Next, we divided each sharpness map into 36 patches and subsequently calculated the average sharpness of each patch. 
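The per-patch averaging step can be sketched as a block mean over the feature map. The example below assumes that the 36 patches form a non-overlapping 6 × 6 grid of 85 × 85 blocks anchored at the top-left corner of the map; this layout, the function name, and the parameter names are assumptions made for illustration, and the sharpness map itself is taken as given.

```python
import numpy as np

def patch_means(feature_map, patch=85, grid=6):
    """Average a 2-D feature map over a grid x grid layout of patch x patch blocks."""
    size = patch * grid
    cropped = np.asarray(feature_map, dtype=float)[:size, :size]  # drop leftover border
    blocks = cropped.reshape(grid, patch, grid, patch)
    return blocks.mean(axis=(1, 3))   # one value per patch, shape (grid, grid)

# Toy 512 x 512 map that is 1 inside the top-left patch and 0 elsewhere
toy_map = np.zeros((512, 512))
toy_map[:85, :85] = 1.0
means = patch_means(toy_map)
```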
From Figure 10a, notice that sharpness shows the highest correlation (CC of 0.7 and SROCC of 0.69) with the thresholds among all the low-level mask features described here. Furthermore, from Figure 10b, observe that sharpness better correlates with the low thresholds. Figure 11e shows the scatter plot between thresholds and perceived sharpness of the patches. From the scatter plots, notice that sharpness appears to be better correlated with the thresholds compared to the contrast measures. 
Band and orientation energy:
Stromeyer and Julesz (1972) showed that the masking of a sinusoidal grating increased as the mask-noise bandwidth increased up to ±1 octave beyond the grating frequency. Hence, the thresholds can be affected by the critical bands (Stromeyer & Julesz, 1972) around the target's center radial frequency. Our target was a vertically oriented log-Gabor noise target with center radial frequency of 3.7 c/°. Here we determined whether the thresholds were affected by the energy of the mask in neighboring radial frequency and orientation bands. 
To calculate the energy content of the mask patch in neighboring frequency bands and orientations, we decomposed each of our full-sized natural images into six frequency bands and six orientations using log-Gabor filters. Then, we divided each of the results into a total of 36 patches of size 85 × 85. Let GL(nf, nθ) denote the log-Gabor filter output of a patch, where nf corresponds to the frequency band index, and nθ corresponds to the orientation channel index. From nf = 1 to 6 there were six frequency bands with center radial frequencies 0.83, 1.52, 2.83, 5.31, 10.01, and 16.62 c/°. From nθ = 1 to 6 there were six orientations π/2, 2π/3, 5π/6, 0, π/6, and π/3. Next, the energy of each band was calculated by taking the L2 norm of that band. The energy of the band GL(nf, nθ) is given by
$$E(n_f, n_\theta) = \left\| G_L(n_f, n_\theta) \right\|_2 .$$
Because our target's center radial frequency was 3.7 c/°, we computed the energy content (Ef) of the mask patch around the target's frequency by summing the energy at the third and fourth frequency bands (having center radial frequencies of 2.83 and 5.31 c/°, respectively) and dividing the sum by the total energy of all six radial frequency bands via
$$E_f = \frac{\sum_{n_f=3}^{4} \sum_{n_\theta=1}^{6} E(n_f, n_\theta)}{\sum_{n_f=1}^{6} \sum_{n_\theta=1}^{6} E(n_f, n_\theta)} .$$
Furthermore, our log-Gabor noise target had a vertical orientation. Therefore, we computed the energy content (Eθ) of the mask patch around the vertical orientation by summing up the energy at the first orientation band (tuned around vertical orientation) and dividing the sum by the total energy of the six orientation bands via
$$E_\theta = \frac{\sum_{n_f=1}^{6} E(n_f, 1)}{\sum_{n_f=1}^{6} \sum_{n_\theta=1}^{6} E(n_f, n_\theta)} .$$
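The two energy ratios can be sketched as follows, assuming the log-Gabor filter outputs of one patch are already available as an array of shape (6, 6, 85, 85) indexed by frequency band and orientation channel. The array layout and the function names are hypothetical, and indices are zero-based, so the third and fourth frequency bands are indices 2 and 3 and the vertical channel is index 0.

```python
import numpy as np

def band_energies(responses):
    """L2 norm of each band; responses has shape (n_f, n_theta, H, W)."""
    return np.sqrt(np.sum(np.abs(responses) ** 2, axis=(2, 3)))

def frequency_energy_ratio(energy, target_bands=(2, 3)):
    """E_f: energy in the bands around the target frequency over total energy."""
    return float(energy[list(target_bands), :].sum() / energy.sum())

def orientation_energy_ratio(energy, target_orientation=0):
    """E_theta: energy in the vertical channel over total energy."""
    return float(energy[:, target_orientation].sum() / energy.sum())

# Toy filter outputs with identical energy in every band
responses = np.ones((6, 6, 4, 4))
energy = band_energies(responses)
e_f = frequency_energy_ratio(energy)
e_theta = orientation_energy_ratio(energy)
```

When the energy is spread evenly over all bands, the ratios reduce to the fraction of bands selected: 2/6 for Ef and 1/6 for Eθ.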
Figure 10a suggests that Ef better correlates with the thresholds compared to Eθ. Furthermore, Figure 10b suggests that Ef better correlates with the medium thresholds, and Eθ better correlates with the high thresholds. Furthermore, the scatter plot shown in Figure 11f shows a large spread for the band energy compared to the RMS contrast and sharpness measures (shown in Figures 11b and e, respectively). 
In summary, some low-level mask features (sharpness, intercept of the magnitude spectrum, and contrast) showed better relationships with the thresholds (correlation coefficients of approximately 0.5 or higher). Nonetheless, no individual mask feature predicted the thresholds well. It might be possible that linear or nonlinear regression with multiple features would predict the thresholds better than the individual features. We discuss possible approaches for feature-based modeling to predict the thresholds in the Discussion.
Contrast gain control model: Analysis
Numerous models of masking have been proposed in previous studies (e.g., Nachmias & Sansbury, 1974; Legge & Foley, 1980; S. J. Daly, 1993; Wilson & Humanski, 1993; Foley, 1994; Teo & Heeger, 1994; Lubin, 1995; Watson & Solomon, 1997). Here, as a final analysis to predict our thresholds, we used a modern computational neural-model of masking by Watson and Solomon (1997). 
Model overview
We implemented the computational model of masking proposed by Watson and Solomon (1997) with some modifications. In the computational model, both the mask and mask+target go through the same stages: contrast sensitivity filter (CSF), log-Gabor neural-array, excitatory nonlinearity, inhibitory nonlinearity, pooling, and division. After division, the subtracted responses of mask and mask+target are pooled via Minkowski pooling. By using an iterative process, the target contrast is changed until the Minkowski pooled response difference equals a predefined “at threshold” difference, at which point the final threshold is calculated. 
Let z(x0, y0, f0, θ0) denote the initial linear modeled neural response (log-Gabor filter output) at location (x0, y0), center radial frequency f0, and orientation θ0. In the model, the response of a neuron tuned to these parameters is given by
$$r(x_0, y_0, f_0, \theta_0) = g \cdot \frac{z(x_0, y_0, f_0, \theta_0)^p}{b^q + \sum_{(x, y, f, \theta) \in S} z(x, y, f, \theta)^q},$$
where g is the output gain (Chandler et al., 2009), p provides the point-wise excitatory nonlinearity, q provides the point-wise inhibitory nonlinearity to the neurons in the inhibitory pool, S indicates the set of neurons that are included in the inhibitory pool, and b represents the semisaturation constant. In the following paragraphs, we describe the stages of the model and our choices of parameters for those stages. 
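To make the roles of g, p, q, b, and S concrete, the following is a minimal sketch of a divisive gain control response for a single neuron. It assumes the common divisive-normalization form in which the excitatory term z^p is divided by b^q plus the sum of z^q over the inhibitory pool, and it takes the pool as a plain list of linear responses; the function name and the unweighted pool are simplifications for illustration.

```python
import numpy as np

def gain_control_response(z0, pool, p, q, b, g):
    """Divisive gain control: g * |z0|^p / (b^q + sum over pool of |z|^q)."""
    pool = np.abs(np.asarray(pool, dtype=float))
    return float(g * abs(z0) ** p / (b ** q + np.sum(pool ** q)))

# Same excitatory drive, increasingly active inhibitory pool
weak_pool = gain_control_response(1.0, [1.0, 1.0], p=2.0, q=2.0, b=1.0, g=0.1)
strong_pool = gain_control_response(1.0, [1.0, 1.0, 2.0, 2.0], p=2.0, q=2.0, b=1.0, g=0.1)
```

For a fixed excitatory input, a more active inhibitory pool reduces the response, which is how a high-contrast mask reduces the response difference produced by a target and thereby elevates its threshold.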
Contrast sensitivity filter:
The CSF used in the model was originally described by Mannos and Sakrison (1974) with adjustments specified by Daly (1987) (see also Larson & Chandler, 2010). 
Log-Gabor neural-array:
The log-Gabor filter specifics were: number of frequency bands, nf = 8; number of orientation channels, nθ = 6; filter bandwidth, BWf = 1.5 octaves; center radial frequencies of the eight frequency bands: 0.44, 0.74, 1.22, 2.09, 3.57, 6.22, 10.84, and 16.84 c/°. The frequency-band spacings and the orientation channel spacings were chosen such that in the frequency domain there was even coverage in the sum of the squared filter amplitudes. 
Excitatory and inhibitory nonlinearity:
Several studies used different ranges of values for the excitatory nonlinearity parameter (p) and the inhibitory nonlinearity parameter (q) (e.g., Foley, 1994; Teo & Heeger, 1994; Watson & Solomon, 1997; Chandler et al., 2009). We used different combinations of p and q within the range of 2 ≤ q ≤ p ≤ 2.4. 
Inhibitory pooling:
Following previous studies (Watson & Solomon, 1997; Chandler et al., 2009), we adopted complete summation over phase. Furthermore, the inhibitory pool consisted of those neural responses within ± 60° of the orientation of the responding neuron, within ± 0.7 octaves bandwidth from the responding neuron's radial spatial frequency, and within eight connected neighbors in space. The pooling widths in space, spatial frequency, and orientation were chosen by measuring the effects of pooling in each dimension on the model performance (see Effects of the suppressive gain pool). For spatial pooling we used convolution with a 3 × 3 Gaussian created using the outer product of [0.1065 0.787 0.1065]; for spatial frequency and orientation pooling we used convolutions with equal weights. 
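The spatial part of the inhibitory pooling can be sketched as follows. The 3 × 3 kernel is built as the outer product of [0.1065 0.787 0.1065], as stated above; the border handling (edge replication) and the function name are assumptions, and the frequency and orientation pooling are omitted.

```python
import numpy as np

weights = np.array([0.1065, 0.787, 0.1065])
kernel = np.outer(weights, weights)   # 3 x 3 separable Gaussian-like kernel

def pool_space(resp):
    """Convolve a 2-D response map with the 3 x 3 kernel (edge-replicated border)."""
    resp = np.asarray(resp, dtype=float)
    padded = np.pad(resp, 1, mode="edge")   # border handling is an assumption
    out = np.zeros_like(resp)
    h, w = resp.shape
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

pooled = pool_space(np.ones((5, 5)))
```

Because the kernel is symmetric, correlation and convolution coincide, and because its weights sum to 1, a uniform response map is left unchanged by the pooling.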
Divisive gain control:
The excitatory output was divided by the inhibitory outputs (Heeger, 1992; Foley, 1994; Teo & Heeger, 1994; Watson & Solomon, 1997; Chandler et al., 2009), as shown by Equation 22. Note that the output gain g operates after the division operation and appears both in the divisive responses for mask and mask+target. Thus, the value of g determines only the global threshold variation in the masking map, i.e., changing the g changes all of the thresholds in a masking map in concert. We used an output gain of g = 0.1 (Chandler et al., 2009). The semisaturation constant b used in the model was in the range of 0.02 to 0.08 (Watson & Solomon, 1997; Chandler et al., 2009). 
Decision:
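After subtracting the responses of mask and mask+target, the responses were (Minkowski) pooled in space, frequency, and orientations via
$$d = \left\{ \sum_{\theta} \left[ \sum_{f} \left( \sum_{x, y} \left| r_{m+t}(x, y, f, \theta) - r_m(x, y, f, \theta) \right|^{\beta_r} \right)^{\beta_f / \beta_r} \right]^{\beta_\theta / \beta_f} \right\}^{1 / \beta_\theta},$$
where r_{m+t} and r_m denote the responses to the mask+target and to the mask alone, respectively, βθ = βf = 1.5, and βr = 2 (Chandler & Hemami, 2003; Chandler et al., 2009). Via a bisection search, the model searches for the target contrast that gives the decision value d close to a predefined constant value. We chose the commonly used value of d = 1 (Watson & Solomon, 1997; M. A. Masry & Hemami, 2004; Chandler et al., 2009). All of the parameters used in our implementation of the model are given in Table 2.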
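The decision stage can be sketched in two parts: a nested Minkowski pooling of the response differences, and a bisection search for the target contrast at which the pooled value reaches d = 1. The sketch below assumes that pooling is nested in the order space, then frequency, then orientation, that the response differences are stored in an array of shape (n_theta, n_f, H, W), and that the decision value increases monotonically with target contrast. The function names, the search interval, and the tolerance are hypothetical.

```python
import numpy as np

def minkowski_pool(diff, beta_r=2.0, beta_f=1.5, beta_theta=1.5):
    """Nested Minkowski pooling over space, then frequency, then orientation."""
    a = np.abs(np.asarray(diff, dtype=float))
    space = np.sum(a ** beta_r, axis=(2, 3)) ** (1.0 / beta_r)    # (n_theta, n_f)
    freq = np.sum(space ** beta_f, axis=1) ** (1.0 / beta_f)      # (n_theta,)
    return float(np.sum(freq ** beta_theta) ** (1.0 / beta_theta))

def find_threshold(decision_fn, lo=1e-4, hi=1.0, target=1.0, tol=1e-6, max_iter=60):
    """Bisection on target contrast; assumes decision_fn increases with contrast."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if decision_fn(mid) < target:
            lo = mid      # response difference too small: raise the contrast
        else:
            hi = mid      # response difference large enough: lower the contrast
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Toy check: a single nonzero difference pools to its own magnitude
diff = np.zeros((6, 8, 2, 2))
diff[0, 0, 0, 0] = 3.0
pooled_value = minkowski_pool(diff)

# Toy decision function that is linear in contrast: d(c) = 4c, so d = 1 at c = 0.25
threshold = find_threshold(lambda c: 4.0 * c)
```

In the full model, `decision_fn` would run the mask and the mask+target through all model stages at the given target contrast and return the pooled difference; here it is replaced by a toy function so that the search logic can be checked in isolation.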
Table 2
 
Parameters of the model. The first column shows the symbols of the parameters. The second column shows the descriptions of the parameters. The third column shows the ranges of values of the parameters that we used in our implementation of the model.
| Symbol | Description | Value |
|---|---|---|
| **Contrast sensitivity filter** | | |
| fCSF | CSF peak frequency | 6.0 c/° |
| BWCSF | CSF log10 bandwidth | 1.43 |
| **Log-Gabor neural array** | | |
| nf | Number of frequency bands | 8 |
| BWf | Bandwidth of the frequency bands | 1.5 octaves |
| f0G | Center radial frequencies of the bands | 0.44, 0.74, 1.22, 2.09, 3.57, 6.22, 10.84, 16.84 c/° |
| nθ | Number of orientation channels | 6 |
| BWθ | Bandwidth of the orientation channels | 30° |
| θ0G | Center angles of the orientation channels | 0°, ±30°, ±60°, 90° |
| **Divisive gain control** | | |
| p | Excitatory exponent | 2.0, 2.3, 2.4 |
| q | Inhibitory exponent | 2.0, 2.32 |
| b | Semi-saturation constant | 0.02, 0.035, 0.05, 0.08 |
| g | Output gain | 0.1 |
| sx,y | Pooling kernel in space | 3 × 3 neighborhood with Gaussian (standard deviation 0.5, around 0.03°) |
| sf | Pooling kernel in frequency | Within ±0.7 octave bandwidth with equal weights |
| sθ | Pooling kernel in orientation | Within ±60° bandwidth with equal weights |
| βr | Minkowski exponent for space | 2.0 |
| βf | Minkowski exponent for frequency | 1.5 |
| βθ | Minkowski exponent for orientation | 1.5 |
| d | Constant decision | 1.0 |
Performance of the model
To test the performance of the model, we used four sets of fixed parameters, denoted Set1, Set2, Set3, and Set4. Set1 parameters were chosen from the estimated parameters of Watson and Solomon's (1997) observer KMF (Foley, 1994): p = 2.3, q = 2.0, b = 0.02, g = 0.1. Set2 parameters were chosen from the estimated parameters of Watson and Solomon's (1997) observer JYS (Foley, 1994): p = 2.3, q = 2.0, b = 0.08, g = 0.1. Set3 parameters were chosen from Teo and Heeger's (1994) model: p = 2.0, q = 2.0, b = 0.05, g = 0.1. Set4 parameters were chosen from the estimated parameters of Chandler et al.'s (2009) texture-class patches: p = 2.4, q = 2.32, b = 0.035, g = 0.1 (originally, g = 0.02). Note that because the output gain g controls only the convergence speed and the global threshold variations, we kept the same g = 0.1 in all four sets. 
Qualitative observations on the model masking maps:
Figure 12 shows the masking maps generated using the four sets of fixed parameters. Observe that the fixed-parameter model masking maps are quite similar to the ground-truth masking maps. For example, the ground-truth masking map of sunsetcolor shows lower thresholds in the smooth sky region and higher thresholds in the dark hillside. In all the model masking maps of sunsetcolor, also observe the lower thresholds in the smooth sky region and higher thresholds in the dark hill region. Likewise, for the image turtle, observe the low thresholds in the blurry background and high thresholds in the turtle's face, both in the ground-truth and the model masking maps. Similarly, for other images, the model maps reasonably match with the ground-truth maps. 
Figure 12
 
Model masking maps with four sets of parameters. Set1: Watson and Solomon's (1997) observer KMF parameters (Foley, 1994): p = 2.3, q = 2.0, b = 0.02. Set2: Watson and Solomon's (1997) observer JYS parameters (Foley, 1994): p = 2.3, q = 2.0, b = 0.08. Set3: Teo and Heeger's (1994) model parameters: p = 2.0, q = 2.0, b = 0.05. Set4: Chandler et al.'s (2009) texture-class parameters: p = 2.4, q = 2.32, b = 0.035. The ground-truth masking maps are shown after the mask images. For display purposes, the thresholds above 0 dB were set to 0 dB.
However, for some images, the model masking maps differ from the ground-truth maps. For instance, the model masking maps of geckos do not show the lower thresholds in the top gecko's skin. Similarly, the model masking maps of aerial_city do not show the lower thresholds in the regions near the sea at the bottom of the image. Likewise, in the ground-truth masking maps of foxy and child_swimming, observe the lower thresholds near the faces of the fox and the child; the model masking maps fail to show the lower thresholds in these regions. Further discussion about the model performance is given in the Effects of contrast gain control section. 
Quantitative observations on the model thresholds:
Along with the qualitative observations on the model masking maps, we measured quantitative similarity between the thresholds and the model predictions using CC, SROCC, and RMSE. We calculated the CC, SROCC, and RMSE individually for all 30 mask images (see Table 5 of Appendix C). Figure 13 shows the average CC, SROCC, and RMSE of all mask images along with the standard deviations across the mask images. 
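The three similarity measures can be sketched with NumPy as follows. The sketch assumes that CC is the Pearson linear correlation computed directly on the thresholds in dB, that SROCC is the Pearson correlation of the ranks with tied values sharing their mean rank, and that RMSE is computed in dB without any prior fitting; the function names and the toy threshold values are hypothetical.

```python
import numpy as np

def pearson_cc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v):
    """1-based ranks; tied values share their mean rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        tie = v == val
        r[tie] = r[tie].mean()
    return r

def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson_cc(ranks(x), ranks(y))

def rmse(x, y):
    """Root mean squared error between two threshold vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))

# Toy thresholds in dB (illustrative values, not taken from the database)
measured = [-40.0, -30.0, -20.0, -10.0]
predicted = [-38.0, -33.0, -15.0, -12.0]
```

In this toy case the predictions preserve the rank order of the measured thresholds, so SROCC equals 1 even though the RMSE is about 3.2 dB, which illustrates why the correlation measures and the RMSE can favor different parameter sets.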
Figure 13
 
(a) CC, SROCC, and (b) RMSE between the thresholds and model predictions with four sets of fixed parameters. The CC, SROCC, and RMSE shown here are the average CC, SROCC, and RMSE of all mask images. The error bars are given in terms of standard deviations. The CC, SROCC, and RMSE values for individual mask images are provided in Table 5 of Appendix C.
The CC and SROCC values shown in Figure 13a suggest that the model with the four sets of fixed parameters performed quite well in predicting the thresholds. In terms of CC and SROCC, the model using Set3 and Set4 parameters performed better than the other sets, whereas in terms of RMSE (Figure 13b), the model with Watson and Solomon's observer KMF parameters (Set1) performed better. Overall, however, the differences in CC, SROCC, and RMSE among the four sets were very subtle. 
Discussion
In this study, we examined the detectability of a log-Gabor noise target presented on local patches of various natural-image masks. The objectives of this study were twofold: (a) to investigate how local detection thresholds are affected by natural masks, and if and how these thresholds differ compared to those measured using unnatural masks; and (b) to create a large database of such thresholds to foster future studies on visual masking in natural images. In addition to qualitative analyses, we examined the relations between various low-level mask features and the thresholds. We also examined the performance of a modern computational masking model in predicting the thresholds. 
Single or multiple forms of masking?
Numerous studies on visual masking using unnatural masks have shown that thresholds can be affected by controlled, often low-level mask properties. For example, thresholds have been shown to be affected by changes in contrast (e.g., Legge & Foley, 1980; Watson & Solomon, 1997), changes in luminance (e.g., Peli, 1990; Watson, 1993a; Eckert & Bradley, 1998), changes in pattern or noise properties (e.g., Ahumada, 1967; Legge, Kersten, & Burgess, 1987; Delord, 1998; Nadenau & Reichel, 2000; Winkler & Susstrünk, 2004), and changes in subject familiarity (Watson et al., 1997). 
From the Feature-based analysis, our results suggest that each of these types of masking also occurs, at least in part, for the natural masks used in this study. For example, both RMS and Michelson contrasts showed CC values of approximately 0.5 (see Figure 10). Even though average luminance had a lower CC value, we did observe that very dark patches gave rise to very high thresholds (see Figure 4). For entropy masking and noise or pattern masking, although it is difficult to link these effects to measurable features, as very rough estimates, the first-order entropy, band energy, and band orientation showed CC values of 0.41, 0.48, and 0.31, respectively. 
However, no feature alone did remarkably well at predicting the thresholds, suggesting that these various types of masking may be occurring simultaneously, or may be influencing each other. For example, although sharpness yielded the highest correlation with the thresholds, the sharpness measure employed here (Vu et al., 2012) actually uses a combination of RMS contrast and the rotationally averaged slope of the magnitude spectrum (see Figure 10). Similarly, a very basic weighted combination of some features (average luminance, RMS contrast, entropy, band energy, and orientation energy) better correlated with the thresholds (CC of 0.65) than did the individual features. These data suggest that masking in natural images may be dictated by multiple mechanisms or even possibly by a new, unknown mechanism. 
Effects of contrast gain control
In the Performance of the model section, to predict our thresholds we used Watson and Solomon's (1997) contrast gain control model with some modifications (see Figure 12). Considering that the model was never tested on natural-image masks, its performance was quite remarkable. Nonetheless, for some image regions, the model was unable to accurately predict the thresholds. Figure 14 shows examples of the masks for which the model's predicted thresholds did not correlate well with the actual thresholds. Observe that for the image geckos, the model could not predict the lower thresholds at the top gecko's skin. Similarly, the model could not predict the lower threshold near the child's and fox's faces in the images child_swimming and foxy, respectively. Likewise, the model generally struggled on many regions in the images couple and boston. Table 5 in Appendix C shows the CCs and SROCCs between the model predictions and the thresholds for the entire database. Observe from that table that for some images (lake, native_american, couple, and aerial_city), even the best individual CCs and SROCCs were less than 0.7. 
Figure 14
 
Cases where the masking-model performed poorly. The first row shows the mask images, the second row shows the ground-truth masking maps, and the third row shows the model masking maps. Model maps are drawn using the Set1 parameters described in the Performance of the model section.
Again, the fact that the model performed as well as it did is quite noteworthy. However, we also believe that further insights into masking may be gained by studying the images and image patches for which the model, and thus a gain-control-based approach, failed to predict the thresholds. To observe these cases, we sorted the patches according to the absolute differences between the model's predictions and the actual thresholds. Figure 15a shows the patches for which the absolute differences were the smallest (best predictions), and Figure 15b shows the patches for which the absolute differences were the largest (worst predictions). Note that because of the reduced visual sensitivity in very dark regions (Frazor & Geisler, 2006), in Figure 15, only the patches with an average luminance greater than 3 cd/m² are shown. 
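The sorting procedure can be sketched as follows, assuming the measured thresholds, model predictions, and average patch luminances are available as equal-length arrays. The function name, argument names, and toy values are hypothetical; the 3 cd/m² cutoff follows the text.

```python
import numpy as np

def rank_patches_by_error(measured_db, predicted_db, mean_lum, min_lum=3.0):
    """Return patch indices sorted from best to worst prediction, and their errors."""
    measured_db = np.asarray(measured_db, dtype=float)
    predicted_db = np.asarray(predicted_db, dtype=float)
    err = np.abs(predicted_db - measured_db)                 # absolute error in dB
    keep = np.flatnonzero(np.asarray(mean_lum) > min_lum)    # drop very dark patches
    order = keep[np.argsort(err[keep], kind="mergesort")]
    return order, err[order]

# Toy data for four patches (illustrative values only)
measured = [-40.0, -30.0, -20.0, -10.0]
predicted = [-39.0, -20.0, -25.0, -10.5]
luminance = [10.0, 50.0, 2.0, 20.0]        # third patch is darker than 3 cd/m^2
order, errors = rank_patches_by_error(measured, predicted, luminance)
```

The first entries of `order` correspond to the best-predicted patches and the last entries to the worst-predicted patches among those that pass the luminance cutoff.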
Figure 15
 
Patches were sorted according to the absolute differences between the model predictions and the thresholds. (a) Example patches for which the absolute differences were the smallest. (b) Example patches for which the absolute differences were the largest. Here, only the patches with average luminance greater than 3 cd/m² are shown. Model predictions were calculated using the Set1 parameters described in the Performance of the model section.
The differences between the patches shown in Figure 15a and Figure 15b are subtle. However, observe that the worst-predicted patches (Figure 15b) contain a considerable number of identifiable structures, such as monuments, human and animal faces, and body parts. In addition, observe that for the best-predicted patches (Figure 15a), there are more blank patches than in Figure 15b. Although there are some blank patches in Figure 15b, these blank patches contain identifiable structures when viewed with their context (as used in the experiment). In Figure 16, some of these worst-predicted patches are shown with their contexts. Observe that these latter patches contain some identifiable structures either in the central region of the patch or in the context. 
Figure 16
 
Example patches where the model predictions were poor. The patches are shown with contexts. From top to bottom the absolute error between the detection thresholds and model predictions decreases. The absolute error ranges are, top row: 12.6 dB to 10.8 dB, second row from the top: 10.7 dB to 9.1 dB, third row from the top: 8.7 dB to 8.2 dB, and bottom row: 8.1 dB to 7.8 dB. Furthermore, in each row from left to right the absolute error decreases. Model predictions were calculated using the Set1 parameters described in the Performance of the model section.
It is also important to note that, although we used Watson and Solomon's (1997) gain control model with some modifications, there are several other potential approaches for modeling the thresholds (e.g., Legge & Foley, 1980; S. J. Daly, 1993; Wilson & Humanski, 1993; Foley, 1994; Teo & Heeger, 1994; Lubin, 1995; S. J. Daly et al., 2000; Nadenau & Reichel, 2000; Nadenau et al., 2002). We are currently in the process of researching the successes and failures of these various models. 
Effects of the suppressive gain pool
One aspect of gain control that has received recent attention is the nature of suppression from the pool of nearby neurons. Researchers have reported different pooling mechanisms by fitting gain control models to predict their thresholds (e.g., Legge & Foley, 1980; Heeger, 1992; Foley, 1994; Watson & Solomon, 1997; Truchard, Ohzawa, & Freeman, 2000; Meese & Holmes, 2002; Holmes & Meese, 2004; Meese & Holmes, 2007; Laparra, Muñoz-Marí, & Malo, 2010; Malo & Laparra, 2010). Generally, in contrast gain control models, the neural responses are pooled over space, spatial frequency, and orientation. For example, the contrast gain control model of Watson and Solomon (1997) pools responses over space and orientation, but not over spatial frequency. Some of the recent studies have demonstrated the effects of these pooling mechanisms and have implied how the pooling should be modeled over space, spatial frequency, and orientation (e.g., Meese & Holmes, 2007; Laparra et al., 2010). 
We could have fitted the contrast gain control model to predict our thresholds, which would have given the optimum pooling widths in the space, frequency, and orientation dimensions. However, here we measured the degree to which the prediction performance changed while pooling widths were varied in each dimension. Figure 17 shows the model performance (in terms of RMSE) by varying (a) space pooling width, (b) spatial frequency pooling width, and (c) orientation pooling width. Please refer to the caption of Figure 17 for the details of the implementations. 
Figure 17
 
Effects of gain control pooling width. The RMSEs between the thresholds and model predictions were calculated by varying (a) space pooling width (in degrees of subtended angle), (b) spatial frequency pooling width (in octaves), and (c) orientation pooling width (in degrees). While varying only the space pooling width (for [a]), the frequency and orientation pooling widths were 0.7 octave and 60°, respectively. While varying the frequency pooling width (for [b]), the space and orientation pooling widths were approximately 0.03° and 60°, respectively. While varying orientation pooling width (for [c]), the space and frequency pooling widths were approximately 0.03° and 0.7 octave, respectively. Note that in (b) and (c), 0 octave and 0° denote that the gain pool contained only the response from the excitatory frequency channel and the excitatory orientation channel, respectively. Furthermore, note that the vertical axes for all three plots are in logarithmic scale. The description of all four sets (shown in the legend in [a]) is given in Performance of the model. To change the pooling width, we used convolution with kernels of varying lengths with equal weights in each dimension.
Figure 17a shows that the model prediction improves as the spatial pooling width decreases, which suggests very localized pooling over space. Solomon and Watson (1995), Watson and Solomon (1997), and Snowden and Hammett (1995) also reported very localized pooling over space. Although Watson and Solomon (1997) reported that their model fit was not sensitive to spatial pooling width, our results (Figure 17a) suggest that the model performance degrades with a larger pool over space. Watson and Solomon (1997) acknowledged that their model was fitted using the results of Foley and Boynton's (1994) experiments (masking of Gabor patterns with sine-wave gratings), which were not designed to explore spatial pooling. Our masks, in contrast, were drawn from local patches of the natural image set and were therefore spatially more correlated than the grating-type masks. It is thus possible that, because of the increased spatial correlation of natural-image masks, contrast gain control models become more sensitive to the spatial pooling width. Further investigation of the effects of pooling over space using natural-image masks would be worthwhile. 
Figure 17b suggests that the radial frequency pooling width has an insignificant effect on the model prediction (observe the almost flat lines). In contrast, Figure 17c suggests that the orientation pooling width has a significant effect on the model predictions. These findings match Watson and Solomon's (1997) study, which reported no pooling over spatial frequency and almost 90° of pooling over orientation. Figure 17c also suggests that the model performance improves mostly because of pooling from cross-orientations rather than pooling from within-orientations, which is consistent with Meese and Holmes's (2007) study on cross-orientation suppression. However, although there have been studies on suppression using grating-type stimuli (e.g., Legge, 1984; Foley, 1994; Snowden & Hammett, 1995; Watson & Solomon, 1997; Meese & Holmes, 2007, 2010; Baker & Meese, 2012), we believe further investigation is required in order to better understand the suppression using natural-image stimuli. 
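The pooling-width manipulation described in the Figure 17 caption (convolution with equal-weight kernels of varying lengths along each dimension) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `box_pool`, the array layout, and the random stand-in data are all assumptions made for illustration, and the zero-padded edges are a simplification (orientation, in particular, is circular and would more properly wrap around).

```python
import numpy as np

def box_pool(responses, width, axis):
    """Average `responses` along `axis` with an equal-weight kernel of `width` samples.

    A width of 0 or 1 leaves the array unchanged, i.e., the pool contains
    only each channel's own response.
    """
    if width <= 1:
        return responses.copy()
    kernel = np.full(width, 1.0 / width)
    # mode="same" keeps the length of each 1-D slice; edges are zero-padded.
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, responses
    )

# Hypothetical stand-in for squared channel responses, laid out as
# (rows, columns, frequency bands, orientations).
rng = np.random.default_rng(0)
energy = rng.random((16, 16, 5, 6))

pool = box_pool(energy, width=3, axis=0)  # pool over space (rows)
pool = box_pool(pool, width=3, axis=1)    # pool over space (columns)
pool = box_pool(pool, width=1, axis=2)    # no pooling over spatial frequency
pool = box_pool(pool, width=3, axis=3)    # pool over orientation
```

Varying one `width` while holding the other two fixed mirrors the way each panel of Figure 17 varies a single pooling dimension.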
Cross-orientation facilitation
Some of the mask patches shown in Figure 6 facilitated the detection of the target; these patches consisted primarily of simple horizontal edges. The facilitation we observed here might be linked to cross-orientation facilitation (e.g., Meese & Holmes, 2007; Meese, Holmes, & Challinor, 2007; Meese, Summers, et al., 2007). Several studies that have used Gabor and/or sine-wave gratings reported up to 3 dB of facilitation (using Michelson contrast) from cross-oriented masks (Baker, Meese, & Summers, 2007; Meese, Holmes, & Challinor, 2007). In our experiment, the maximum facilitation was approximately 1.4 dB (in Michelson contrast units), which was within the range (less than 3 dB) described in the previous studies (Baker et al., 2007; Meese & Holmes, 2007; Meese, Holmes, & Challinor, 2007; Meese, Summers, et al., 2007). In a remote facilitation study, Meese, Holmes, and Challinor (2007) demonstrated that the cross-orientation facilitation started at a target frequency of around 3 c/° and was maximum at around 5.7 c/°. Our target spatial frequency (center radial frequency 3.7 c/°) was within their facilitatory target frequency range. Future studies are needed in order to quantify the possible facilitation/masking by natural image masks with different target spatial frequencies. 
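To give a sense of scale for these decibel values, the short sketch below converts a facilitation in dB into a ratio of masked to unmasked threshold contrast. It assumes the common convention that contrast in dB is 20 log10 of the contrast ratio; this section does not restate the definition used, so the convention and the helper names are assumptions.

```python
import math

def contrast_db(contrast_ratio):
    """Express a contrast ratio in decibels, assuming dB = 20 * log10(ratio)."""
    return 20.0 * math.log10(contrast_ratio)

def ratio_from_db(db):
    """Inverse of contrast_db: the contrast ratio corresponding to a dB value."""
    return 10.0 ** (db / 20.0)

# Facilitation lowers the threshold, so it corresponds to a negative dB change.
ratio_observed = ratio_from_db(-1.4)  # maximum facilitation reported here
ratio_ceiling = ratio_from_db(-3.0)   # upper bound cited from grating studies

print(f"1.4 dB facilitation: masked/unmasked threshold = {ratio_observed:.3f}")
print(f"3.0 dB facilitation: masked/unmasked threshold = {ratio_ceiling:.3f}")
```

Under this convention, 1.4 dB of facilitation corresponds to a masked threshold roughly 15% below the unmasked threshold, compared with roughly 29% for 3 dB.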
Effects of zero-frequency masking
In our experiment, very dark patches gave rise to higher thresholds (strong masking). The higher thresholds for dark regions might be attributable to luminance masking (e.g., Peli, 1990; Lubin, 1993; Watson, 1993b). It is well known that luminance masking is a spatially localized phenomenon. Rogers and Carel (1973) showed that when the local average luminance was low (below 10 cd/m²) the contrast threshold increased significantly, which matches our observation. However, the strong masking from the dark patches might also be attributable to zero-frequency masking (Yang, Qi, & Makous, 1995). From adaptation studies, it is well known that in the spatial frequency domain the amount of the adapting light can be represented by the zero-frequency (DC) component. To investigate the similarity between masking and adaptation, Yang et al. (1995) treated light adaptation as masking by the zero-frequency component. They found that the zero-frequency component acts similarly to the other frequency components in terms of contrast sensitivity. In Yang et al.'s model, the target's threshold amplitude could be increased by an equivalent noise (N_E) composed of three noise terms: the visual system's intrinsic noise or dark noise, N_0 (Barlow, 1956; Makous, 1990; Pelli, 1990); a shot-noise term, βL^(1/2), resulting from the random absorption of photons; and a term for masking by the zero-frequency component. In these terms, β is the coefficient of luminance noise, f is the spatial frequency, σ_0 is the bandwidth of the zero-frequency channel, and η_0 is a parameter that scales the magnitude of masking by the zero-frequency component. At any fixed luminance the first two noises (dark noise N_0 and shot noise βL^(1/2)) would be constant. However, the third term peaks at zero spatial frequency and decays rapidly with increasing spatial frequency. Peli (1990) denoted this zero-frequency noise as neural noise. 
It might be possible that for the extremely dark patches, due to the reduced visual sensitivity (Frazor & Geisler, 2006; Hood & Finkelstein, 1986), the neural noise was increased, which eventually increased the equivalent noise and thus the thresholds. 
Effects of patch-categorization and entropy masking
Chandler et al. (2009) classified 14 natural image patches into three categories: textures, structures, and edges. For each class, the parameters of a gain control model were optimized. Thus, Chandler et al.'s model required precategorization of a patch into the appropriate class and required applying the gain control model with the optimized parameters for that class. 
To observe any relations between our thresholds and the patch categories, we selected 18 patches and divided them into five categories: blank (patches with no detectable edge or structure, but not totally dark), edge (patches with simple edges), structure (patches with identifiable structures), texture (patches with regular textured patterns), and dark (patches that are very dark). Figure 18 shows the thresholds versus category plot for the selected 18 patches. Also shown are error bars corresponding to the standard deviations across runs and across subjects. These trends suggest that a patch's category may indeed have an influence on detection thresholds. 
Figure 18
 
Thresholds versus patch categories (Chandler et al., 2009). We selected 18 patches of five categories: blank, edge, structure, texture, and dark. The average thresholds along with the standard deviations of the selected patches are shown. Despite some variations, there is an increasing trend of thresholds from the blank category to dark category.
From Figure 18, observe that different patch categories have different visual complexities, and therefore the patch-categorization approach may just be a different way of describing the “entropy masking” proposed by Watson et al. (1997). The term entropy highlighted the fact that for some masks, the threshold elevation might result from the subject's unfamiliarity with the mask; i.e., thresholds are elevated for masks that are more difficult to learn. Since, in our case, the thresholds were measured for the local regions of natural images, many of the patches likely contained such unfamiliar content. Thus, entropy masking could account for our threshold elevations. From Figure 10 we found the correlation between the first-order mask entropy and the thresholds to be rather low. However, it should be noted that the first-order entropy is a poor measure of the true entropy and an even poorer measure of a subject's unfamiliarity with the mask. Although more sophisticated entropy estimators have been proposed (e.g., Chandler & Field, 2007; Field & Chandler, 2012), these estimates require a population of patches rather than individual patches. A fuller investigation regarding the influence of entropy masking on our results is certainly an avenue for future research. 
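The weakness of first-order entropy as a complexity measure can be seen in a small sketch. The function below computes the Shannon entropy of a patch's pixel-value histogram; it is a generic illustration assuming 8-bit grayscale patches, and the function name and toy patches are hypothetical rather than taken from the study.

```python
import numpy as np

def first_order_entropy(patch, levels=256):
    """Shannon entropy (bits/pixel) of the pixel-value histogram of an 8-bit patch."""
    values = np.asarray(patch, dtype=np.uint8).ravel()
    counts = np.bincount(values, minlength=levels)
    p = counts[counts > 0] / counts.sum()  # drop empty bins to avoid log2(0)
    return float(-(p * np.log2(p)).sum())

# Two toy 8 x 8 patches with identical histograms but different structure.
half_split = np.zeros((8, 8), dtype=np.uint8)
half_split[:, 4:] = 255                         # one vertical edge

rows, cols = np.indices((8, 8))
checkerboard = (((rows + cols) % 2) * 255).astype(np.uint8)  # fine texture

print(first_order_entropy(half_split), first_order_entropy(checkerboard))
```

Both patches contain equal numbers of black and white pixels, so they receive the same first-order entropy even though one is a single edge and the other a fine texture. The measure depends only on the histogram and ignores spatial arrangement.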
Effects of contrast integration and dilution masking
Contrast integration and dilution masking are two additional phenomena that could potentially be incorporated into a future masking model to better predict our thresholds. For example, our implementation of the gain control model overlooked contrast integration in both the numerator and the denominator of the gain control equation (see Equation 22). Researchers have shown that area summation (contrast integration) before the divisive gain control can improve the predictions of thresholds for sine-wave gratings (Meese & Summers, 2007; Baker & Meese, 2011). For targets having a similar spatial extent as the pedestal, the excitatory area summation is difficult to see empirically, because the pedestal suppresses the excitatory integration effects. However, our stimuli were generated by padding an extra 1.43° from the mask beyond the target region. Thus, some inappropriate integration might have occurred from the nonsignal regions of the stimuli and/or from the excitatory responses having mismatched phase with the target. Such inappropriate integration from the nonsignal regions is called dilution masking (Meese & Summers, 2007; Meese & Baker, 2013). We believe that although contrast integration and dilution masking effects have been measured in grating studies, the ideas can be extended to our study as well. Specifically, since our mask patches were drawn from local regions of complex natural images, many of our masks were inherently difficult to learn. Thus, the uncertainty/unfamiliarity of the mask (i.e., entropy masking) could also lead to the inappropriate integration process. All of these ideas could be implemented with the gain control model to improve the model performance. We expect to return to these ideas in future models of masking. 
Limitations of our study
It is important to point out the limitations of our study. One limitation is that we tested only one target center frequency and one target orientation. Testing select target frequencies and select target orientations is common in the literature when using unnatural masks that are controlled and limited in terms of frequencies and orientations (e.g., Carter & Henning, 1971; Legge & Foley, 1980; Foley & Legge, 1981; Legge et al., 1987). However, our masks were local regions of natural images, which are generally broadband in terms of spatial frequency. Thus, although testing just one center frequency/orientation is an important first step, testing targets with other radial frequencies and orientations is an equally important next step. 
We acknowledge that recent studies have shown that cross-orientation suppression from the gain pool is strong at lower spatial frequencies and higher temporal frequencies (e.g., Meese & Holmes, 2007; Meese & Baker, 2011). Our target spatial frequency was 3.7 c/° and our target temporal frequency was zero (target intensity did not change during the 5-s stimulus viewing time). Thus, in our case the effects of cross-orientation suppression might not be very high. However, suppression from within-orientation (suppression from the mask orientations similar to the target orientation) cannot be ignored. Furthermore, Figure 17c suggests that cross-oriented channels also affect the detection mechanism through the gain pool. Extensions of our database with varying target parameters such as orientation are avenues of future research that could provide a better understanding of the suppression from the cross-orientation pool. 
Another limitation of our study is that our targets and masks always subtended a window size of 1.9° (the masks subtended 4.76° with the context). Although our window size was chosen to match the central 2° of the visual field in the fovea, it would nonetheless be interesting to investigate how the thresholds change as a function of window size. 
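The stimulus dimensions quoted in this paper can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (85 pixels spanning 1.9°, 1.43° of padding, a 3.7 c/° target); it assumes that the padding was applied on each side of the patch, which is what makes the total come to 4.76°. The variable names are illustrative.

```python
# Values stated in the paper.
PATCH_PX = 85       # patch width in pixels
PATCH_DEG = 1.9     # patch width in degrees of visual angle
PAD_DEG = 1.43      # extra mask context beyond the target region
TARGET_CPD = 3.7    # target center frequency in cycles per degree

pixels_per_degree = PATCH_PX / PATCH_DEG
cycles_per_patch = TARGET_CPD * PATCH_DEG
pixels_per_cycle = pixels_per_degree / TARGET_CPD

# Assumption: the padding is added on both sides of the patch.
context_deg = PATCH_DEG + 2 * PAD_DEG
context_px = round(context_deg * pixels_per_degree)

print(f"{pixels_per_degree:.2f} pixels per degree")
print(f"{cycles_per_patch:.2f} target cycles across the patch")
print(f"{pixels_per_cycle:.2f} pixels per target cycle")
print(f"context: {context_deg:.2f} degrees, about {context_px} pixels")
```

A different window size would change the number of target cycles per patch at a fixed center frequency, which is one reason the dependence of thresholds on window size is worth measuring directly.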
Conclusions
In this paper, we presented the results of a psychophysical study designed to obtain local masked detection thresholds using a database of natural images as masks. Via a spatial three-alternative forced-choice experiment, thresholds were measured for detecting 3.7 c/° vertically oriented log-Gabor noise targets placed within each 85 × 85-pixel patch (1.9° patch) of 30 natural images from the CSIQ image database (Larson & Chandler, 2010). Thus, for each image, we obtained a masking map in which each entry in the map denotes the RMS contrast threshold for detecting the log-Gabor noise target at the corresponding spatial location in the image. Our results demonstrated that detection thresholds were affected by several mask properties, such as visual complexity, fineness of textures, sharpness, and overall luminance. In addition, we examined the relations between the thresholds and various low-level mask features (luminance, contrast, statistical features, edge density, entropy, features of the magnitude spectra, sharpness, energy content around the target's band, and energy content around the target's orientation). Our quantitative analyses showed that, except for sharpness (correlation coefficient of 0.7), the other tested low-level mask features showed weak correlations (correlation coefficients less than or equal to 0.52) with the detection thresholds. We also evaluated Watson and Solomon's (1997) contrast gain control model, which yielded an average correlation coefficient of 0.79 in predicting the thresholds. It is our hope that this database will foster future studies on visual masking in natural images. 
Acknowledgments
The authors thank Nishantha Samarakoon for useful assistance with the statistical analysis. The authors also thank Dana Brunson for granting permission to use the OSU supercomputer for simulation purposes. We also thank the two reviewers and the editor for their helpful comments and suggestions that have significantly improved the overall quality of this paper. This material is based upon work supported by, or in part by, the National Science Foundation, Grant Number 1054612 to Professor Damon M. Chandler, the U.S. Army Research Laboratory (USARL) and the U.S. Army Research Office (USARO) under contract/grant number W911NF-10-1-0015 to Professor Damon M. Chandler, and a Google Faculty Research Award to Professor David J. Field. 
Commercial relationships: none. 
Corresponding author: Md Mushfiqul Alam. 
Email: mdma@okstate.edu. 
Address: Laboratory of Computational Perception and Image Quality, Oklahoma State University, Stillwater, OK, USA. 
References
Ahumada A. J. (1967). Detection of tones masked by noise; a comparison of human observers with digital-computer-simulated energy detectors of varying bandwidths (Unpublished doctoral dissertation). University of California, Los Angeles.
Ashikhmin M. (2001). Synthesizing natural textures. In Proceedings of the 2001 symposium on interactive 3d graphics (pp. 217–226). New York: ACM.
Aydin T. O. Čadík M. Myszkowski K. Seidel H.-P. (2010). Video quality assessment for computer graphics applications. ACM Transactions on Graphics (TOG), 29, 161. [CrossRef]
Baker D. H. Meese T. S. (2011). Contrast integration over area is extensive: A three-stage model of spatial summation. Journal of Vision, 11 (14): 14, 1–16, http://www.journalofvision.org/content/11/14/14, doi:10.1167/11.14.14. [PubMed] [Article] [CrossRef]
Baker D. H. Meese T. S. (2012). Zero-dimensional noise: The best mask you never saw. Journal of Vision, 12 (10): 20, 1–12, http://www.journalofvision.org/content/12/10/20, doi:10.1167/12.10.20. [PubMed] [Article] [CrossRef]
Baker D. H. Meese T. S. Summers R. J. (2007). Psychophysical evidence for two routes to suppression before binocular summation of signals in human vision. Neuroscience, 146 (1), 435–448. [CrossRef] [PubMed]
Barlow H. B. (1956). Retinal noise and absolute threshold. The Journal of the Optical Society of America, 46 (8), 634–639. [CrossRef]
Bovik A. C. (2013). Automatic prediction of perceptual image and video quality. Proceedings of the IEEE, 101 (9), 2008–2024.
Brady N. Field D. J. (2000). Local contrast in natural images: Normalisation and coding efficiency. Perception, 29 (9), 1041–1056. [CrossRef] [PubMed]
Breitmeyer B. Ogmen H. (2006). Visual masking: Time slices through conscious and unconscious vision (Vol. 41). Oxford, UK: Oxford University Press.
Burge J. Fowlkes C. C. Banks M. S. (2010). Natural-scene statistics predict how the figure–ground cue of convexity affects human depth perception. The Journal of Neuroscience, 30 (21), 7269–7280. [CrossRef] [PubMed]
Caelli T. Moraglia G. (1986). On the detection of signals embedded in natural scenes. Perception and Psychophysics, 39, 87–95. [CrossRef] [PubMed]
Campbell F. W. Kulikowski J. J. (1966). Orientational selectivity of the human visual system. The Journal of Physiology, 187 (2), 437–445. [CrossRef] [PubMed]
Canny J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 679–698. [CrossRef]
Carandini M. Heeger D. J. (1994). Summation and division by neurons in primate visual cortex. Science, 264 (5163), 1333–1336. [CrossRef] [PubMed]
Carney T. Tyler C. W. Watson A. B. Makous W. Beutter B. Chen C.-C.… Klein S. A. (2000). Modelfest: Year one results and plans for future years. Electronic Imaging, 3959, 140–151.
Carter B. E. Henning G. B. (1971). The detection of gratings in narrow-band visual noise. The Journal of Physiology, 219 (2), 355. [CrossRef] [PubMed]
Chandler D. M. Dykes N. L. Hemami S. S. (2005). Visually lossless compression of digitized radiographs based on contrast sensitivity and visual masking. In Eckstein M. Jiang Y. (Eds.), Proceedings of spie medical imaging 2005: Image perception, observer performance, and technology assessment (Vol. 5749, pp. 359–372).
Chandler D. M. Field D. J. (2007). Estimates of the information content and dimensionality of natural scenes from proximity distributions. The Journal of Optical Society of America, 24 (4), 922–941. [CrossRef]
Chandler D. M. Gaubatz M. D. Hemami S. S. (2009). A patch-based structural masking model with an application to compression. The Journal of Image and Video Processing, 2009, 1–22. [CrossRef]
Chandler D. M. Hemami S. S. (2003). Effects of natural images on the detectability of simple and compound wavelet subband quantization distortions. The Journal of Optical Society of America, 20 (7).
Chandler D. M. Hemami S. S. (2007). Vsnr: A wavelet-based visual signal-to-noise ratio for natural images. IEEE Transactions on Image Processing, 16 (9), 2284–2298. [CrossRef] [PubMed]
Cox I. J. Miller M. L. (1997). A review of watermarking and the importance of perceptual modeling. In Rogowitz B. Pappas T. (Eds.), Proc. spie human vision and electronic imaging ii (Vol. 3016, pp. 92–99). San Jose, CA.
Daly S. (1987). Subroutine for the generation of a two dimensional human visual contrast sensitivity function. Rochester, NY: Eastman Kodak.
Daly S. J. (1993). Visible differences predictor: An algorithm for the assessment of image fidelity. In Watson A. B. (Ed.), Digital images and human vision (pp. 179–206). Conference volume 1666, Human Vision, Visual Processing, and Digital Display III, February 09, 1992, San Jose, CA.
Daly S. J. Zeng W. Li J. Lei S. (2000). Visual masking in wavelet compression for jpeg-2000. Electronic Imaging, 3974, 66–80.
Delord S. (1998). Which mask is the most efficient: A pattern or a noise? It depends on the task. Visual Cognition, 5 (3), 313–338. [CrossRef]
DeValois R. L. DeValois K. K. (1990). Spatial vision. New York: Oxford University Press.
Eckert M. P. Bradley A. P. (1998). Perceptual quality metrics applied to still image compression. Signal Processing, 70 (3), 177–200. [CrossRef]
Eckstein M. P. Ahumada A. J. Jr. Watson A. B. (1997). Visual signal detection in structured backgrounds. II. Effects of contrast gain control, contrast variations, and white noise. The Journal of Optical Society of America, 14 (9), 2406–2419. [CrossRef]
Ferwerda J. A. Shirley P. Pattanaik S. N. Greenberg D. P. (1997). A model of visual masking for computer graphics. In Proceedings of the 24th annual conference on computer graphics and interactive techniques (pp. 143–152). New York: ACM Press/Addison-Wesley Publishing Co.
Field D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. The Journal of Optical Society of America, 4 (12), 2379–2394. [CrossRef]
Field D. J. Chandler D. M. (2012). Method for estimating the relative contribution of phase and power spectra to the total information in natural-scene patches. The Journal of Optical Society of America, 29 (1), 55–67. [CrossRef]
Field D. J. Hayes A. Hess R. F. (1993). Contour integration by the human visual system: Evidence for a local association field. Vision Research, 33 (2), 173–193. [CrossRef] [PubMed]
Foley J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. The Journal of Optical Society of America, 11 (6), 1710–1719. [CrossRef]
Foley J. M. Boynton G. M. (1994). New model of human luminance pattern vision mechanisms: Analysis of the effects of pattern orientation, spatial phase, and temporal frequency. Proceedings of Spie, 2054, 32.
Foley J. M. Legge G. E. (1981). Contrast detection and near-threshold discrimination in human vision. Vision Research, 21 (7), 1041–1053. [CrossRef] [PubMed]
Frazor R. A. Geisler W. S. (2006). Local luminance and contrast in natural images. Vision Research, 46 (10), 1585–1598. [CrossRef] [PubMed]
Geisler W. S. Albrecht D. G. (1992). Cortical neurons: Isolation of contrast gain control. Vision Research, 32 (8), 1409–1410. [CrossRef] [PubMed]
Geisler W. S. Perry J. S. Super B. J. Gallogly D. P. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41 (6), 711–724. [CrossRef] [PubMed]
Georgeson M. A. (1984). Eye movements, afterimages and monocular rivalry. Vision Research, 24 (10), 1311–1319. [CrossRef] [PubMed]
Georgeson M. A. (1987). Temporal properties of spatial contrast vision. Vision Research, 27 (5), 765–780. [CrossRef] [PubMed]
Georgeson M. A. Turner R. (1985). Afterimages of sinusoidal, square-wave and compound gratings. Vision Research, 25 (11), 1709–1720. [CrossRef] [PubMed]
Gilbert C. D. Sigman M. Crist R. E. (2001). The neural basis of perceptual learning. Neuron, 31 (5), 681–697. [CrossRef] [PubMed]
Gonzalez R. C. Woods R. E. Eddins S. L. (2009). Digital image processing using MATLAB (Vol. 2). Knoxville, TN: Gatesmark Publishing.
Graham N. (1989). Visual pattern analyzers. New York: Oxford University Press.
Hannigan B. T. Reed A. Bradley B. (2001). Digital watermarking using improved human visual system model. Proceedings of Spie, 4314, 468.
Heeger D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9, 181–197. [CrossRef] [PubMed]
Heeger D. J. Bergen J. R. (1995). Pyramid-based texture analysis/synthesis. In Proceedings of SIGGRAPH (pp. 229–238). Los Angeles, CA: Association for Computing Machinery.
Heeger D. J. Teo P. C. (1995). A model of perceptual image fidelity. Proceedings of the International Conference on Image Processing, 1995, 2, 343–345.
Henning G. B. Hertz B. G. Hinton J. (1981). Effects of different hypothetical detection mechanisms on the shape of spatial-frequency filters inferred from masking experiments: I. Noise masks. J. of Opt. Soc. Am, 71 (5), 574–581. [CrossRef]
Holmes D. J. Meese T. S. (2004). Grating and plaid masks indicate linear summation in a contrast gain pool. Journal of Vision, 4 (12): 7, 1080–1089, http://www.journalofvision.org/content/4/12/7/, doi:10.1167/4.12.7. [PubMed] [Article]
Hood D. C. Finkelstein M. A. (1986). Sensitivity to light. In Boff, Kaufman, & Thomas (Eds.), Handbook of Perception and Human Performance (Vol. 1, Ch. 5). New York: Wiley.
Howe C. Q. Purves D. (2002). Range image statistics can explain the anomalous perception of length. Proceedings of the National Academy of Sciences, USA, 99 (20), 13184–13188. [CrossRef]
Huang J. Shi Y. (1998). Adaptive image watermarking scheme based on visual masking. Electronics Letters, 34, 748–750. [CrossRef]
Karybali I. G. Berberidis K. (2006). Efficient spatial image watermarking via new perceptual masking and blind detection schemes. IEEE Transactions on Information Forensics and Security, 1 (2), 256–274. [CrossRef]
Knill D. C. Field D. Kersten D. (1990). Human discrimination of fractal images. The Journal of Optical Society of America, 7 (6), 1113–1123. [CrossRef]
Koz A. Alatan A. A. (2008). Oblivious spatio-temporal watermarking of digital video by exploiting the human visual system. IEEE Transactions on Circuits and Systems for Video Technology, 18 (3), 326–337. [CrossRef]
Kutter M. Winkler S. (2002). A vision-based masking model for spread-spectrum image watermarking. IEEE Transactions on Image Process, 11 (1), 16–25. [CrossRef]
Laparra V. Muñoz-Marí J. Malo J. (2010). Divisive normalization image quality metric revisited. The Journal of Optical Society of America, 27 (4), 852–864. [CrossRef]
Larson E. C. Chandler D. M. (2010). Most apparent distortion: Full-reference image quality assessment and the role of strategy. Journal of Electronic Imaging, 19 (1), 011006.
Legge G. E. (1984). Binocular contrast summation i. detection and discrimination. Vision Research, 24 (4), 373–383. [CrossRef] [PubMed]
Legge G. E. Foley J. M. (1980). Contrast masking in human vision. The Journal of Optical Society of America, 70, 1458–1470. [CrossRef]
Legge G. E. Kersten D. Burgess A. E. (1987). Contrast discrimination in noise. The Journal of Optical Society of America, 4 (2), 391–404. [CrossRef]
Littell R. C. (2006). Sas. Encyclopedia of Environments, Vol. 4, Wiley Online Library.
Lu Z.-L. Sperling G. (1996). Contrast gain control in first-and second-order motion perception. The Journal of Optical Society of America, 13 (12), 2305–2318. [CrossRef]
Lubin J. (1993). The use of psychophysical data and models in the analysis of display system performance. Digital images and human vision, (pp. 146–176). Cambridge, MA: MIT Press.
Lubin J. (1995). A visual discrimination model for imaging system design and evaluation. In Peli E. (Ed.), Vision models for target detection and recognition (pp. 245–283). New York: World Scientific.
Makous W. (1990). Absolute sensitivity. In Hess R. F. Sharpe L. T. Nordby K. (Eds.), Night vision: Basic, clinical and applied aspects (pp. 146–176). New York: Cambridge University Press.
Malo J. Laparra V. (2010). Psychophysically tuned divisive normalization approximately factorizes the pdf of natural images. Neural Computation, 22 (12), 3179–3206. [CrossRef] [PubMed]
Mannos J. L. Sakrison D. J. (1974). The effects of a visual fidelity criterion on the encoding of image. IEEE Transactions on Information Theory, 20, 525–535. [CrossRef]
Mante V. Frazor R. A. Bonin V. Geisler W. S. Carandini M. (2005). Independence of luminance and contrast in natural scenes and in the early visual system. Nature Neuroscience, 8 (12), 1690–1697. [CrossRef] [PubMed]
Masry M. Chandler D. M. Hemami S. S. (2003). Digital watermarking using local contrast-based texture masking. Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on, 2, 1590–1594.
Masry M. A. Hemami S. S. (2004). A metric for continuous quality evaluation of compressed video with severe distortions. Signal Processing: Image Communication, 19, 133–146. [CrossRef]
Meese T. S. Baker D. H. (2011). A reevaluation of achromatic spatio-temporal vision: Nonoriented filters are monocular, they adapt, and can be used for decision making at high flicker speeds. i-Perception, 2 (2), 159–182. [CrossRef] [PubMed]
Meese T. S. Baker D. H. (2013). A common rule for integration and suppression of luminance contrast across eyes, space, time, and pattern. i-Perception, 4 (1), 1. [CrossRef] [PubMed]
Meese T. S. Challinor K. L. Summers R. J. (2008). A common contrast pooling rule for suppression within and between the eyes. Visual Neuroscience, 25 (04), 585–601. [PubMed]
Meese T. S. Holmes D. J. (2002). Adaptation and gain pool summation: Alternative models and masking data. Vision Research, 42 (9), 1113–1125. [CrossRef] [PubMed]
Meese T. S. Holmes D. J. (2007). Spatial and temporal dependencies of cross-orientation suppression in human vision. Proceedings of the Royal Society B: Biological Sciences, 274 (1606), 127–136. [CrossRef]
Meese T. S. Holmes D. J. (2010). Orientation masking and cross-orientation suppression (xos): Implications for estimates of filter bandwidth. Journal of Vision, 10 (12): 9, 1–20, http://www.journalofvision.org/content/10/12/9, doi:10.1167/10.12.9. [PubMed] [Article]
Meese T. S. Holmes D. J. Challinor K. L. (2007). Remote facilitation in the fourier domain. Vision Research, 47 (8), 1112–1119. [CrossRef] [PubMed]
Meese T. S. Summers R. J. (2007). Area summation in human vision at and above detection threshold. Proceedings of the Royal Society B: Biological Sciences, 274 (1627), 2891–2900. [CrossRef]
Meese T. S. Summers R. J. Holmes D. J. Wallis S. A. (2007). Contextual modulation involves suppression and facilitation from the center and the surround. Journal of Vision, 7 (4): 7, 1–21, http://www.journalofvision.org/content/7/4/7/, doi:10.1167/7.4.7. [PubMed] [Article]
Moulden B. Kingdom F. A. A. Gatley L. F. (1990). The standard deviation of luminance as a metric for contrast in random-dot images. Perception, 19, 79–101. [CrossRef] [PubMed]
Nachmias J. Sansbury R. V. (1974). Grating contrast: Discrimination may be better than detection. Vision Research, 14 (10), 1039–1042. [CrossRef] [PubMed]
Nadenau M. J. Reichel J. (2000). Image-compression-related contrast-masking measurements. In Rogowitz B. E. Pappas T. N. (Eds.), Human vision and electronic imaging V (Vol. 3959, pp. 188–199).
Nadenau M. J. Reichel J. Kunt M. (2002). Performance comparison of masking models based on a new psychovisual test method with natural scenery stimuli. Signal Processing: Image Communication, 17 (10), 807–823. [CrossRef]
Ninassi A. Le Meur O. Le Callet P. Barba D. (2008). On the performance of human visual system based image quality assessment metric using wavelet domain. Proceedings of the SPIE Conference Human Vision and Electronic Imaging XIII, 6806.
Pantle A. (1974). Visual information processing of complex imagery (Tech. Rep.). DTIC Document.
Párraga C. A. Troscianko T. Tolhurst D. J. (2000). The human visual system is optimised for processing the spatial information in natural visual images. Current Biology, 10 (1), 35–38. [CrossRef] [PubMed]
Pashler H. (1988). Familiarity and visual change detection. Perception & Psychophysics, 44 (4), 369–378. [CrossRef] [PubMed]
Peli E. (1990). Contrast in complex images. The Journal of Optical Society of America, 7, 2032–2040. [CrossRef]
Pelli D. G. (1987). The ideal psychometric procedure. Investigative Ophthalmology and Visual Science, 26, 366.
Pelli D. G. (1990). The quantum efficiency of vision. In Blakemore C. (Ed.), Vision: Coding and efficiency (pp. 3–24). Cambridge, UK: Cambridge University Press.
Phillips G. C. Wilson H. R. (1984). Orientation bandwidths of spatial mechanisms measured by masking. The Journal of Optical Society of America, 1 (2), 226–232. [CrossRef]
Rogers J. Carel W. (1973). Development of design criteria for sensor displays devices report hac ref. no. c6619 (Tech. Rep. Office of Naval Research Contract Number: N00014-72-C-0451, NR213-107). Culver City, CA: Hughes Aircraft Company.
Sachs M. B. Nachmias J. Robson J. G. (1971). Spatial-frequency channels in human vision. The Journal of Optical Society of America, 61, 1176–1186. [CrossRef]
Schwartz O. Simoncelli E. P. (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4 (8), 819–825. [CrossRef] [PubMed]
Snowden R. J. Hammett S. T. (1995). The effect of contrast surrounds on contrast centers-merely normal masking. Investigative Ophthalmology & Visual Science, 36, S438.
Solomon J. A. Watson A. B. (1995). Spatial and spatial frequency spreads of masking: Measurements and a contrast-gain-control model. Presented at European Conference on Visual Perception, August 21–25, 1995, Tübingen, Germany.
Stromeyer C. F. III, Julesz B. (1972). Spatial-frequency masking in vision: Critical bands and spread of masking. The Journal of Optical Society of America, 62 (10), 1221–1232. [CrossRef]
Teo P. C. Heeger D. J. (1994). Perceptual image distortion. Proceedings of SPIE, 2179, 127–141.
Tolhurst D. J. Tadmor Y. (1997). Band-limited contrast in natural images explains the detectability of changes in the amplitude spectra. Vision Research, 37 (23), 3203–3215. [CrossRef] [PubMed]
Tolhurst D. J. Tadmor Y. (2000). Discrimination of spectrally blended natural images: Optimisation of the human visual system for encoding natural images. Perception-London, 29 (9), 1087–1100. [CrossRef] [PubMed]
Truchard A. M. Ohzawa I. Freeman R. D. (2000). Contrast gain control in the visual cortex: Monocular versus binocular mechanisms. The Journal of Neuroscience, 20 (8), 3017–3032. [PubMed]
Tukey J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
VQEG. (2003, August). Final report from the video quality experts group on the validation of objective models of video quality assessment, phase II. Available at http://www.vqeg.org
Vu C. T. Phan T. D. Chandler D. M. (2012). S3: A spectral and spatial measure of local perceived sharpness in natural images. IEEE Transactions on Image Processing, 21 (3), 934–945. [CrossRef] [PubMed]
Walter B. Pattanaik S. N. Greenberg D. P. (2002). Using perceptual texture masking for efficient image synthesis. Computer Graphics Forum, 21, 393–399. [CrossRef]
Watson A. B. (1982). Summation of grating patches indicates many types of detector at one retinal location. Vision Research, 22 (1), 17–25. [CrossRef] [PubMed]
Watson A. B. (1993a). DCT quantization matrices visually optimized for individual images. Proceedings of SPIE, 1913, 202–216.
Watson A. B. (1993b). DCTune: A technique for visual optimization of DCT quantization matrices for individual images. SID International Symposium Digest of Technical Papers, 24, 946–946.
Watson A. B. Borthwick R. Taylor M. (1997). Image quality and entropy masking. Proceedings of SPIE, 3016.
Watson A. B. Solomon J. A. (1997). A model of visual contrast gain control and pattern masking. The Journal of Optical Society of America, 14 (9), 2379–2391. [CrossRef]
Webster M. A. Miyahara E. (1997). Contrast adaptation and the spatial structure of natural images. The Journal of Optical Society of America, 14 (9), 2355–2366. [CrossRef]
Wilson H. R. Humanski R. (1993). Spatial frequency adaptation and contrast gain control. Vision Research, 33 (8), 1133–1149. [CrossRef] [PubMed]
Wilson H. R. McFarlane D. K. Phillips G. C. (1983). Spatial frequency tuning of orientation selective units estimated by oblique masking. Vision Research, 23 (9), 873–882. [CrossRef] [PubMed]
Winkler S. (2012). Analysis of public image and video databases for quality assessment. IEEE Journal of Selected Topics in Signal Processing, 6 (6), 616–625. [CrossRef]
Winkler S. Süsstrunk S. (2004). Visibility of noise in natural images. Proceedings of SPIE, 5292, 121–129.
Yang J. Qi X. Makous W. (1995). Zero frequency masking and a model of contrast sensitivity. Vision Research, 35 (14), 1965–1978. [CrossRef] [PubMed]
Zheng W. Daly S. Lei S. (2000). Point-wise extended visual masking for JPEG-2000 image compression. Proceedings of the International Conference on Image Processing, 1, 657–660.
Appendix A: Correlation coefficients for the subject consistency
Table 3 shows the intrasubject and intersubject consistencies in terms of three parameters: CC, SROCC, and RMSE. The consistency values were calculated individually for each of the 30 mask images. Along with the image-by-image consistency values, Table 3 also shows the average and overall consistency values. The average intra- and intersubject consistency values of Figure 7 were calculated from the values given in Table 3.
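The three consistency measures can be illustrated with a minimal sketch. It assumes that each run yields one threshold (in dB) per patch and that two runs are compared patch by patch; the function names and the example thresholds below are hypothetical and are not taken from the database.

```python
import math

def pearson_cc(x, y):
    """Pearson linear correlation coefficient (CC) between two sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def srocc(x, y):
    """Spearman rank-order correlation: Pearson CC computed on the ranks."""
    return pearson_cc(average_ranks(x), average_ranks(y))

def rmse(x, y):
    """Root mean squared error between two sequences of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical thresholds (dB) for five patches measured in two runs.
run1 = [-52.0, -40.5, -33.0, -27.5, -15.0]
run2 = [-50.0, -42.0, -31.5, -28.0, -17.5]

print("CC    =", pearson_cc(run1, run2))
print("SROCC =", srocc(run1, run2))
print("RMSE  =", rmse(run1, run2), "dB")
```

CC and SROCC are unitless, whereas RMSE carries the unit of the thresholds, which is why the RMSE rows of Table 3 are on a different scale from the correlation rows.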
Table 3
 
Intrasubject and intersubject consistencies for all 30 mask images in terms of CC, SROCC, and RMSE. “Sub1,” “Sub2,” and “Sub3” denote the intrasubject consistencies for Subject1, Subject2, and Subject3. “S1&2,” “S2&3,” and “S3&1” denote the intersubject consistency values. The average and overall consistency values are shown in the last two columns of the extended table.
log_seaside sunsetcolor lake redwood snow_leaves swarm elk foxy child_swimming native_american couple roping woman
Intrasubject consistency
 CC Sub1 0.95 0.98 0.95 0.97 0.97 0.97 0.9 0.93 0.97 0.86 0.86 0.95 0.98
Sub2 0.88 0.97 0.88 0.95 0.95 0.98 0.87 0.82 0.99 0.9 0.94 0.92 0.96
Sub3 0.97 0.99 0.92 0.93 0.93 0.97 0.93 0.89 0.91 0.96 0.87 0.97 0.92
 SROCC Sub1 0.95 0.88 0.94 0.96 0.95 0.91 0.89 0.9 0.91 0.84 0.83 0.92 0.95
Sub2 0.89 0.81 0.82 0.91 0.86 0.92 0.78 0.82 0.97 0.88 0.93 0.85 0.94
Sub3 0.97 0.95 0.85 0.88 0.81 0.81 0.9 0.83 0.89 0.94 0.86 0.94 0.9
 RMSE Sub1 2.5 2.37 2.67 2.04 2.68 1.86 3.62 2.66 3.1 2.8 4.14 2.48 1.72
Sub2 4.99 2.94 3.88 2.77 3.26 1.78 4.87 3 1.68 3.03 2.52 2.79 2.3
Sub3 2.54 1.5 3.74 3.86 2.8 2.49 3.7 2.89 3.82 2.08 3.92 2.61 3.69
Intersubject consistency
 CC S1&2 0.87 0.98 0.88 0.94 0.84 0.97 0.85 0.85 0.95 0.81 0.88 0.93 0.94
S2&3 0.92 0.97 0.86 0.93 0.92 0.98 0.94 0.81 0.95 0.89 0.87 0.9 0.96
S3&1 0.9 0.98 0.93 0.95 0.94 0.97 0.85 0.83 0.92 0.85 0.87 0.92 0.94
 SROCC S1&2 0.86 0.91 0.82 0.89 0.61 0.83 0.81 0.8 0.89 0.77 0.87 0.89 0.89
S2&3 0.92 0.82 0.77 0.89 0.82 0.85 0.87 0.77 0.93 0.83 0.84 0.84 0.93
S3&1 0.88 0.87 0.88 0.92 0.77 0.78 0.79 0.76 0.9 0.86 0.86 0.88 0.89
 RMSE S1&2 5.3 1.98 4.46 4.92 4.81 2.83 4.5 4.92 3.91 3.86 5.95 4.37 3.44
S2&3 4.02 2.51 3.76 3.26 3.21 1.87 3.14 3.79 2.85 2.95 4.15 4.75 3.33
S3&1 6.58 2.32 3.43 3.91 4.92 3.94 4.2 3.94 4.71 3.42 4.3 3.06 5.21
Table 3
 
Extended.
fisher butter_flower shroom cactus sunset_sparrow family veggies turtle geckos bridge i1600 rushmore lady_liberty monument boston aerial_city trolley Average Overall
Intrasubject consistency
 CC Sub1 0.89 0.98 0.93 0.86 0.97 0.98 0.97 0.96 0.89 0.92 0.96 0.97 0.88 0.87 0.92 0.94 0.92 0.93 0.95
Sub2 0.94 0.99 0.94 0.96 0.96 0.97 0.95 0.92 0.92 0.97 0.99 0.98 0.94 0.92 0.95 0.94 0.91 0.94 0.95
Sub3 0.91 0.96 0.93 0.94 0.96 0.97 0.98 0.94 0.9 0.95 0.97 0.9 0.87 0.77 0.89 0.98 0.95 0.93 0.95
 SROCC Sub1 0.89 0.96 0.9 0.87 0.95 0.96 0.87 0.93 0.85 0.85 0.91 0.96 0.89 0.88 0.88 0.95 0.91 0.91 0.95
Sub2 0.95 0.97 0.94 0.94 0.95 0.96 0.9 0.83 0.81 0.95 0.97 0.98 0.91 0.88 0.95 0.93 0.91 0.9 0.95
Sub3 0.89 0.94 0.91 0.94 0.94 0.96 0.92 0.91 0.77 0.94 0.89 0.87 0.85 0.7 0.89 0.97 0.96 0.89 0.94
 RMSE Sub1 2.51 2.34 2.31 4.37 1.9 1.91 2.65 2.64 3.63 3.45 2.55 2.05 2.38 2.78 3.04 2.7 2.79 2.69 2.76
Sub2 3.18 1.19 2.09 3.19 2.08 2.34 2.83 3.63 2.81 2.23 1.65 2.38 2.52 2.62 2.27 2.13 2.94 2.73 2.85
Sub3 3.34 2.94 2.81 2.83 2.02 2.12 1.95 3.19 4.37 2.81 2.95 4.36 3.06 4.23 3.34 1.51 2.83 3.01 3.11
Intersubject consistency
 CC S1&2 0.93 0.95 0.91 0.95 0.91 0.97 0.91 0.95 0.85 0.92 0.96 0.94 0.92 0.75 0.95 0.9 0.94 0.91 0.93
S2&3 0.92 0.96 0.86 0.95 0.96 0.96 0.88 0.93 0.9 0.89 0.97 0.97 0.92 0.86 0.96 0.86 0.96 0.92 0.92
S3&1 0.92 0.97 0.87 0.95 0.91 0.98 0.97 0.94 0.92 0.96 0.96 0.92 0.86 0.72 0.89 0.93 0.92 0.91 0.92
 SROCC S1&2 0.94 0.95 0.9 0.93 0.92 0.97 0.8 0.89 0.8 0.89 0.91 0.93 0.92 0.7 0.9 0.88 0.93 0.87 0.92
S2&3 0.91 0.95 0.81 0.91 0.95 0.96 0.7 0.88 0.81 0.84 0.93 0.94 0.91 0.8 0.95 0.85 0.96 0.87 0.91
S3&1 0.92 0.96 0.85 0.96 0.91 0.98 0.87 0.9 0.86 0.93 0.91 0.92 0.81 0.69 0.84 0.92 0.92 0.87 0.92
 RMSE S1&2 4.49 3.37 4.4 5.01 3.7 5.27 4.75 3.81 5.93 4.81 4.63 6.61 4.47 6.93 4.41 3.62 2.56 4.47 4.59
S2&3 3.18 3.68 3.75 4.02 3.19 4.96 4.61 4.14 2.75 4.66 3.47 3.1 2.48 3.21 1.93 4.07 3.83 3.49 3.57
S3&1 6.34 4.8 3.63 3.25 5.79 2.14 5.01 3.3 5.3 2.61 3.07 5.57 5.3 6.13 4.07 2.76 5.16 4.27 4.44
Appendix B: Statistical inference tests for the subject consistency
To test the statistical consistency across subjects and across runs, we performed an image-by-image ANOVA (see Table 4) using the generalized linear model (GLM) procedure in the SAS 9.3 software package with a 95% confidence level (Littell, 2006). In the GLM procedure, threshold was defined as the dependent variable. We defined three independent variables: patch, subject, and run. Furthermore, the effect of the subject-run interaction on the thresholds was tested. A summary of the tests is presented in the Statistical inference tests section.
Table 4
 
Results of the statistical ANOVA tests performed using the generalized linear model procedure in the SAS 9.3 software package with a 95% confidence level. In the GLM procedure, the dependent variable was threshold and the independent variables were patch, subject, and run. Furthermore, the effect of the subject-run interaction on the thresholds was tested. In the first and thirteenth rows of the table the mask image names are shown. The p values close to or greater than 0.05 are shown in bold font. The p values greater than 0.0001 but well below 0.05 are shown in italic font.
log_seaside sunsetcolor lake redwood snow_leaves swarm elk foxy child_swimming native_american couple roping woman fisher
Model
 F 12.21 24.26 3.59 7.51 2.5 4.84 9.09 2.71 23.18 5.67 3.17 3.01 5.3 8.24
p value < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001
R Square 0.74 0.85 0.45 0.63 0.36 0.53 0.68 0.38 0.84 0.56 0.42 0.41 0.55 0.65
Patch
F 10.59 26.64 3.04 5.08 1.33 1.59 8.46 2.41 25.13 3.84 2.4 1.2 1.95 6.82
p value < 0.0001 < 0.0001 < 0.0001 < 0.0001 0.12 0.03 < 0.0001 < 0.0001 < 0.0001 < 0.0001 1.00E-04 0.22 0.003 < 0.0001
Subject
F 48.03 15.34 2.73 39.42 9.82 51.24 4.19 4.74 11.96 13.69 11.05 23.51 32.67 19.43
p value < 0.0001 < 0.0001 0.07 < 0.0001 < 0.0001 < 0.0001 0.02 0.01 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001
Run
F 9.24 1.9 28.07 2.12 5.44 16.98 2.45 9.34 22.45 15.14 11.14 4.2 32.28 10.4
p value 0.003 0.17 < 0.0001 0.15 0.02 < 0.0001 0.12 0.003 < 0.0001 1.00E-04 0.001 0.04 <0.0001 0.002
Subject-Run
F 6.24 2.82 1.82 20.83 14.18 9.39 28.26 2.58 0.66 24.94 4.88 13.57 22.97 20.92
p value 0.002 0.06 0.16 < 0.0001 < 0.0001 1.00E-04 < 0.0001 0.08 0.52 < 0.0001 0.009 < 0.0001 < 0.0001 < 0.0001
Table 4
 
Extended.
butter_flower shroom cactus sunset_sparrow family veggies turtle geckos bridge 1600 rushmore lady_liberty monument boston aerial_city trolley
Model
 F 8.35 4.58 14.35 5.83 10.42 3.66 4.03 1.78 11.1 13.99 16.06 4.92 2.64 6.63 6.18 3.03
p value < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 0.006 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001
R Square 0.66 0.51 0.77 0.57 0.7 0.46 0.48 0.29 0.72 0.76 0.79 0.53 0.38 0.6 0.59 0.41
Patch
F 4.88 4.76 10.69 5.27 2.66 2.41 3.22 1.82 3.28 15.82 17.55 2.32 2.36 5.19 6.53 3.25
p value < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 0.007 < 0.0001 < 0.0001 < 0.0001 2.00E-04 1.00E-04 < 0.0001 < 0.0001 < 0.0001
Subject
F 49.95 2.8 90.79 0.57 159.18 21.9 19.14 1.08 93.08 0.51 6.44 30.74 10.15 17.99 7.14 1.94
p value < 0.0001 0.06 < 0.0001 0.57 < 0.0001 < 0.0001 < 0.0001 0.34 < 0.0001 0.6 0.002 < 0.0001 < 0.0001 < 0.0001 0.001 0.15
Run
F 4.35 3.7 1.54 8.39 2.56 0.71 4.62 3.88 17.9 3.1 1.27 0.83 0.97 0.22 1.96 0.01
p value 0.04 0.06 0.22 0.004 0.11 0.4 0.03 0.05 < 0.0001 0.08 0.26 0.36 0.33 0.64 0.16 0.94
Subject-Run
F 32.42 3.67 8.34 19.63 1.29 8.8 2.84 0.82 62.59 0.89 6.87 26.67 0.88 23.6 1.19 1.82
p value < 0.0001 0.03 3.00E-04 < 0.0001 0.28 2.00E-04 0.06 0.44 < 0.0001 0.41 0.001 < 0.0001 0.42 < 0.0001 0.31 0.16
In Table 4 the F statistics, p values, and the R-square statistics of the GLM regression model are shown under the tag “Model.” The F statistics and the p values for the independent variables are shown under the tags “Patch,” “Subject,” and “Run.” The F statistics and the p values for the subject-run interaction are shown under the tag “Subject-Run.” In the statistical tests, the degrees of freedom for model, patch, subject, and run were 40, 35, 2, and 1, respectively. 
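The degrees of freedom quoted above can be checked with a short sketch. It assumes a design of 36 patches, 3 subjects, and 2 runs per mask image (the counts given in the figure captions), one threshold per cell, and no missing data. The interaction and error degrees of freedom are inferred from that design and are not stated in the text.

```python
# Assumed design per mask image: 36 patches x 3 subjects x 2 runs.
n_patches, n_subjects, n_runs = 36, 3, 2

df_patch = n_patches - 1                 # main effect of patch
df_subject = n_subjects - 1              # main effect of subject
df_run = n_runs - 1                      # main effect of run
df_interaction = df_subject * df_run     # subject-run interaction (inferred)

# Model degrees of freedom: sum over all terms in the model.
df_model = df_patch + df_subject + df_run + df_interaction

# Error degrees of freedom, assuming one threshold per cell (inferred).
n_obs = n_patches * n_subjects * n_runs
df_error = n_obs - 1 - df_model

print("patch, subject, run:", df_patch, df_subject, df_run)
print("interaction:", df_interaction)
print("model:", df_model)
print("error:", df_error)
```

Under these assumptions the terms sum to the 40 model degrees of freedom reported for Table 4.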
Appendix C: Performance of the model with fixed parameters
We predicted our thresholds using Watson and Solomon's (1997) model with some modifications. The quantitative similarity between the model predictions and the thresholds was measured using CC, SROCC, and RMSE. For the performance evaluation we selected four sets of parameters (refer to the Performance of the model section for descriptions of the four sets). Table 5 shows the CCs, SROCCs, and RMSEs between the model predictions and the thresholds for all 30 mask images. Note that before calculating the CCs, the model predictions were passed through a logistic transform (VQEG, 2003; Larson & Chandler, 2010) to remove any nonlinearity between the ground-truth thresholds and the model predictions.
Table 5
 
CC, SROCC, and RMSE between thresholds and model predictions for individual mask images. Set1: Watson and Solomon's (1997) observer KMF (Foley, 1994) parameters: p = 2.3, q = 2.0, b = 0.02. Set2: Watson and Solomon's (1997) observer JYS (Foley, 1994) parameters: p = 2.3, q = 2.0, b = 0.08. Set3: Teo and Heeger's (1994) model parameters: p = 2.0, q = 2.0, b = 0.05. Set4: Chandler et al.'s (2009) texture-class parameters: p = 2.4, q = 2.32, b = 0.035. The best CC, SROCC, and the RMSE values of each of the mask images are shown in bold font.
log_seaside sunsetcolor lake redwood snow_leaves swarm elk foxy child_swimming native_american couple roping woman fisher
Set1
 CC 0.86 0.95 0.6 0.82 0.81 0.89 0.72 0.53 0.84 0.66 0.49 0.67 0.86 0.8
 SROCC 0.78 0.72 0.55 0.84 0.81 0.71 0.68 0.43 0.75 0.65 0.49 0.58 0.77 0.73
 RMSE 4.27 8.86 5.3 4.33 3.48 5.41 5.33 4.28 5.52 4.74 5.75 8.37 5 3.74
Set2
 CC 0.86 0.93 0.54 0.8 0.75 0.86 0.71 0.5 0.84 0.68 0.41 0.76 0.87 0.79
 SROCC 0.77 0.61 0.5 0.83 0.75 0.65 0.67 0.36 0.75 0.64 0.4 0.71 0.81 0.72
 RMSE 4.21 12.78 5.55 5.14 4.47 6.82 5.43 4.61 5.56 5 5.85 10.54 5.55 3.61
Set3
 CC 0.87 0.95 0.6 0.81 0.77 0.94 0.71 0.76 0.89 0.62 0.66 0.65 0.84 0.77
 SROCC 0.8 0.72 0.62 0.84 0.76 0.76 0.68 0.58 0.81 0.62 0.64 0.54 0.74 0.73
 RMSE 5.35 8.88 6.63 5.76 6.92 3.84 6.04 3.78 5.29 5.26 4.95 9.2 4.4 6.88
Set4
 CC 0.86 0.96 0.59 0.81 0.78 0.92 0.7 0.75 0.86 0.61 0.63 0.65 0.84 0.77
 SROCC 0.79 0.72 0.61 0.85 0.76 0.72 0.68 0.69 0.78 0.61 0.61 0.54 0.75 0.71
 RMSE 5.37 8.86 6.67 5.62 6.5 3.73 6.08 3.81 5.67 5.29 5.21 9.15 4.33 6.49
Table 5
 
Extended.
butter_flower shroom cactus sunset_sparrow family veggies turtle geckos bridge i1600 rushmore lady_liberty monument boston aerial_city trolley
Set1
 CC 0.78 0.75 0.95 0.75 0.85 0.66 0.85 0.69 0.91 0.94 0.66 0.92 0.71 0.76 0.58 0.75
 SROCC 0.7 0.73 0.94 0.72 0.85 0.66 0.77 0.49 0.83 0.75 0.6 0.92 0.58 0.72 0.55 0.72
 RMSE 9.36 6.16 3.24 4.44 7.63 5 4.74 5.41 5.14 4.32 7.25 2.21 4.28 5.13 7.33 5.15
Set2
 CC 0.76 0.75 0.95 0.8 0.87 0.74 0.86 0.68 0.9 0.94 0.6 0.9 0.73 0.76 0.56 0.73
 SROCC 0.71 0.69 0.95 0.72 0.86 0.66 0.72 0.49 0.8 0.78 0.4 0.9 0.57 0.71 0.32 0.71
 RMSE 11.4 7.31 4.17 5.37 10.12 5.23 5.49 5.55 5.98 3.97 9.11 2.25 5.03 5.22 8.51 5.34
Set3
 CC 0.76 0.79 0.94 0.81 0.87 0.65 0.87 0.78 0.9 0.94 0.82 0.93 0.69 0.81 0.64 0.79
 SROCC 0.7 0.78 0.92 0.78 0.85 0.62 0.77 0.69 0.8 0.73 0.85 0.91 0.56 0.77 0.59 0.76
 RMSE 10.51 4.1 4.08 4.74 6.8 6.97 5.41 4.33 5.12 6.33 5.17 5.39 6.2 4.24 4.97 4.03
Set4
 CC 0.76 0.78 0.94 0.81 0.86 0.63 0.86 0.75 0.9 0.94 0.8 0.93 0.69 0.81 0.62 0.79
 SROCC 0.74 0.77 0.92 0.78 0.85 0.65 0.78 0.61 0.8 0.73 0.79 0.91 0.55 0.77 0.58 0.75
 RMSE 10.42 4.16 3.74 4.56 6.61 6.75 5.2 4.84 4.94 6.05 5.54 5.07 6.06 4.17 5.13 4.13
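To give a sense of how the four parameter sets of Table 5 differ, the sketch below evaluates a generic divisive-normalization response. This is an assumption for illustration only: it takes p as the excitatory exponent, q as the inhibitory exponent, and b as the saturation constant, and it is not the modified Watson and Solomon (1997) model evaluated in this appendix. The function name and the input values are hypothetical.

```python
# Parameter sets as listed in the Table 5 caption.
PARAMETER_SETS = {
    "Set1": {"p": 2.3, "q": 2.0, "b": 0.02},
    "Set2": {"p": 2.3, "q": 2.0, "b": 0.08},
    "Set3": {"p": 2.0, "q": 2.0, "b": 0.05},
    "Set4": {"p": 2.4, "q": 2.32, "b": 0.035},
}

def normalized_response(excitation, pool, p, q, b):
    """Generic gain control (assumed form): e^p / (b^q + sum_i pool_i^q).

    excitation: linear response of the detecting channel (non-negative).
    pool: responses of the channels in the inhibitory pool (non-negative).
    """
    return excitation ** p / (b ** q + sum(z ** q for z in pool))

# Hypothetical linear responses: one target channel and a three-channel pool.
excitation = 0.05
pool = [0.05, 0.02, 0.01]

for name, params in PARAMETER_SETS.items():
    r = normalized_response(excitation, pool, **params)
    print(name, round(r, 4))
```

In this form a larger b raises the denominator, so for the same input Set2 gives a smaller response than Set1; the two sets differ only in b.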
Figure 1
 
Illustration of the procedure used to generate the stimuli and the placement of the stimuli within our spatial three-alternative forced-choice setup: (a) original log_seaside image (mask), (b) vertically oriented log-Gabor noise target having a center radial frequency of 3.7 c/° with 1-octave bandwidth, (c) stimuli shown in the experiment. The left two stimuli in (c) show the mask alone, and the right stimulus shows the mask plus target for the patches shown in (a) and (b) as indicated by the boxes. The stimuli are shown with a context of width 1.43°. The total angle subtended by the stimuli was 4.76° with context. Note that the target patch shown within the red box in (b) is one of the 36 target patches. For a specific spatial location, the target patch is the same for all 30 mask images. However, the target patch differs across spatial locations. The log-Gabor noise target, the mask images, and the detection thresholds of the experiment are available in our online database at http://vision.okstate.edu/masking/.
Figure 2
 
Temporal parameters of the experiment. Response time was limited to within 5 s of stimulus onset, during which all three choices remained visible. The interval between the disappearance of one set of stimuli and the appearance of the next was 1 s, during which subjects viewed only the 14 cd/m2 gray background.
Figure 3
 
Masking maps and the standard deviation maps. The first, seventh, 13th, and 19th rows show the mask images. The three rows below the mask images show the average maps of the two runs of Subject1, Subject2, and Subject3, respectively; the fourth rows show the average masking maps of six runs; the fifth rows show the standard deviation maps across the six estimates. The thresholds corresponding to the gray-scale values are indicated by the “Threshold Colorbar,” and the standard deviations corresponding to the gray-scale values are indicated by the “Standard-Deviation Colorbar.” The total angle subtended by each of the mask images was 11.7°.
Figure 4
 
Examples of natural image patches having the (a) lowest, (b) medium, and (c) highest detection thresholds. The ranges of the average detection thresholds shown in this Figure are (a) −59.65 dB to −46.48 dB for the lowest threshold patches, (b) −35.80 dB to −33.87 dB for medium threshold patches, and (c) −24.88 dB to −9.41 dB for the highest threshold patches.
Figure 5
 
Examples of natural image patches having the (a) lowest, (b) medium, and (c) highest standard deviations of the detection thresholds across subjects and runs. The ranges of the standard deviations are (a) 0.47 dB to 1.19 dB for the lowest standard deviation, (b) 2.31 dB to 2.65 dB for medium standard deviation, and (c) 4.68 dB to 9.27 dB for the highest standard deviation.
Figure 6
 
Example mask patches in different threshold ranges. The range of thresholds for a row of patches is shown at the left by the adjacent two numbers. For instance, the threshold range for the topmost row of patches is 0 dB to −15 dB. Furthermore, within each row the threshold decreases from left to right. Note that not all ranges contain the same number of patches. For example, in the range 0 dB to −15 dB there are only seven patches. On the other hand, in the range −25 dB to −30 dB there are 43 patches in total, of which 15 example patches are shown here. The unmasked detection threshold (∼ −52 dB) is indicated by the red arrow and the dotted line. All of the thresholds in this figure are from the two runs of Subject1.
Figure 7
 
Intrasubject and intersubject consistency in terms of (a) CC and SROCC and (b) RMSE. The average consistency values of all 30 mask images are shown. The error bars indicate the standard deviations of the CC, SROCC, and RMSE across different mask images. The intra- and intersubject consistency values for individual mask images are shown in Table 3 of Appendix A.
Figure 8
 
Histograms of (a) detection thresholds and (b) standard deviations. In the legend of (a), μCT denotes the average of all thresholds, and μCT ± nσCT denotes n standard deviations of the thresholds (σCT) added to or subtracted from μCT; n = −3, −2, −1, 1, 2, 3. In the legend of (b), μCTσ denotes the average of all standard deviations, and μCTσ ± nσCTσ denotes n standard deviations of the standard deviations of thresholds (σCTσ) added to or subtracted from μCTσ; n = −1, 1, 2, 3.
Figure 9
 
Box and whisker plots of the detection thresholds. The horizontal axis shows the range of thresholds. The vertical axis describes different subjects and runs. The circles denote the outliers.
Figure 10
 
CC (correlation coefficient) and SROCC (Spearman rank-order correlation coefficient) between patch features and thresholds. CC measures how well the patch features correlate with the thresholds, and SROCC measures the relative monotonicity between the patch features and the thresholds. (a) CC and SROCC of patch features without dividing the patches into groups. The features were calculated for the patches with average luminances greater than 3 cd/m2. (b) SROCC of the features for low-, medium-, and high-threshold patches. For each group 200 patches were selected. The threshold ranges were: low threshold: −59.87 to −43.91 dB; medium threshold: −40.84 to −30.78 dB; and high threshold: −28.38 dB to −13.69 dB.
Figure 11
 
Scatter plots of detection thresholds versus patch features. (a) Michelson contrast versus thresholds, (b) RMS contrast versus thresholds, (c) edge density versus thresholds, (d) intercept of magnitude spectrum versus thresholds, (e) sharpness versus thresholds, and (f) band energy versus thresholds. Descriptions of these features are given in Feature-based analysis. The scatter plots shown here include thresholds and features of all 1,080 natural image patches of our database.
Figure 12
 
Model masking maps with four sets of parameters. Set1: Watson and Solomon's (1997) observer KMF parameters (Foley, 1994): p = 2.3, q = 2.0, b = 0.02. Set2: Watson and Solomon's (1997) observer JYS parameters (Foley, 1994): p = 2.3, q = 2.0, b = 0.08. Set3: Teo and Heeger's (1994) model parameters: p = 2.0, q = 2.0, b = 0.05. Set4: Chandler et al.'s (2009) texture-class parameters: p = 2.4, q = 2.32, b = 0.035. The ground-truth masking maps are shown after the mask images. For display purposes, the thresholds above 0 dB were set to 0 dB.
Figure 13
 
(a) CC, SROCC, and (b) RMSE between the thresholds and model predictions with four sets of fixed parameters. The CC, SROCC, and RMSE shown here are the average CC, SROCC, and RMSE of all mask images. The error bars are given in terms of standard deviations. The CC, SROCC, and RMSE values for individual mask images are provided in Table 5 of Appendix C.
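For reference, the RMSE between measured thresholds and model predictions for one mask image can be sketched as below. The values are hypothetical, and the assumption here is that both sequences are expressed in dB and listed patch by patch in the same order.

```python
import math

def rmse(measured_db, predicted_db):
    """Root mean squared error between two equal-length sequences (in dB)."""
    if len(measured_db) != len(predicted_db):
        raise ValueError("sequences must have the same length")
    squared = [(m - p) ** 2 for m, p in zip(measured_db, predicted_db)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical thresholds and predictions for four patches of one image.
measured = [-52.0, -41.5, -33.0, -20.0]
predicted = [-50.0, -44.5, -30.0, -26.0]
image_rmse = rmse(measured, predicted)
```

Unlike CC and SROCC, RMSE penalizes a constant offset between predictions and thresholds, so a model can rank patches well and still show a large RMSE.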
Figure 14
 
Cases where the masking model performed poorly. The first row shows the mask images, the second row shows the ground-truth masking maps, and the third row shows the model masking maps. Model maps are drawn using the Set1 parameters described in the Performance of the model section.
Figure 15
 
Patches sorted according to the absolute differences between the model predictions and the thresholds. (a) Examples of patches where the absolute differences were the smallest. (b) Examples of patches where the absolute differences were the largest. Only patches with average luminance greater than 3 cd/m² are shown. Model predictions were calculated using the Set1 parameters described in the Performance of the model section.
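The selection described in this caption (filter by luminance, then sort by absolute prediction error) can be sketched as follows. The `Patch` record, its field names, and the example values are hypothetical; only the 3 cd/m² luminance cutoff comes from the caption.

```python
from dataclasses import dataclass

@dataclass
class Patch:
    name: str
    luminance_cd_m2: float   # average patch luminance
    threshold_db: float      # measured detection threshold
    prediction_db: float     # model-predicted threshold

    @property
    def abs_error_db(self):
        return abs(self.prediction_db - self.threshold_db)

def rank_by_error(patches, min_luminance=3.0):
    """Keep patches brighter than min_luminance, sorted by ascending absolute error."""
    visible = [p for p in patches if p.luminance_cd_m2 > min_luminance]
    return sorted(visible, key=lambda p: p.abs_error_db)

# Hypothetical patches, for illustration only.
patches = [
    Patch("a", 25.0, -40.0, -39.5),
    Patch("b", 1.5, -20.0, -35.0),   # excluded: too dark
    Patch("c", 60.0, -30.0, -42.0),
    Patch("d", 10.0, -50.0, -47.0),
]
ranked = rank_by_error(patches)
best, worst = ranked[0], ranked[-1]
```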
Figure 16
 
Example patches where the model predictions were poor. The patches are shown with contexts. From top to bottom the absolute error between the detection thresholds and model predictions decreases. The absolute error ranges are, top row: 12.6 dB to 10.8 dB, second row from the top: 10.7 dB to 9.1 dB, third row from the top: 8.7 dB to 8.2 dB, and bottom row: 8.1 dB to 7.8 dB. Furthermore, in each row from left to right the absolute error decreases. Model predictions were calculated using the Set1 parameters described in the Performance of the model section.
Figure 17
 
Effects of gain control pooling width. The RMSEs between the thresholds and model predictions were calculated by varying (a) space pooling width (in degrees of subtended angle), (b) spatial frequency pooling width (in octaves), and (c) orientation pooling width (in degrees). While varying only the space pooling width (for [a]), the frequency and orientation pooling widths were 0.7 octave and 60°, respectively. While varying the frequency pooling width (for [b]), the space and orientation pooling widths were approximately 0.03° and 60°, respectively. While varying the orientation pooling width (for [c]), the space and frequency pooling widths were approximately 0.03° and 0.7 octave, respectively. Note that in (b) and (c), 0 octave and 0° denote that the gain pool contained only the response from the excitatory frequency channel and the excitatory orientation channel, respectively. Furthermore, note that the vertical axes for all three plots are in logarithmic scale. Descriptions of all four parameter sets (shown in the legend in [a]) are given in Performance of the model. To change the pooling width, we used convolution with kernels of varying lengths with equal weights in each dimension.
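The equal-weight pooling mentioned at the end of this caption can be illustrated in one dimension. This is a sketch under stated assumptions: the kernel is normalized to sum to 1, its length is odd, and the borders are zero-padded; none of these choices is specified in the caption, and the function name `box_pool` is hypothetical.

```python
def box_pool(values, width):
    """Convolve `values` with an equal-weight kernel of odd length `width`.

    Assumptions: weights are 1/width each and samples outside the
    sequence count as zero.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    pooled = []
    for i in range(len(values)):
        total = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(values):
                total += values[j]
        pooled.append(total / width)
    return pooled

# Hypothetical channel responses along one dimension (e.g., frequency bands).
responses = [0.0, 0.0, 3.0, 0.0, 0.0]
narrow = box_pool(responses, 1)   # pool holds only the excitatory channel
wide = box_pool(responses, 3)     # pool also includes both neighbours
```

A width of 1 corresponds to the "0 octave" and "0°" cases in panels (b) and (c), where the pool contains only the excitatory channel. For orientation, a circular (wrap-around) border would arguably be more natural than zero padding.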
Figure 18
 
Thresholds versus patch categories (Chandler et al., 2009). We selected 18 patches of five categories: blank, edge, structure, texture, and dark. The average thresholds along with the standard deviations of the selected patches are shown. Despite some variation, thresholds tend to increase from the blank category to the dark category.
Table 1
 
Distribution of the individual subjects as Subject1, Subject2, and Subject3 for the 30 mask images. The first row shows the names of the subjects. The second to fourth rows show the number of images for which each individual served as Subject1, Subject2, or Subject3. The last row shows the total number of images viewed by each individual.
          MA  JE  YZ  PS  PV  KV  TN  AR  TP  Total
Subject1  30   -   -   -   -   -   -   -   -     30
Subject2   -  15   8   3   1   1   1   -   1     30
Subject3   -   -   4   6   5   5   4   5   1     30
Total     30  15  12   9   6   6   5   5   2
Table 2
 
Parameters of the model. The first column shows the symbols of the parameters. The second column shows the descriptions of the parameters. The third column shows the ranges of values of the parameters that we used in our implementation of the model.
Symbol Description Value
Contrast sensitivity filter
fCSF CSF peak frequency 6.0 c/°
BWCSF CSF log10 bandwidth 1.43
Log-Gabor neural array
nf Number of frequency bands 8
BWf Bandwidth of the frequency bands 1.5 octaves
f0G Center radial frequencies of the bands 0.44, 0.74, 1.22, 2.09, 3.57, 6.22, 10.84, 16.84 c/°
nθ Number of orientation channels 6
BWθ Bandwidth of the orientation channels 30°
θ0G Center angles of the orientation channels 0°, ±30°, ±60°, 90°
Divisive gain control
p Excitatory exponent 2.0, 2.3, 2.4
q Inhibitory exponent 2.0, 2.32
b Semi-saturation constant 0.02, 0.035, 0.05, 0.08
g Output gain 0.1
sx,y Pooling kernel in space 3 × 3 neighborhood with Gaussian (standard deviation 0.5, around 0.03°)
sf Pooling kernel in frequency Within ±0.7 octave bandwidth with equal weights
sθ Pooling kernel in orientation Within ±60° bandwidth with equal weights
βr Minkowski exponent for space 2.0
βf Minkowski exponent for frequency 1.5
βθ Minkowski exponent for orientation 1.5
d Constant decision 1.0
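The three Minkowski exponents in Table 2 suggest a staged pooling of response differences. The sketch below assumes pooling is applied first over space, then over frequency, then over orientation; that order, the nested-list layout, and the function names are assumptions made for illustration, not details stated in the table.

```python
def minkowski_pool(values, beta):
    """Minkowski sum: (sum |v|^beta)^(1/beta)."""
    return sum(abs(v) ** beta for v in values) ** (1.0 / beta)

def pool_differences(diffs, beta_r=2.0, beta_f=1.5, beta_theta=1.5):
    """Pool diffs[orientation][frequency][space] into a single number.

    Exponent defaults follow Table 2; the pooling order is assumed.
    """
    per_orientation = []
    for orientation in diffs:
        per_frequency = [minkowski_pool(space, beta_r) for space in orientation]
        per_orientation.append(minkowski_pool(per_frequency, beta_f))
    return minkowski_pool(per_orientation, beta_theta)

# Hypothetical response differences: 2 orientations x 2 frequencies x 2 locations.
diffs = [
    [[3.0, 4.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
pooled = pool_differences(diffs)
```

In a model of this kind, the pooled value would then be compared against the decision constant d = 1.0 to decide whether the target is at threshold; that comparison is not shown here.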
Table 3
 
Intrasubject and intersubject consistencies for all 30 mask images in terms of CC, SROCC, and RMSE. “Sub1,” “Sub2,” and “Sub3” denote the intrasubject consistencies for Subject1, Subject2, and Subject3. “S1&2,” “S2&3,” and “S3&1” denote the intersubject consistency values. The average and overall consistency values are shown in the last two columns of the extended table.
log_seaside sunsetcolor lake redwood snow_leaves swarm elk foxy child_swimming native_american couple roping woman
Intrasubject consistency
 CC Sub1 0.95 0.98 0.95 0.97 0.97 0.97 0.9 0.93 0.97 0.86 0.86 0.95 0.98
Sub2 0.88 0.97 0.88 0.95 0.95 0.98 0.87 0.82 0.99 0.9 0.94 0.92 0.96
Sub3 0.97 0.99 0.92 0.93 0.93 0.97 0.93 0.89 0.91 0.96 0.87 0.97 0.92
 SROCC Sub1 0.95 0.88 0.94 0.96 0.95 0.91 0.89 0.9 0.91 0.84 0.83 0.92 0.95
Sub2 0.89 0.81 0.82 0.91 0.86 0.92 0.78 0.82 0.97 0.88 0.93 0.85 0.94
Sub3 0.97 0.95 0.85 0.88 0.81 0.81 0.9 0.83 0.89 0.94 0.86 0.94 0.9
 RMSE Sub1 2.5 2.37 2.67 2.04 2.68 1.86 3.62 2.66 3.1 2.8 4.14 2.48 1.72
Sub2 4.99 2.94 3.88 2.77 3.26 1.78 4.87 3 1.68 3.03 2.52 2.79 2.3
Sub3 2.54 1.5 3.74 3.86 2.8 2.49 3.7 2.89 3.82 2.08 3.92 2.61 3.69
Intersubject consistency
 CC S1&2 0.87 0.98 0.88 0.94 0.84 0.97 0.85 0.85 0.95 0.81 0.88 0.93 0.94
S2&3 0.92 0.97 0.86 0.93 0.92 0.98 0.94 0.81 0.95 0.89 0.87 0.9 0.96
S3&1 0.9 0.98 0.93 0.95 0.94 0.97 0.85 0.83 0.92 0.85 0.87 0.92 0.94
 SROCC S1&2 0.86 0.91 0.82 0.89 0.61 0.83 0.81 0.8 0.89 0.77 0.87 0.89 0.89
S2&3 0.92 0.82 0.77 0.89 0.82 0.85 0.87 0.77 0.93 0.83 0.84 0.84 0.93
S3&1 0.88 0.87 0.88 0.92 0.77 0.78 0.79 0.76 0.9 0.86 0.86 0.88 0.89
 RMSE S1&2 5.3 1.98 4.46 4.92 4.81 2.83 4.5 4.92 3.91 3.86 5.95 4.37 3.44
S2&3 4.02 2.51 3.76 3.26 3.21 1.87 3.14 3.79 2.85 2.95 4.15 4.75 3.33
S3&1 6.58 2.32 3.43 3.91 4.92 3.94 4.2 3.94 4.71 3.42 4.3 3.06 5.21
Table 3
 
Extended.
fisher butter_flower shroom cactus sunset_sparrow family veggies turtle geckos bridge i1600 rushmore lady_liberty monument boston aerial_city trolley Average Overall
Intrasubject consistency
 CC Sub1 0.89 0.98 0.93 0.86 0.97 0.98 0.97 0.96 0.89 0.92 0.96 0.97 0.88 0.87 0.92 0.94 0.92 0.93 0.95
Sub2 0.94 0.99 0.94 0.96 0.96 0.97 0.95 0.92 0.92 0.97 0.99 0.98 0.94 0.92 0.95 0.94 0.91 0.94 0.95
Sub3 0.91 0.96 0.93 0.94 0.96 0.97 0.98 0.94 0.9 0.95 0.97 0.9 0.87 0.77 0.89 0.98 0.95 0.93 0.95
 SROCC Sub1 0.89 0.96 0.9 0.87 0.95 0.96 0.87 0.93 0.85 0.85 0.91 0.96 0.89 0.88 0.88 0.95 0.91 0.91 0.95
Sub2 0.95 0.97 0.94 0.94 0.95 0.96 0.9 0.83 0.81 0.95 0.97 0.98 0.91 0.88 0.95 0.93 0.91 0.9 0.95
Sub3 0.89 0.94 0.91 0.94 0.94 0.96 0.92 0.91 0.77 0.94 0.89 0.87 0.85 0.7 0.89 0.97 0.96 0.89 0.94
 RMSE Sub1 2.51 2.34 2.31 4.37 1.9 1.91 2.65 2.64 3.63 3.45 2.55 2.05 2.38 2.78 3.04 2.7 2.79 2.69 2.76
Sub2 3.18 1.19 2.09 3.19 2.08 2.34 2.83 3.63 2.81 2.23 1.65 2.38 2.52 2.62 2.27 2.13 2.94 2.73 2.85
Sub3 3.34 2.94 2.81 2.83 2.02 2.12 1.95 3.19 4.37 2.81 2.95 4.36 3.06 4.23 3.34 1.51 2.83 3.01 3.11
Intersubject consistency
 CC S1&2 0.93 0.95 0.91 0.95 0.91 0.97 0.91 0.95 0.85 0.92 0.96 0.94 0.92 0.75 0.95 0.9 0.94 0.91 0.93
S2&3 0.92 0.96 0.86 0.95 0.96 0.96 0.88 0.93 0.9 0.89 0.97 0.97 0.92 0.86 0.96 0.86 0.96 0.92 0.92
S3&1 0.92 0.97 0.87 0.95 0.91 0.98 0.97 0.94 0.92 0.96 0.96 0.92 0.86 0.72 0.89 0.93 0.92 0.91 0.92
 SROCC S1&2 0.94 0.95 0.9 0.93 0.92 0.97 0.8 0.89 0.8 0.89 0.91 0.93 0.92 0.7 0.9 0.88 0.93 0.87 0.92
S2&3 0.91 0.95 0.81 0.91 0.95 0.96 0.7 0.88 0.81 0.84 0.93 0.94 0.91 0.8 0.95 0.85 0.96 0.87 0.91
S3&1 0.92 0.96 0.85 0.96 0.91 0.98 0.87 0.9 0.86 0.93 0.91 0.92 0.81 0.69 0.84 0.92 0.92 0.87 0.92
 RMSE S1&2 4.49 3.37 4.4 5.01 3.7 5.27 4.75 3.81 5.93 4.81 4.63 6.61 4.47 6.93 4.41 3.62 2.56 4.47 4.59
S2&3 3.18 3.68 3.75 4.02 3.19 4.96 4.61 4.14 2.75 4.66 3.47 3.1 2.48 3.21 1.93 4.07 3.83 3.49 3.57
S3&1 6.34 4.8 3.63 3.25 5.79 2.14 5.01 3.3 5.3 2.61 3.07 5.57 5.3 6.13 4.07 2.76 5.16 4.27 4.44
Table 4
 
Results of the ANOVA tests performed using the generalized linear model (GLM) procedure in the SAS 9.3 software package at the 95% confidence level. In the GLM procedure, the dependent variable was threshold and the independent variables were patch, subject, and run. The subject-run interaction effects on the thresholds were also tested. The mask image names are shown in the first and thirteenth rows of the table. The p values close to or greater than 0.05 are shown in bold font. The p values greater than 0.0001 but well below 0.05 are shown in italics.
log_seaside sunsetcolor lake redwood snow_leaves swarm elk foxy child_swimming native_american couple roping woman fisher
Model
 F 12.21 24.26 3.59 7.51 2.5 4.84 9.09 2.71 23.18 5.67 3.17 3.01 5.3 8.24
p value < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001
R Square 0.74 0.85 0.45 0.63 0.36 0.53 0.68 0.38 0.84 0.56 0.42 0.41 0.55 0.65
Patch
F 10.59 26.64 3.04 5.08 1.33 1.59 8.46 2.41 25.13 3.84 2.4 1.2 1.95 6.82
p value < 0.0001 < 0.0001 < 0.0001 < 0.0001 0.12 0.03 < 0.0001 < 0.0001 < 0.0001 < 0.0001 1.00E-04 0.22 0.003 < 0.0001
Subject
F 48.03 15.34 2.73 39.42 9.82 51.24 4.19 4.74 11.96 13.69 11.05 23.51 32.67 19.43
p value < 0.0001 < 0.0001 0.07 < 0.0001 < 0.0001 < 0.0001 0.02 0.01 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001
Run
F 9.24 1.9 28.07 2.12 5.44 16.98 2.45 9.34 22.45 15.14 11.14 4.2 32.28 10.4
p value 0.003 0.17 < 0.0001 0.15 0.02 < 0.0001 0.12 0.003 < 0.0001 1.00E-04 0.001 0.04 <0.0001 0.002
Subject-Run
F 6.24 2.82 1.82 20.83 14.18 9.39 28.26 2.58 0.66 24.94 4.88 13.57 22.97 20.92
p value 0.002 0.06 0.16 < 0.0001 < 0.0001 1.00E-04 < 0.0001 0.08 0.52 < 0.0001 0.009 < 0.0001 < 0.0001 < 0.0001
Table 4
 
Extended.
butter_flower shroom cactus sunset_sparrow family veggies turtle geckos bridge 1600 rushmore lady_liberty monument boston aerial_city trolley
Model
 F 8.35 4.58 14.35 5.83 10.42 3.66 4.03 1.78 11.1 13.99 16.06 4.92 2.64 6.63 6.18 3.03
p value < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 0.006 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001
R Square 0.66 0.51 0.77 0.57 0.7 0.46 0.48 0.29 0.72 0.76 0.79 0.53 0.38 0.6 0.59 0.41
Patch
F 4.88 4.76 10.69 5.27 2.66 2.41 3.22 1.82 3.28 15.82 17.55 2.32 2.36 5.19 6.53 3.25
p value < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 0.007 < 0.0001 < 0.0001 < 0.0001 2.00E-04 1.00E-04 < 0.0001 < 0.0001 < 0.0001
Subject
F 49.95 2.8 90.79 0.57 159.18 21.9 19.14 1.08 93.08 0.51 6.44 30.74 10.15 17.99 7.14 1.94
p value < 0.0001 0.06 < 0.0001 0.57 < 0.0001 < 0.0001 < 0.0001 0.34 < 0.0001 0.6 0.002 < 0.0001 < 0.0001 < 0.0001 0.001 0.15
Run
F 4.35 3.7 1.54 8.39 2.56 0.71 4.62 3.88 17.9 3.1 1.27 0.83 0.97 0.22 1.96 0.01
p value 0.04 0.06 0.22 0.004 0.11 0.4 0.03 0.05 < 0.0001 0.08 0.26 0.36 0.33 0.64 0.16 0.94
Subject-Run
F 32.42 3.67 8.34 19.63 1.29 8.8 2.84 0.82 62.59 0.89 6.87 26.67 0.88 23.6 1.19 1.82
p value < 0.0001 0.03 3.00E-04 < 0.0001 0.28 2.00E-04 0.06 0.44 < 0.0001 0.41 0.001 < 0.0001 0.42 < 0.0001 0.31 0.16
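One way to write the linear model implied by the Table 4 caption (threshold as a function of patch, subject, run, and the subject-run interaction) is given below. The symbols are assumptions introduced here for illustration; the exact model statement used in SAS is not given in the caption.

```latex
% T_{ijk}: threshold for patch i, subject j, run k
% \mu: grand mean; \alpha_i: patch effect; \beta_j: subject effect;
% \gamma_k: run effect; (\beta\gamma)_{jk}: subject-run interaction;
% \varepsilon_{ijk}: residual error
T_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\beta\gamma)_{jk} + \varepsilon_{ijk}
```

Under this reading, each F value in the table tests whether the corresponding term explains a significant share of the threshold variance for that mask image, and R Square is the fraction of variance explained by the full model.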
Table 5
 
CC, SROCC, and RMSE between thresholds and model predictions for individual mask images. Set1: Watson and Solomon's (1997) observer KMF (Foley, 1994) parameters: p = 2.3, q = 2.0, b = 0.02. Set2: Watson and Solomon's (1997) observer JYS (Foley, 1994) parameters: p = 2.3, q = 2.0, b = 0.08. Set3: Teo and Heeger's (1994) model parameters: p = 2.0, q = 2.0, b = 0.05. Set4: Chandler et al.'s (2009) texture-class parameters: p = 2.4, q = 2.32, b = 0.035. The best CC, SROCC, and the RMSE values of each of the mask images are shown in bold font.
log_seaside sunsetcolor lake redwood snow_leaves swarm elk foxy child_swimming native_american couple roping woman fisher
Set1
 CC 0.86 0.95 0.6 0.82 0.81 0.89 0.72 0.53 0.84 0.66 0.49 0.67 0.86 0.8
 SROCC 0.78 0.72 0.55 0.84 0.81 0.71 0.68 0.43 0.75 0.65 0.49 0.58 0.77 0.73
 RMSE 4.27 8.86 5.3 4.33 3.48 5.41 5.33 4.28 5.52 4.74 5.75 8.37 5 3.74
Set2
 CC 0.86 0.93 0.54 0.8 0.75 0.86 0.71 0.5 0.84 0.68 0.41 0.76 0.87 0.79
 SROCC 0.77 0.61 0.5 0.83 0.75 0.65 0.67 0.36 0.75 0.64 0.4 0.71 0.81 0.72
 RMSE 4.21 12.78 5.55 5.14 4.47 6.82 5.43 4.61 5.56 5 5.85 10.54 5.55 3.61
Set3
 CC 0.87 0.95 0.6 0.81 0.77 0.94 0.71 0.76 0.89 0.62 0.66 0.65 0.84 0.77
 SROCC 0.8 0.72 0.62 0.84 0.76 0.76 0.68 0.58 0.81 0.62 0.64 0.54 0.74 0.73
 RMSE 5.35 8.88 6.63 5.76 6.92 3.84 6.04 3.78 5.29 5.26 4.95 9.2 4.4 6.88
Set4
 CC 0.86 0.96 0.59 0.81 0.78 0.92 0.7 0.75 0.86 0.61 0.63 0.65 0.84 0.77
 SROCC 0.79 0.72 0.61 0.85 0.76 0.72 0.68 0.69 0.78 0.61 0.61 0.54 0.75 0.71
 RMSE 5.37 8.86 6.67 5.62 6.5 3.73 6.08 3.81 5.67 5.29 5.21 9.15 4.33 6.49
Table 5
 
Extended.
butter_flower shroom cactus sunset_sparrow family veggies turtle geckos bridge i1600 rushmore lady_liberty monument boston aerial_city trolley
Set1
 CC 0.78 0.75 0.95 0.75 0.85 0.66 0.85 0.69 0.91 0.94 0.66 0.92 0.71 0.76 0.58 0.75
 SROCC 0.7 0.73 0.94 0.72 0.85 0.66 0.77 0.49 0.83 0.75 0.6 0.92 0.58 0.72 0.55 0.72
 RMSE 9.36 6.16 3.24 4.44 7.63 5 4.74 5.41 5.14 4.32 7.25 2.21 4.28 5.13 7.33 5.15
Set2
 CC 0.76 0.75 0.95 0.8 0.87 0.74 0.86 0.68 0.9 0.94 0.6 0.9 0.73 0.76 0.56 0.73
 SROCC 0.71 0.69 0.95 0.72 0.86 0.66 0.72 0.49 0.8 0.78 0.4 0.9 0.57 0.71 0.32 0.71
 RMSE 11.4 7.31 4.17 5.37 10.12 5.23 5.49 5.55 5.98 3.97 9.11 2.25 5.03 5.22 8.51 5.34
Set3
 CC 0.76 0.79 0.94 0.81 0.87 0.65 0.87 0.78 0.9 0.94 0.82 0.93 0.69 0.81 0.64 0.79
 SROCC 0.7 0.78 0.92 0.78 0.85 0.62 0.77 0.69 0.8 0.73 0.85 0.91 0.56 0.77 0.59 0.76
 RMSE 10.51 4.1 4.08 4.74 6.8 6.97 5.41 4.33 5.12 6.33 5.17 5.39 6.2 4.24 4.97 4.03
Set4
 CC 0.76 0.78 0.94 0.81 0.86 0.63 0.86 0.75 0.9 0.94 0.8 0.93 0.69 0.81 0.62 0.79
 SROCC 0.74 0.77 0.92 0.78 0.85 0.65 0.78 0.61 0.8 0.73 0.79 0.91 0.55 0.77 0.58 0.75
 RMSE 10.42 4.16 3.74 4.56 6.61 6.75 5.2 4.84 4.94 6.05 5.54 5.07 6.06 4.17 5.13 4.13
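The per-image values in Table 5 can be summarized into the averages and standard deviations plotted in Figure 13. The sketch below does this for the CC rows of Set1 and Set3, with the numbers copied from the two halves of the table. Using the sample standard deviation is an assumption, since the article does not say which estimator was used for the error bars.

```python
import statistics

# Per-image CC values copied from Table 5 (14 images, then the 16 in the extended part).
SET1_CC = [
    0.86, 0.95, 0.6, 0.82, 0.81, 0.89, 0.72, 0.53, 0.84, 0.66, 0.49, 0.67, 0.86, 0.8,
    0.78, 0.75, 0.95, 0.75, 0.85, 0.66, 0.85, 0.69, 0.91, 0.94, 0.66, 0.92, 0.71, 0.76,
    0.58, 0.75,
]
SET3_CC = [
    0.87, 0.95, 0.6, 0.81, 0.77, 0.94, 0.71, 0.76, 0.89, 0.62, 0.66, 0.65, 0.84, 0.77,
    0.76, 0.79, 0.94, 0.81, 0.87, 0.65, 0.87, 0.78, 0.9, 0.94, 0.82, 0.93, 0.69, 0.81,
    0.64, 0.79,
]

def summarize(values):
    """Return (mean, sample standard deviation) of per-image values."""
    return statistics.mean(values), statistics.stdev(values)

for name, values in (("Set1", SET1_CC), ("Set3", SET3_CC)):
    mean_cc, std_cc = summarize(values)
    print(f"{name}: mean CC = {mean_cc:.2f}, std = {std_cc:.2f}")
```

The same function applies unchanged to the SROCC and RMSE rows of any parameter set.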