**Current top-performing blind perceptual image quality prediction models are generally trained on legacy databases of human quality opinion scores on synthetically distorted images. Therefore, they learn image features that effectively predict human visual quality judgments of inauthentic and usually isolated (single) distortions. However, real-world images usually contain complex composite mixtures of multiple distortions. We study the perceptually relevant natural scene statistics of such authentically distorted images in different color spaces and transform domains. We propose a “bag of feature maps” approach that avoids assumptions about the type of distortion(s) contained in an image and instead focuses on capturing consistencies—or departures therefrom—of the statistics of real-world images. Using a large database of authentically distorted images, human opinions of them, and bags of features computed on them, we train a regressor to conduct image quality prediction. We demonstrate the competence of the features toward improving automatic perceptual quality prediction by testing a learned algorithm using them on a benchmark legacy database as well as on a newly introduced distortion-realistic resource called the LIVE In the Wild Image Quality Challenge Database. We extensively evaluate the perceptual quality prediction model and algorithm and show that it is able to achieve good-quality prediction power that is better than other leading models.**

*pristine* images) that have been suitably normalized follow statistical laws. Current NR IQA models measure perturbations of these statistics to predict image distortions. State-of-the-art NSS-based NR IQA models (Mittal, Moorthy et al., 2012; Mittal, Soundararajan, & Bovik, 2012; Moorthy & Bovik, 2010, 2011; Saad et al., 2012; Tang et al., 2011; Y. Zhang, Moorthy, Chandler, & Bovik, 2014) exploit these statistical perturbations by first extracting image features in a normalized bandpass space and then learning a kernel function that maps these features to ground truth subjective quality scores. To date, these feature representations have been tested only on images containing synthetically applied distortions and may not perform well when applied to real-world images afflicted by mixtures of authentic distortions (Table 2).

*α*) and variance (*σ*^{2}; see later sections for more details). We found that the value of *α* for Figure 2a is 2.09, in accordance with the Gaussian model of the histogram of its NLC map. It should be noted that the family of GGDs includes the normal distribution when *α* = 2 and the Laplacian distribution when *α* = 1. This property is not specific to Figure 2a but rather is generally characteristic of all natural images. As first observed in Ruderman (1994), natural, undistorted images of quite general (well-lit) image content captured by any good-quality camera may be expected to exhibit this statistical regularity after bandpass debiasing and divisive normalization operations are applied. To further illustrate this well-studied regularity, we processed 29 pristine images from the legacy LIVE IQA Database (Sheikh et al., 2006) that vary greatly in their image content and plotted the collective histogram of the normalized coefficients of all 29 images in Figure 1. Specifically, we concatenated the normalized coefficients of all the images into a single vector and plotted its histogram. The best-fitting GGD model yielded *α* = 2.15, which is again nearly Gaussian. The singular spike at zero almost invariably arises from a cloudless sky entirely bereft of objects.

*α*) is 2.12 despite the presence of multiple severe and interacting distortions. As a way of visualizing this problem, we show scatter plots of subjective quality scores against the *α* values of the best GGD fits to NLC maps of all the images (including the pristine images) in the legacy LIVE Database (of synthetically distorted pictures; Sheikh et al., 2006) in Figure 5a and of all the authentically distorted images in the LIVE Challenge Database in Figure 5b. From Figure 5a, it can be seen that most of the images in the legacy LIVE Database that have high human subjective quality scores (i.e., low difference of mean opinion scores [DMOS]) associated with them (including the pristine images) have estimated *α* values close to 2.0, whereas pictures having low quality scores (i.e., high DMOS) take different *α* values and thus are statistically distinguishable from high-quality images. However, Figure 5b shows that authentically distorted images from the new LIVE Challenge Database may be associated with *α* values close to 2.0, even on heavily distorted pictures (i.e., with low mean opinion scores [MOS]). Figure 4 plots the distribution of the fraction of all the images in the database that fall into four discrete MOS and DMOS categories. It should be noted that the legacy LIVE IQA Database provides DMOS scores, whereas the LIVE Challenge Database contains MOS scores. These histograms show that the distorted images span the entire quality range in both databases and that there is no noticeable skew of distortion severity in either database that could have affected the results in Figures 5 and 6.

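*L* of size *M* × *N*, a divisive normalization operation (Ruderman, 1994) yields an NLC map:

$$NLC(i, j) = \frac{L(i, j) - \mu(i, j)}{\sigma(i, j) + 1}, \tag{1}$$

where

$$\mu(i, j) = \sum_{k=-3}^{3} \sum_{l=-3}^{3} w_{k,l}\, L(i - k, j - l) \tag{2}$$

and

$$\sigma(i, j) = \sqrt{\sum_{k=-3}^{3} \sum_{l=-3}^{3} w_{k,l} \left[ L(i - k, j - l) - \mu(i, j) \right]^{2}}, \tag{3}$$

where *i* ∈ 1, 2, . . . , *M* and *j* ∈ 1, 2, . . . , *N* are spatial indices and *w* = {*w*_{k,l} | *k* = −3, . . . , 3, *l* = −3, . . . , 3} is a two-dimensional circularly symmetric Gaussian weighting function.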
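The divisive normalization described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the Gaussian width of 7/6 pixels (borrowed from common BRISQUE-style code), the stabilizing constant of 1 in the denominator, and the edge-replicating border handling are all assumptions.

```python
import numpy as np

def gaussian_window(half_width=3, sigma=7.0 / 6.0):
    """Circularly symmetric Gaussian weights on a (2*half_width+1)^2 grid, summing to 1.
    The width sigma = 7/6 is an assumed value, not taken from the text."""
    ax = np.arange(-half_width, half_width + 1, dtype=np.float64)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def local_weighted_sum(img, w):
    """Weighted sum of each pixel's neighborhood (edge-replicated borders, same size)."""
    h = w.shape[0] // 2
    padded = np.pad(img, h, mode="edge")
    rows, cols = img.shape
    out = np.zeros((rows, cols), dtype=np.float64)
    for k in range(w.shape[0]):
        for l in range(w.shape[1]):
            out += w[k, l] * padded[k:k + rows, l:l + cols]
    return out

def nlc_map(luma, c=1.0):
    """Return (NLC map, sigma field) for a 2-D luminance array.
    c is an assumed constant that keeps the division stable in flat regions."""
    luma = np.asarray(luma, dtype=np.float64)
    w = gaussian_window()
    mu = local_weighted_sum(luma, w)                       # local weighted mean
    # Weighted variance via E[L^2] - mu^2 (valid because the weights sum to 1).
    var = local_weighted_sum(luma ** 2, w) - mu ** 2
    sigma = np.sqrt(np.clip(var, 0.0, None))               # local weighted std
    return (luma - mu) / (sigma + c), sigma
```

The sigma field returned alongside the NLC map is the same local standard deviation that later feature maps build on, so returning both avoids recomputing it.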

*α* controls the “shape” of the distribution and *σ*^{2} controls its variance. A zero-mean distribution is appropriate for modeling NLC distributions because they are (generally) symmetric. These parameters are commonly estimated using an efficient moment matching–based approach (Mittal, Moorthy et al., 2012; Sharifi & Leon-Garcia, 1995).
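One way to carry out the moment matching mentioned above is sketched below, assuming the standard zero-mean GGD parameterization, for which the ratio (E|x|)^2 / E[x^2] equals Γ(2/α)^2 / (Γ(1/α)Γ(3/α)). The function name `fit_ggd` and the search grid (0.2 to 10 in steps of 0.001) are illustrative assumptions, not details taken from the text.

```python
import math
import numpy as np

def fit_ggd(x):
    """Estimate (alpha, sigma_sq) of a zero-mean generalized Gaussian by moment matching."""
    x = np.asarray(x, dtype=np.float64).ravel()
    sigma_sq = float(np.mean(x ** 2))                      # second moment about zero
    rho = float(np.mean(np.abs(x))) ** 2 / sigma_sq        # empirical moment ratio
    alphas = np.arange(0.2, 10.0, 0.001)                   # assumed search grid
    ratios = np.array([
        math.gamma(2.0 / a) ** 2 / (math.gamma(1.0 / a) * math.gamma(3.0 / a))
        for a in alphas
    ])
    # Pick the shape whose theoretical ratio is closest to the empirical one.
    alpha = float(alphas[np.argmin(np.abs(ratios - rho))])
    return alpha, sigma_sq
```

Applied to Gaussian samples this returns a shape near 2, and applied to Laplacian samples a shape near 1, matching the two special cases of the GGD family.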

*η* is given by

*ν* controls the “shape” of the distribution, *η* is the mean of the distribution, and

*M* × *N* × 3 image *I* in red-green-blue (RGB) color space, its luminance component is first extracted, which we refer to as the *Luma* map. An NLC map as defined in Equation 1 is then computed by applying a divisive normalization operation to it (Ruderman, 1994). A slight variation from the usual retinal contrast signal model is the use of divisive normalization by the standard deviation (as defined in Equation 3) of the local responses rather than by the local mean response. The best-fitting GGD model to the empirical distribution of the NLC map is found (Mittal, Moorthy et al., 2012). Two parameters (*α*, *σ*^{2}) are estimated and two sample statistics (kurtosis, skewness) are computed from the empirical distribution over two scales, yielding a total of eight features. The features may be regarded as essential NSS features related to classical models of retinal processing.
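The two sample statistics named above can be computed directly from the coefficients of any feature map. The sketch below is illustrative only: the function name is hypothetical, and it assumes the non-excess (Pearson) convention for kurtosis, under which a Gaussian has kurtosis 3.

```python
import numpy as np

def sample_skewness_kurtosis(coeffs):
    """Return (skewness, kurtosis) of all coefficients in a map.
    Kurtosis follows the assumed non-excess convention (Gaussian = 3)."""
    x = np.asarray(coeffs, dtype=np.float64).ravel()
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)        # second central moment
    m3 = np.mean(centered ** 3)        # third central moment
    m4 = np.mean(centered ** 4)        # fourth central moment
    return float(m3 / m2 ** 1.5), float(m4 / m2 ** 2)
```

Evaluating these two statistics and the two GGD parameters at each of two scales gives the 2 × (2 + 2) = 8 features counted in the text.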

*i, j*) by taking the product of *NLC*(*i, j*) with each of its directional neighbors: *NLC*(*i, j* + 1), *NLC*(*i* + 1, *j*), *NLC*(*i* + 1, *j* + 1), and *NLC*(*i* + 1, *j* − 1). These maps have been shown to reliably obey an AGGD in the absence of distortion (Mittal, Moorthy et al., 2012). A total of 24 parameters (four AGGD parameters and two sample statistics, kurtosis and skewness, for each of the four product maps) are computed. These features are computed on two scales, yielding 48 additional features. These features use the same NSS/retinal model to account for local spatial correlations.
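The four neighboring pair product maps, and an AGGD fit of the kind cited above, might be sketched as follows. The function names are hypothetical. The fit follows the moment-matching recipe commonly used in BRISQUE-style implementations, which is an assumption about how the parameters are estimated here, and it requires both negative and positive coefficients to be present.

```python
import math
import numpy as np

def paired_products(nlc):
    """Products of NLC(i, j) with its four directional neighbors."""
    nlc = np.asarray(nlc, dtype=np.float64)
    return {
        "horizontal": nlc[:, :-1] * nlc[:, 1:],             # NLC(i, j) * NLC(i, j + 1)
        "vertical": nlc[:-1, :] * nlc[1:, :],               # NLC(i, j) * NLC(i + 1, j)
        "main_diagonal": nlc[:-1, :-1] * nlc[1:, 1:],       # NLC(i, j) * NLC(i + 1, j + 1)
        "secondary_diagonal": nlc[:-1, 1:] * nlc[1:, :-1],  # NLC(i, j) * NLC(i + 1, j - 1)
    }

def _gamma_ratio(nu):
    return math.gamma(2.0 / nu) ** 2 / (math.gamma(1.0 / nu) * math.gamma(3.0 / nu))

def fit_aggd(x):
    """Moment-matching AGGD fit; returns (nu, eta, sigma_l_sq, sigma_r_sq)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    sigma_l = math.sqrt(float(np.mean(x[x < 0] ** 2)))      # left-side spread
    sigma_r = math.sqrt(float(np.mean(x[x > 0] ** 2)))      # right-side spread
    gamma_hat = sigma_l / sigma_r
    r_hat = float(np.mean(np.abs(x))) ** 2 / float(np.mean(x ** 2))
    big_r = r_hat * (gamma_hat ** 3 + 1.0) * (gamma_hat + 1.0) / (gamma_hat ** 2 + 1.0) ** 2
    grid = np.arange(0.2, 10.0, 0.001)                      # assumed search grid for the shape
    ratios = np.array([_gamma_ratio(nu) for nu in grid])
    nu = float(grid[np.argmin(np.abs(ratios - big_r))])
    scale = math.sqrt(math.gamma(1.0 / nu) / math.gamma(3.0 / nu))
    # Mean of the fitted distribution, driven by the left/right asymmetry.
    eta = (sigma_r - sigma_l) * scale * math.gamma(2.0 / nu) / math.gamma(1.0 / nu)
    return nu, eta, sigma_l ** 2, sigma_r ** 2
```

Each product map contributes the four values returned by `fit_aggd` plus its sample kurtosis and skewness, which is how four maps give 24 parameters per scale.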

*σ*_{2} = 1.5 *σ*_{1}. The value of *σ*_{1} in our implementation was 1.16. The mean subtracted and divisively normalized coefficients of the DoG of the sigma field (obtained by applying Equation 3 on the DoG of the sigma field; denoted henceforth as DoG_{sigma}) of the luminance map of a pristine image exhibit a regular structure that deviates in the presence of some kinds of distortion (Figure 11a). Features that are useful for capturing a broad spectrum of distortion behavior include the estimated shape, standard deviation, sample skewness, and kurtosis. The DoG of the sigma field can highlight conspicuous stand-out statistical features that may particularly affect the visibility of distortions.
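A difference-of-Gaussians filter applied to a sigma field can be sketched as below, using the stated values *σ*_{1} = 1.16 and *σ*_{2} = 1.5 *σ*_{1}. The function names, the kernel truncation at three standard deviations, the edge-replicating borders, and the sign convention (narrow minus wide) are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian, truncated at an assumed radius of ceil(3 * sigma)."""
    radius = int(np.ceil(3.0 * sigma))
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge-replicated borders (same output size)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), r, mode="edge")
    # Filter along rows, then along columns; 'valid' trims the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog_of_sigma_field(sigma_field, sigma1=1.16, ratio=1.5):
    """Difference of two Gaussian-blurred copies of the sigma field."""
    return gaussian_blur(sigma_field, sigma1) - gaussian_blur(sigma_field, ratio * sigma1)
```

The result can then be passed through the same divisive normalization used for the luminance map before its shape, standard deviation, skewness, and kurtosis are measured.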

DoG_{sigma} and denote its mean subtracted and divisively normalized coefficients as DoG′_{sigma}. The sigma field of DoG_{sigma} is obtained by applying Equation 3 on DoG_{sigma}. We found that DoG′_{sigma} also exhibits statistical regularities disrupted by the presence of distortions (Figure 11b). The sample kurtosis and skewness of these normalized coefficients are part of the list of features that are fed to the regressor.

*Luma*) and model it using an AGGD. This is also a bandpass retinal NSS model but without normalization. The estimated model parameters

*Chroma* map defined in the perceptually relevant CIELAB color space of one luminance (L*) and two chrominance (a* and b*) components (Rajashekar, Wang, & Simoncelli, 2010). The coordinate L* of the CIELAB space represents color lightness, a* is its position relative to red/magenta and green, and b* is its position relative to yellow and blue. Moreover, the nonlinear relationships between L*, a*, and b* mimic the nonlinear responses of the L-, M-, and S-cone cells in the retina and are designed to uniformly quantify perceptual color differences.

*Chroma*, on the other hand, captures the perceived intensity of a specific color and is defined as follows:

$$C^{*}_{ab} = \sqrt{a^{*2} + b^{*2}}, \tag{12}$$

where *a*^{*} and *b*^{*} refer to the two chrominance components of any given image in the LAB color space. The chrominance channels contained in the chroma map are entropy-reduced representations similar to the responses of color-differencing retinal ganglion cells.
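As an illustration of how a chroma map might be obtained from an RGB image, the sketch below converts to CIELAB and takes the magnitude of the two chrominance components. It assumes sRGB input scaled to [0, 1] and a D65 white point, neither of which is specified in the text, and the function names are hypothetical.

```python
import numpy as np

# Linear sRGB to CIE XYZ matrix and D65 reference white (assumed conventions).
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])

def srgb_to_lab(rgb):
    """Convert an (..., 3) sRGB array in [0, 1] to CIELAB (L*, a*, b*)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer curve to get linear intensities.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3.0 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return lightness, a, b

def chroma_map(rgb):
    """Chroma: the Euclidean magnitude of the a* and b* components."""
    _, a, b = srgb_to_lab(rgb)
    return np.hypot(a, b)
```

Neutral pixels (white, gray, black) map to a chroma near zero, while saturated colors map to large values, which is why this map isolates color intensity from lightness.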

*Chroma* map (Equation 12) of a pristine image follow a Gaussian-like distribution, which is perturbed by the presence of distortions (Figure 12a); thus, a GGD model is apt to capture these statistical deviations. We extract two model parameters (shape and standard deviation) and two sample statistics (kurtosis and skewness) at two scales to serve as image features.

*Chroma* (henceforth referred to as *Chroma*_{sigma}). The mean subtracted and divisively normalized coefficients of *Chroma*_{sigma} of pristine images also obey a unit Gaussian-like distribution, which is violated in the presence of distortions (Figure 12b). We again use a GGD to model these statistical deviations, estimate the model parameters (shape and standard deviation), and compute the sample kurtosis and skewness at two scales. All of these are used as features deployed by the learner.

*Chroma*_{sigma}. We also process the normalized coefficients of the *Chroma* map and generate four neighboring pair product maps, the Laplacian, DoG_{sigma}, and DoG′_{sigma} maps, and extract the model parameters and sample statistics from them. C-DIIVINE features on the *Chroma* map of each image are also extracted to be used later by the learner.

*l̂*) and two chromatic-opponent (RG and BY) responses.

*L̂*, *M̂*, and *Ŝ* are the NLCs (Equation 1) of the logarithmic signals of the L, M, and S components, respectively; that is,

$$\hat{L}(i, j) = \frac{\log L(i, j) - \mu_{L}(i, j)}{\sigma_{L}(i, j) + 1}, \tag{16}$$

where *μ*_{L}(*i, j*) is the mean and *σ*_{L}(*i, j*) is the standard deviation of log *L*, similar to those defined in Equations 2 and 3 for *L*. *M̂*(*i*, *j*) and *Ŝ*(*i*, *j*) are defined in the same manner as Equation 16 from log *M*(*i*, *j*) and log *S*(*i*, *j*), respectively.
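The achromatic and chromatic-opponent responses are built from these three normalized channels. The sketch below uses the decorrelating combination reported by Ruderman, Cronin, and Chiao (1998) for log-LMS signals; that this is exactly the combination intended here is an assumption, and the function name is hypothetical.

```python
import numpy as np

def opponent_responses(l_hat, m_hat, s_hat):
    """Return (achromatic, BY, RG) responses from normalized log-LMS channels.
    The weights follow the assumed Ruderman et al. (1998) combination."""
    l_hat, m_hat, s_hat = (np.asarray(c, dtype=np.float64) for c in (l_hat, m_hat, s_hat))
    achromatic = (l_hat + m_hat + s_hat) / np.sqrt(3.0)      # luminance-like axis
    by = (l_hat + m_hat - 2.0 * s_hat) / np.sqrt(6.0)        # blue-yellow opponency
    rg = (l_hat - m_hat) / np.sqrt(2.0)                      # red-green opponency
    return achromatic, by, rg
```

When the three channels are equal, both opponent responses vanish and only the achromatic response remains, which is the behavior expected of a color-opponent representation.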

DoG_{sigma}, and DoG′_{sigma} feature maps from both M and S channels and extract model parameters and sample statistics from them. C-DIIVINE features at three scales and six orientations are also computed on both the channel maps and added to the final list of features.

*I*, which is defined as follows: where *R*, *G*, and *B* refer to the red, green, and blue channels, respectively. Our motivation for using the yellow channel is simply to provide the learner with direct yellow-light information rather than just BY color opponency, which might be relevant to distortion perception, especially on sunlit scenes.
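For illustration only, one widely used yellow channel is the broadly tuned channel from the saliency model of Itti, Koch, and Niebur (1998), Y = (R + G)/2 − |R − G|/2 − B. Treating this as the definition used here is an assumption, as is the function name in the sketch below.

```python
import numpy as np

def yellow_channel(rgb):
    """Assumed yellow channel: (R + G)/2 - |R - G|/2 - B for an (..., 3) RGB array.
    Values are left unclipped, so strongly blue pixels come out negative."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (r + g) / 2.0 - np.abs(r - g) / 2.0 - b
```

Under this definition the response is largest where red and green are both strong and blue is weak, which is the case for yellow light.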

Y_{sigma}) also display Gaussian behavior on pristine images (Figure 14b). This behavior is often not observed on distorted images. Thus, the goodness of the generalized Gaussian fits of both the normalized coefficients of Y and Y_{sigma} at the original scale of the image are also extracted and added as features used in our model. As discussed in the next section, features drawn from the yellow color channel map were able to efficiently capture a few distortions that were not captured by the luminance component alone.

DoG_{sigma} map on the luminance map as defined in earlier sections. It may be observed that although the histograms of the singly distorted images differ greatly from those of the pristine image in Figure 15a, the distribution of an authentically distorted image containing noise, blur, and compression artifacts closely resembles the distribution of the pristine image. However, when the normalized coefficients of the proposed yellow color channel and the DoG_{sigma} feature maps are observed in Figure 15b and c, it is clear that these distributions are useful for distinguishing between the pristine image and both singly and authentically distorted images. We have observed the usefulness of all of the proposed feature maps on a large and comprehensive collection of images contained in the LIVE Challenge Database.

*t* test (Sheskin, 2004) on the 50 SROCC values obtained from the 50 train–test trials. The results are tabulated in Table 3. The null hypothesis is that the means of the two paired samples are equal; that is, the mean correlation for the (row) algorithm is equal to the mean correlation for the (column) algorithm, tested at the 95% confidence level. The alternative hypothesis is that the mean correlation of the row algorithm is greater than or less than the mean correlation of the column algorithm. A value of 1 in the table indicates that the row algorithm is statistically superior to the column algorithm, whereas a value of −1 indicates that the row is statistically worse than the column. A value of 0 indicates that the row and column are statistically indistinguishable (or equivalent); that is, we could not reject the null hypothesis at the 95% confidence level. From Table 3 we conclude that FRIQUEE-ALL is statistically superior to all of the NR algorithms that we evaluated when trained and tested on the LIVE Challenge Database.
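The 1 / 0 / −1 coding of the table can be sketched with a two-sided paired *t* test as below. The function name is hypothetical, and the default critical value of 2.0096 is the two-sided 95% threshold for 49 degrees of freedom, which assumes exactly 50 paired trials; other trial counts need a different threshold.

```python
import math
import statistics

def paired_t_decision(row_scores, col_scores, t_critical=2.0096):
    """Return 1 if the row algorithm is significantly better, -1 if worse, else 0.
    t_critical defaults to the two-sided 95% value for 49 degrees of freedom."""
    diffs = [r - c for r, c in zip(row_scores, col_scores)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation of differences
    if sd_d == 0.0:
        # Degenerate case: every paired difference is identical.
        return 0 if mean_d == 0 else (1 if mean_d > 0 else -1)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    if t_stat > t_critical:
        return 1
    if t_stat < -t_critical:
        return -1
    return 0
```

Because the test is paired, the two lists must hold the SROCC values from the same train–test splits in the same order.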

*, 6059, 605905–1 –605905-10.*

*Proceedings of SPIE**(pp. 223–239). Berlin: Springer.*

*Data mining techniques for the life sciences**, 101, 2008–2024.*

*Proceedings of the IEEE**. Retrieved from http://www.irccyn.ec-nantes.fr/ivcdb/*

*Subjective quality assessment irccyn/ivc database**, 197 (2), 551.*

*The Journal of Physiology**, (pp. 894–897). Berlin: Springer.*

*Proceedings of the Pacific Rim Conference on Multimedia, Advances in Multimedia Information Processing**, 22 (6), 707–717.*

*Pattern Recognition**(pp. 1–22).*

*Workshop on Statistical Learning in Computer Vision, ECCV**, 18 (4), 717–728.*

*IEEE Transactions on Image Processing**, 4, 2379–2394*

*Journal of the Optical Society of America, A**, 101, 2008–2024.*

*Asilomar Conference on Signals, Systems, and Computers**Proceedings of the SPIE conference on human vision and electronic imaging*(pp. 93940J). Bellingham, WA: SPIE.

*, 25 (1), 372–387.*

*IEEE Transactions on Image Processing**, 21 (2), 155–158.*

*IEEE Signal Processing Letters**, 25 (1), 65–79.*

*IEEE Transactions on Image Processing**, 2, 1458–1465.*

*International Conference on Computer Vision**, 9 (2), 181–197.*

*Visual Neuroscience**, 64 (6), 384–404.*

*Psychological Review**, 20 (11), 1254–1259.*

*IEEE Transactions on Pattern Analysis and Machine Intelligence**(pp. 1693–1697). New York: IEEE.*

*Proceedings of the Asilomar Conference on Signals, Systems, and Computers**. New York: IEEE.*

*Proceedings of the international conference on computer vision and pattern recognition**2015 IEEE international conference on image processing (ICIP)*(pp. 2791–2795. New York: IEEE.

*, 16 (1), 37–68.*

*Journal of Neurophysiology**, 19 (1), 011006.*

*Journal of Electronic Imaging**IEEE international conference on image processing*(pp. 2281–2284). New York: IEEE.

*, 21 (12), 4695–4708.*

*IEEE Transactions on Image Processing**, 20 (3), 209–212.*

*IEEE Signal Processing Letters**, 17 (5), 513–516.*

*IEEE Signal Processing Letters**, 20 (12), 3350–3364.*

*IEEE Trans. Image Process**IEEE international workshop on qualitative multimedia experience*(pp. 87–91). New York: IEEE.

*(pp. 106–111). Location: Publisher.*

*Proceedings of the 4th European workshop on visual information**, 10 (4), 3045.*

*Advances of Modern Radio Electronics**, 31 (4), 75271L.*

*SPIE**, 5 (12), 583–601.*

*Vision Research**, 5 (4), 517–548.*

*Network: Computation in Neural Systems**, 15 (8), 2036–2045.*

*JOSA A**, 21 (8), 3339–3352.*

*IEEE Transactions on Image Processing**, 23 (3), 1352–1365.*

*IEEE Transactions on Image Processing**, 24 (6), 1879–1892.*

*IEEE Transactions on Image Processing**. New York, NY: McGraw Hill.*

*Perception**IEEE international conference on acoustics, speech, and signal processing*(pp. 1–869). New York: IEEE.

*, 5 (1), 52–56.*

*IEEE Transactions on Circuits and Systems for Video Technology**, 14 (11), 1918–1927.*

*IEEE Transactions on Image Processing**, 15 (11), 3440–3451.*

*IEEE Transactions on Image Processing**. In*

*Handbook of parametric and nonparametric statistical procedures**Proceedings of Asilomar conference on signals, systems, and computers*. Boca Raton, FL: CRC Press.

*, 23 (4), 684–694.*

*IEEE Transactions on Circuits and Systems for Video Technology**, 18 (1), 17–33.*

*Journal of Mathematical Imaging and Vision**IEEE international conference on computational visual pattern recognition*(pp. 305–312). New York: IEEE.

*IEEE international conference on computational visual pattern recognition*(pp. 2877–2884). New York: IEEE.

*IEEE international conference on image processing*(pp. 401–404). New York: IEEE.

*Proceedings of the IEEE international conference on image processing*(pp. 369–372). New York: IEEE.

*Statistical theories of the brain*(pp. 203–222). Cambridge, MA: MIT Press.

*, 13 (4), 600–612.*

*IEEE Transactions on Image Processing**, 19 (2), 121–132.*

*Signal Processing: Image Communication**, 2, 1398–1402.*

*Proceedings of the Asilomar Conference on Signals, Systems, and Computers**, 19 (1), 19–32.*

*Vision Research**. Berlin, Germany: Springer.*

*Visual quality assessment by machine learning**IEEE international conference on image processing*(pp. 3129–3138). New York: IEEE.

*.*

*IEEE Transactions on Image Processing**, 22 (4), 43025.*

*Journal of Electronic Imaging**, 29 (7), 725–747.*

*Signal Processing: Image Communication**, 4 (1), 1–8.*

*EURASIP Journal of Image and Video Processing*