The model observer begins by computing measurements of the same texture statistics used in the Portilla & Simoncelli synthesis procedure, with one key difference. The model observer computes statistics from circularly apertured texture images (like the images used in the perceptual experiments) and restricts its spatial averaging of statistics to locations within the aperture. In contrast, the Portilla-Simoncelli procedure computes statistics on rectangular fields of texture that are assumed to have periodic image boundaries. As a result of this difference, for any individual texture image, the statistics measured by the model observer are not identical to the statistics used to synthesize the image. Rather, the measured statistics vary between different texture image examples. These measurements from the apertured images are a more realistic approximation of the information available to human observers when the texture images are realized on a physical screen.
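As an illustration, restricting the spatial averaging of a statistic to a circular aperture might be sketched as follows. This is a minimal example, not the paper's implementation; the function name and the `radius_frac` value are placeholders.

```python
import numpy as np

def apertured_mean(stat_map, radius_frac=0.5):
    """Average a spatial map of a statistic over a circular aperture.

    stat_map: 2-D array of a locally computed statistic over the image.
    radius_frac: aperture radius as a fraction of the smaller image
    dimension (an illustrative default, not the paper's value).
    """
    h, w = stat_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = radius_frac * min(h, w)
    # Boolean mask selecting pixels inside the circular aperture.
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    # Average only over locations within the aperture.
    return stat_map[mask].mean()
```

Averaging only within the mask, rather than over the full rectangular field, is what makes the measured statistics vary from one apertured texture example to the next.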
We measured texture statistics for many different naturalistic texture images and spectrally-matched noise images (960 total per condition), then applied linear discriminant analysis to find the weighted sum of statistics that optimally separates naturalistic texture from noise. Because we had many more statistics than sample images, we used regularization and cross-validation to avoid overfitting. The full set of statistics, measured over the full ensemble of images, constitutes a matrix
S with dimensions of images by statistics. We define two matrices, S_t and S_n, for the statistics of the naturalistic texture and noise image sets, respectively. For each image set, we then measure the full covariance matrix of all statistics across all images in the set. We define Σ as the sum of these two covariance matrices:
\begin{eqnarray}{\rm{\Sigma }} = Cov\left( {{S_t}} \right) + Cov\left( {{S_n}} \right)\quad\end{eqnarray}
Linear discriminant analysis requires computing the inverse of this sum of covariance matrices. To regularize the inversion, we first define Σ_diag as the matrix that is equal to Σ for all diagonal elements and 0 otherwise. We then compute
\begin{eqnarray}{{\rm{\Sigma }}_{reg}}^{ - 1} = {\left( {{\rm{\Sigma }}*\left( {1 - \lambda } \right) + {{\rm{\Sigma }}_{diag}}*\lambda } \right)^{ - 1}}\quad\end{eqnarray}
where λ is a regularization parameter bounded between 0 and 1. λ = 0 computes the inverse covariance matrix with no regularization and λ = 1 computes the inverse assuming that the statistics do not covary at all across images. The weights of the linear discriminant are then computed as
\begin{eqnarray}w = \left( {Avg\left( {{S_t}} \right) - Avg\left( {{S_n}} \right)} \right)*{{\rm{\Sigma }}_{reg}}^{ - 1}\quad\end{eqnarray}
where Avg(S_t) and Avg(S_n) are the values of the statistics averaged over all images in the naturalistic texture and noise image sets, respectively. To cross-validate our calculation of the weights, we split each image group into five partitions (192 images each) and refit the model five times, each time holding out one partition as a test set. We then applied the weights to the held-out test data to compute a single, weighted-sum discriminant value for each image. This yielded two distributions of test-set discriminant values: one for naturalistic images and one for noise images. We summarized the discriminability of these two distributions as a dʹ value
\begin{eqnarray}d^{\prime} = \frac{{Avg\left( {{s_t}} \right) - Avg\left( {{s_n}} \right)}}{{\sqrt {\frac{1}{2}*\left( {Var\left( {{s_t}} \right) + Var\left( {{s_n}} \right)} \right)} }}\quad\end{eqnarray}
where s_t is the distribution of discriminant values for the naturalistic images and s_n is the distribution for the noise images. This analysis consistently found larger cross-validated dʹ values for λ > 0 than for λ = 0 (suggesting that regularization helped avoid overfitting) and found stable dʹ values over a wide range of λ values. For all analyses in this paper, we set λ to 0.01, a value that achieved high levels of discriminability for all texture families. Our results did not change qualitatively for other choices of λ.
To compute the sensitivity of the model observer, we applied each texture family's linear discriminant weights to statistics from images of intermediate naturalness levels. Specifically, we applied the training weights (found using fully naturalistic images) to test sets of statistics from images of intermediate strength. The training and test sets of images were segregated such that no seeds (the spectrally-matched noise images used to initialize the texture synthesis process) were shared between the two sets. Applying the weights to statistics from different naturalness levels allowed us to interpolate a relationship between discriminability (dʹ) and naturalness level. Simulations of the match-to-sample task with Gaussian distributions reached threshold (75% correct) for dʹ values of 1.94. Accordingly, we defined the model observer's threshold naturalness level as the value corresponding to dʹ = 1.94 and sensitivity as the inverse of threshold.
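A minimal sketch of the threshold step, assuming dʹ increases monotonically with naturalness level so that linear interpolation is valid (the function name and example values are illustrative, not data from the paper):

```python
import numpy as np

def threshold_naturalness(levels, dprimes, criterion=1.94):
    """Interpolate the naturalness level at which d' reaches criterion.

    levels: naturalness levels tested, ascending.
    dprimes: cross-validated d' at each level, assumed ascending.
    Returns (threshold, sensitivity), with sensitivity = 1 / threshold.
    """
    # np.interp treats dprimes as the x-axis, so it must be increasing.
    threshold = np.interp(criterion, dprimes, levels)
    return threshold, 1.0 / threshold
```

For example, with hypothetical levels `[0.25, 0.5, 0.75, 1.0]` yielding dʹ values `[0.5, 1.0, 2.0, 3.0]`, the dʹ = 1.94 criterion falls between the second and third levels, giving a threshold of 0.735.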
We repeated this process for “low-pass” observer models trained using subsets of the texture statistics restricted to particular spatial frequency bands: only the lowest spatial frequency band, the two lowest, the three lowest, or the full set of all four bands. For all analyses, we grouped each cross-scale statistic into the higher of its two constituent frequency bands. These four frequency band subsets approximate the effect of low-pass filtering the texture images with corner frequencies of 14, 28, 57, and 113 c/image (in object-based units), respectively. This process generated the model observer sensitivity measurements plotted in Figure 14. All five textures showed a rise in sensitivity with object spatial frequency, and differences between the families’ curves were not obviously related to any properties of the individual texture sets. The rapid rise in sensitivity with object spatial frequency was also present when measured in an additional six texture families not used in this study.
We also repeated the process for “bandpass” observer models, each of which was limited to statistics from one of the four spatial frequency bands. As reported in the main text, bandpass model sensitivities were identical to low-pass model sensitivities whenever the single band of the bandpass model matched the highest band of the low-pass model. That is, the sensitivity of the highest-frequency bandpass model was equivalent to that of the full (four-band) low-pass model, the second-highest bandpass model was equivalent to the three-lowest-band low-pass model, and so on. This comparison is complicated somewhat by the cross-scale statistics, which, contrary to the intention of the bandpass models, integrate information across bands. We therefore repeated these calculations for low-pass and bandpass models that did not have access to cross-scale statistics. Sensitivity was only slightly lower (an average 8.8% decline) and remained equivalent between the low-pass and bandpass conditions in all cases.
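The grouping of statistics into low-pass and bandpass subsets can be sketched as a simple indexing step. This assumes each statistic already carries a band label, with cross-scale statistics assigned to the higher of their two bands as described above; the function name is illustrative.

```python
def band_subsets(stat_bands, n_bands=4):
    """Group statistic indices into low-pass and bandpass subsets.

    stat_bands: for each statistic, its frequency band index
    (0 = lowest, n_bands - 1 = highest); cross-scale statistics are
    assumed to already carry the higher of their two bands.
    Returns (lowpass, bandpass): lowpass[k] holds indices of statistics
    in bands 0..k; bandpass[k] holds indices in band k only.
    """
    lowpass = [[i for i, b in enumerate(stat_bands) if b <= k]
               for k in range(n_bands)]
    bandpass = [[i for i, b in enumerate(stat_bands) if b == k]
                for k in range(n_bands)]
    return lowpass, bandpass
```

An observer model for a given subset would then be trained and tested on only those columns of the statistics matrices, so the low-pass and bandpass comparisons differ only in which indices are passed through.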