Abstract
In a recent study using a constrained image sampling approach, Sebastian et al. (PNAS, 2017) found that the thresholds of the template-matching (TM) observer in natural backgrounds are the separable product of the local background luminance (L), contrast (C) and cosine similarity to the target (S): threshold is proportional to L × C × S. Further, they found that human thresholds matched those of the TM observer over the range of conditions tested in humans. However, in natural scenes these background properties are often not homogeneous across the target region, potentially impacting the visibility of different parts of the target. The approximately optimal strategy for dealing with inhomogeneous backgrounds is to normalize the template by the estimated local variance at each pixel location within the template. The threshold of the resulting reliability-weighted template-matching (RWTM) observer is proportional to (L × C × S) / E, where E is the square root of the energy of the reliability-weighted template. To test this prediction, calibrated natural image patches were sorted into 3D histograms based on their luminance, contrast and similarity. Within 4 of these bins, image patches were further divided into 5 sub-bins based on their reliability-weighted template energy. For all the sub-bins within a bin, the values of L, C and S are approximately constant, and hence the TM observer predicts approximately constant thresholds, whereas the RWTM observer predicts that thresholds fall inversely with E. Across all 20 (4 × 5) conditions, the RWTM observer accounts for 89% of the variance in the human thresholds, whereas the TM observer accounts for only 11%. Furthermore, the RWTM observer was a better predictor of trial-by-trial responses in human observers (i.e., it yielded larger decision-variable correlations). We conclude that an RWTM observer based directly on natural scene statistics predicts human detection performance in natural backgrounds.
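The two threshold predictions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the constant k, the flattened 4-pixel template and the variance maps are all hypothetical, and it assumes that "energy" means the sum of squared template weights, with no further normalization.

```python
import math

def reliability_weighted_template(template, local_variance):
    """Divide each template pixel by the estimated background variance at that pixel."""
    return [t / v for t, v in zip(template, local_variance)]

def root_energy(weighted_template):
    """E: square root of the summed squared weights (assumed definition of energy)."""
    return math.sqrt(sum(w * w for w in weighted_template))

def predicted_threshold(L, C, S, E=1.0, k=1.0):
    """Relative threshold k * L * C * S / E; E = 1 reduces to the TM prediction."""
    return k * L * C * S / E

# Toy flattened template and two hypothetical variance maps (illustrative values only).
template = [0.5, 0.5, 0.5, 0.5]
uniform_variance = [1.0, 1.0, 1.0, 1.0]
mixed_variance = [0.25, 0.25, 4.0, 4.0]

E_uniform = root_energy(reliability_weighted_template(template, uniform_variance))
E_mixed = root_energy(reliability_weighted_template(template, mixed_variance))

# Same L, C and S in both cases, so the TM prediction is identical,
# while the RWTM prediction falls as E grows.
tm = predicted_threshold(2.0, 3.0, 0.5)
rwtm_uniform = predicted_threshold(2.0, 3.0, 0.5, E=E_uniform)
rwtm_mixed = predicted_threshold(2.0, 3.0, 0.5, E=E_mixed)
```

In this toy case the mixed-variance patch has the larger E, because the low-variance pixels receive large weights, so its RWTM threshold is lower even though L, C and S are unchanged.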
Acknowledgement: NIH EY024662