Takuma Morimoto, Samuel Ponting, Hannah E. Smithson; Modeling perceptual discrimination of surface color using image chromatic statistics and convolutional neural networks. Journal of Vision 2021;21(9):2742. doi: https://doi.org/10.1167/jov.21.9.2742.
© ARVO (1962-2015); The Authors (2016-present)
A previous study measured thresholds for discriminating the colors of objects under each of three lighting environments. Discrimination thresholds were similar for matte and glossy objects, but the orientations of the discrimination ellipses were tightly aligned with the chromatic variation of the lighting environment in which the objects were placed (Morimoto & Smithson, 2018). Using two distinct modeling approaches, we analyzed the psychophysical data to reveal the potential strategies that humans used to perform the discrimination task. First, we built three hand-crafted models that discriminate objects’ colors by comparing specified chromatic statistics: (i) mean chromaticity, (ii) chromaticity of the brightest pixel, and (iii) luminance-weighted-mean chromaticity. In the second approach, we trained convolutional neural networks (CNNs) on 38,021 images labelled either by physical ground-truth or by human responses. Thresholds were then estimated for all models using the same staircase procedure that was used to measure human thresholds. The first approach showed that the mean-chromaticity and luminance-weighted-mean-chromaticity models predicted human thresholds generally well, but the brightest-pixel model predicted thresholds better in some matte conditions, indicating that no tested model based on a single chromatic statistic predicts thresholds consistently well across conditions. Moreover, the estimated thresholds for these models were generally higher than human thresholds. In contrast, the CNN trained on images with human responses predicted human thresholds nearly perfectly in all conditions. The CNN trained on physical ground-truth showed much lower thresholds than humans did.
Visualizing activation maps of the CNN trained on human responses revealed that the CNN primarily looks at shaded regions of the surface that are dominated by diffuse reflections of indirect illumination and thereby provide more reliable information about surface color. Combining hypothesis-based and data-driven approaches revealed an effective strategy to separate lighting and material to reliably discriminate surface color under complex lighting environments, which humans might also use.
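As an illustration of the three chromatic statistics named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each image is supplied as a per-pixel luminance array and a per-pixel two-coordinate chromaticity array. The abstract does not specify the color space, so the function names, array layout, and toy values are all hypothetical.

```python
import numpy as np


def mean_chromaticity(chroma):
    """Unweighted mean chromaticity over all pixels.

    chroma: array of shape (H, W, 2); the chromaticity space is an assumption.
    """
    return chroma.reshape(-1, 2).mean(axis=0)


def brightest_pixel_chromaticity(lum, chroma):
    """Chromaticity of the pixel with the highest luminance.

    lum: array of shape (H, W); chroma: array of shape (H, W, 2).
    """
    flat_index = np.argmax(lum)  # index into the flattened luminance array
    return chroma.reshape(-1, 2)[flat_index]


def luminance_weighted_mean_chromaticity(lum, chroma):
    """Mean chromaticity with each pixel weighted by its luminance."""
    weights = lum.reshape(-1)
    flat_chroma = chroma.reshape(-1, 2)
    return (flat_chroma * weights[:, None]).sum(axis=0) / weights.sum()


# Toy two-pixel "image" (hypothetical values, for illustration only).
lum = np.array([[1.0, 3.0]])
chroma = np.array([[[0.2, 0.4], [0.6, 0.8]]])

stat_mean = mean_chromaticity(chroma)
stat_brightest = brightest_pixel_chromaticity(lum, chroma)
stat_weighted = luminance_weighted_mean_chromaticity(lum, chroma)
```

A model of this kind would compare the chosen statistic between two rendered objects and report the one whose statistic differs in the queried direction. How that comparison was turned into a trial-by-trial response is not described in the abstract.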