Abstract
The physical inputs to our visual system are dictated by the interplay between lights and surfaces; thus, for surface color to be stably perceived, the influence of the illuminant must be discounted. To reveal our strategy to infer the illuminant color, we conducted three psychophysical experiments designed to test our optimal color hypothesis that we internalize the physical color gamut under various illuminants and apply the prior to estimate the illuminant color. In each experiment, we presented 61 hexagons arranged without spatial gaps, where the surrounding 60 hexagons were set to have a specific shape in their color distribution. We asked participants to adjust the color of a center test field so that it appeared to be a full-white surface placed under a test illuminant. Results and computational modeling suggested that, although our proposed model is limited in accounting for estimation of illuminant intensity by human observers, it agrees fairly well with the estimates of illuminant chromaticity in most tested conditions. The accuracy of estimation generally outperformed other tested conventional color constancy models. These results support the hypothesis that our visual system can utilize the geometry of scene color distribution to achieve color constancy.
Figure 5a shows the averaged observers’ settings across 10 trials in a MB chromaticity diagram. Error bars are omitted to improve visibility. The shape of each symbol indicates the distribution condition (mountain, reverse, or flat), and the colors indicate the illuminant conditions (3000 K, 6500 K, or 20,000 K). Different subpanels show the results for different observers. Each colored cross symbol indicates the chromaticity of the test illuminant, thus indicating the ground truth in each illuminant condition.
Figure 5b shows the predictions from the optimal color model, mean LMS model, and mean chromaticity model. Note that in this experiment the optimal color model provided a perfect estimation of illuminant chromaticity regardless of the distribution condition due to the experimental design. For this reason, we plotted only predictions for the mountain condition. This is self-evident, as we used optimal colors for stimuli in this experiment, but we also applied fitting procedures to each of nine conditions based on
Equation 1 for a sanity check. This confirmed that predicted chromaticity and intensity for each condition indeed matched the chromaticity and the intensity of a test illuminant.
To calculate the prediction of the mean LMS model, we first averaged cone responses across the 60 surrounding hexagons, and then we converted the averaged cone responses to MB chromaticity coordinates. Note that this manipulation is equivalent to calculating the luminance-weighted average of MB chromaticity coordinates. Here, it can be seen that the chromaticities estimated by the mean LMS model are positioned very close to the chromaticity of the test illuminants, and there is little difference across distribution conditions under any illuminant condition. The mean chromaticity model provides a prediction based on the average chromaticity across the 60 surrounding hexagons. Its predictions deviate from the illuminant chromaticities mainly toward higher S/(L+M) and do not change depending on the distribution condition. Thus, only predictions for the mountain condition are provided in the figure. In this study, we compared prediction performance between our model and illuminant estimation based on mean chromaticity or mean LMS. This was not intended to rule out more complex color constancy models but rather to evaluate the extent to which simplistic strategies can account for observers’ behaviors.
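The two conventional estimates, and the stated equivalence between the mean LMS estimate and a luminance-weighted average of chromaticities, can be illustrated with a minimal sketch. It assumes that MB coordinates are computed as L/(L+M) and S/(L+M) and that luminance equals L+M; the function names and the three example surfaces are hypothetical and are not the stimuli used in the experiment.

```python
def mb_chromaticity(L, M, S):
    """MB coordinates, assuming luminance = L + M (an assumption of this sketch)."""
    lum = L + M
    return (L / lum, S / lum)

def mean_lms_estimate(surfaces):
    """Average the cone responses first, then convert to MB coordinates."""
    n = len(surfaces)
    mean_L = sum(s[0] for s in surfaces) / n
    mean_M = sum(s[1] for s in surfaces) / n
    mean_S = sum(s[2] for s in surfaces) / n
    return mb_chromaticity(mean_L, mean_M, mean_S)

def mean_chromaticity_estimate(surfaces):
    """Convert each surface to MB coordinates first, then average (unweighted)."""
    coords = [mb_chromaticity(*s) for s in surfaces]
    n = len(coords)
    return (sum(c[0] for c in coords) / n, sum(c[1] for c in coords) / n)

def luminance_weighted_chromaticity(surfaces):
    """Average MB coordinates with each surface weighted by its luminance."""
    total = sum(L + M for L, M, S in surfaces)
    l = sum((L + M) * mb_chromaticity(L, M, S)[0] for L, M, S in surfaces) / total
    s = sum((L + M) * mb_chromaticity(L, M, S)[1] for L, M, S in surfaces) / total
    return (l, s)

# Hypothetical (L, M, S) cone responses for three surfaces of unequal luminance.
example_surfaces = [(0.7, 0.3, 0.02), (0.2, 0.2, 0.05), (0.05, 0.05, 0.04)]
lms_estimate = mean_lms_estimate(example_surfaces)
weighted_estimate = luminance_weighted_chromaticity(example_surfaces)
chromaticity_estimate = mean_chromaticity_estimate(example_surfaces)
```

In this sketch `lms_estimate` and `weighted_estimate` coincide up to rounding, whereas `chromaticity_estimate` differs because dark surfaces count as much as bright ones.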
If an observer is perfectly color constant, the observers’ settings in
Figure 5a should superimpose on the chromaticity of the corresponding test illuminant shown by cross symbols; however, that was never the case in this experiment. Instead, all settings showed some deviation from the ground-truth points, and some systematic trends were observed as follows. First, for all observers, we see that the observers’ settings for the three distribution conditions are rather closely clustered. This trend also roughly held for any illuminant condition, suggesting that the shape of the chromaticity–luminance distribution had little impact on the observers’ estimation of the illuminant chromaticity in this experiment, as predicted by the optimal color model, mean LMS model, and mean chromaticity model as shown in
Figure 5b.
Second, as predicted, the observers’ settings were separated depending on the illuminant condition. Thus, unsurprisingly, the color temperature of the test illuminant had a strong effect on the observers’ settings. The degree of separation across the illuminant condition appears to depend on the observers; for example, settings for observer MS were closely gathered across test illuminants whereas those for observer KU showed a greater separation.
Our methodology allowed observers to simultaneously adjust luminance in addition to chromaticity, which provided an indication of each observer's estimation of illuminant intensity.
Figure 6 shows the luminance setting for each condition, averaged across four observers. To determine whether the observers’ settings showed agreement with simple luminance statistics, red and cyan crosses indicate the mean luminance and the highest luminance across the 60 surrounding surfaces, respectively. Blue crosses show the intensities of test illuminants (i.e., ground truth); thus, if an observer's setting matches the blue crosses, that indicates that the observer perfectly estimated the intensity of the illuminant. Note that in this experiment the optimal color model provided the perfect estimation of illuminant intensity for all conditions by design, so the blue cross symbols also indicate the prediction of the optimal color model. The observers’ estimations of intensity were never perfect, however. Instead, their settings generally exceeded the set illuminant intensity, showing that the observers assumed the illuminants were much more intense than they actually were.
To quantitatively evaluate whether the pattern of luminance settings can be explained by simple luminance statistics, we calculated the correlation coefficient between the observers’ average settings and each luminance statistic across the nine conditions. Correlation coefficients were 0.0788 (p > 0.05), 0.710 (p = 0.0320), and 0.700 (p = 0.0359) for mean luminance, highest luminance, and illuminant intensity, respectively. Thus, the highest luminance and the illuminant intensity (i.e., the prediction from the optimal color model) were each significantly correlated with the observers’ settings. It is possible that observers used different statistics or a more complicated strategy, but the findings suggest that their behaviors can be explained reasonably well by the optimal color model or simple luminance statistics (highest luminance in this case).
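As a minimal sketch of this comparison, the two simple luminance statistics and a correlation coefficient can be computed as follows. The sketch assumes a Pearson correlation, which the text does not state explicitly, and the function names are hypothetical.

```python
import math

def luminance_statistics(surround_luminances):
    """Mean and highest luminance across the surrounding surfaces of one condition."""
    return {
        "mean": sum(surround_luminances) / len(surround_luminances),
        "highest": max(surround_luminances),
    }

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences (assumed statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold one luminance statistic for each of the nine conditions and `y` the observers’ average luminance setting for the same conditions, in the same order.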
Next, a two-way, repeated-measures analysis of variance (ANOVA) was performed with distribution condition (mountain, reverse, or flat) and illuminant condition (3000 K, 6500 K, or 20,000 K) as the within-subject factors for the luminance settings. The main effects of distribution condition and illuminant condition were not significant: F(2, 6) = 3.48, p > 0.05; F(2, 6) = 3.88, p > 0.05, respectively. However, the interaction between the two factors was significant: F(4, 12) = 5.45, p = 0.00975. Further analysis of the interaction revealed that the simple main effect of distribution condition was significant at 6500 K and 20,000 K, F(2, 6) = 9.71, p = 0.0132; F(2, 6) = 7.80, p = 0.0214, respectively, but not at 3000 K, F(2, 6) = 0.15, p > 0.05. Furthermore, the simple main effect of illuminant condition was significant for the mountain condition, F(2, 6) = 5.48, p = 0.0443, but was not significant for the reverse and flat conditions, F(2, 6) = 1.39, p > 0.05; F(2, 6) = 3.76, p > 0.05, respectively.
To further clarify the relation across levels regarding the simple main effects of distribution and illuminant, the results of multiple comparisons with Bonferroni's correction (significance level α = 0.05) where significant differences were found are indicated by asterisks in
Figure 6. Overall, the results of these statistical analyses suggest that the luminance settings were higher for the reverse condition than for the mountain and flat conditions at 6500 K, and they were higher for the reverse condition than for the flat condition at 20,000 K.
Next, to examine the extent to which color constancy held for each condition, it would be helpful to quantify the degree of constancy. Several metrics have been used in past studies (summarized in
Foster, 2011), but a fundamental idea is to capture how much the observers’ subjective white points shifted in relation to physical changes of illuminant chromaticity. In this study, we used an index that considers both the direction and the amount of the observers’ settings shift.
As shown in
Figure 7, we first defined two vectors,
\(\vec a\) and
\(\vec b\), where
\(\vec a\) is a vector originating from the chromaticity of the 6500 K illuminant to the illuminant chromaticity of 3000 K or 20,000 K. The vector
\(\vec b\) extends from the observers’ settings under 6500 K to settings under the test illuminant (3000 K or 20,000 K). We defined θ as the angle between the two vectors, as shown in the right panel of
Figure 7. The constancy index (CI) is calculated by
Equation 2:
\begin{equation}{\rm{CI}} = \left| {\vec b} \right|\cos \theta /\left| {\vec a} \right|\end{equation}
If the shift of an observer's setting caused by illuminant change is the same as the shift of illuminant chromaticity in terms of both distance and direction, then the CI becomes 1.0. Any deviation in distance or direction would lower the CI. In theory, the CI can take a value more than 1.0, but we did not find such a case in this study. If the direction of shift perfectly matches between \(\vec a\) and \(\vec b\) (i.e., θ = 0), then cosθ becomes 1.0 and thus does not lower the CI value. In this sense, cosθ serves as a loss factor due to the directional deviation.
Also note that a problem with calculating a distance-based metric in an MB chromaticity diagram is that scales for L/(L+M) and S/(L+M) are different and arbitrary. Thus, to give approximately equal consideration to both directions, we divided each axis by the average of standard deviations across four observers of the settings under the mountain 6500 K condition (which was considered to be the standard condition). All CIs were calculated in this scaled MB chromaticity diagram. Our metric is not the only way to quantify color constancy, but we confirmed that other common constancy indices agreed well with our implementation (see
Supplementary Material).
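The index in Equation 2, including the axis scaling described above, can be sketched as follows. Because |b| cos θ / |a| equals the dot product of the two vectors divided by |a| squared, the angle never has to be computed explicitly. The function name and the `scale` argument (the per-axis standard deviations used for normalization) are hypothetical names introduced for illustration.

```python
def constancy_index(illum_ref, illum_test, setting_ref, setting_test, scale=(1.0, 1.0)):
    """CI = |b| cos(theta) / |a| in an MB diagram whose axes are divided by `scale`.

    a: shift of the illuminant chromaticity (6500 K to the test illuminant).
    b: shift of the observer's setting between the same two conditions.
    Each point is an (L/(L+M), S/(L+M)) pair.
    """
    ax = (illum_test[0] - illum_ref[0]) / scale[0]
    ay = (illum_test[1] - illum_ref[1]) / scale[1]
    bx = (setting_test[0] - setting_ref[0]) / scale[0]
    by = (setting_test[1] - setting_ref[1]) / scale[1]
    # |b| cos(theta) / |a| = (a . b) / |a|^2
    return (ax * bx + ay * by) / (ax * ax + ay * ay)
```

With this form, a setting shift identical to the illuminant shift gives 1.0, a shift in the same direction but half as long gives 0.5, and a shift orthogonal to the illuminant shift (in the scaled diagram) gives 0.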
The bar chart in
Figure 8 shows averaged CIs across four observers for each condition. The CIs were roughly 0.5 for the mountain and reverse conditions at 3000 K and slightly lower at 20,000 K. The flat condition had a slightly higher CI, especially at 3000 K.
We performed two-way, repeated-measures ANOVAs with distribution condition (mountain, reverse, or flat) and illuminant condition (3000 K or 20,000 K) as the within-subject factors for the CIs. The main effect of distribution condition was significant, F(2, 6) = 8.40, p = 0.0182, whereas the main effect of the illuminant conditions and the interaction between two factors were not significant, F(1, 3) = 7.63, p > 0.05; F(2, 6) = 0.59, p > 0.05, respectively.
Multiple comparisons with Bonferroni's correction (significance level α = 0.05) for the distribution condition found that the flat condition showed higher CIs than the mountain and reverse conditions, but there was no significant difference between the mountain and reverse conditions.
Overall, the statistical analyses indicated that color constancy worked equally well under 3000 K and 20,000 K, and it also worked better for the flat distribution than for the other distributions in
Experiment 1. We note that the use of a low luminance level for the surrounding stimuli might have introduced rod intrusion on the color appearance of the surrounding stimuli and test field. However, the CI is a relative measure and thus partially suppresses the influence of rod intrusion.
We have so far discussed the degree to which human observers achieved color constancy; however, the main purpose of the present study was to quantify how well our optimal color model accounts for the pattern of human observers’ settings, ideally in comparison with other candidate color constancy models. The CI calculates how much an observer's setting shifted in relation to a shift of the physical illuminant chromaticity. Thus, it becomes higher when shifts in the perceptual white point and physical illuminant are close to each other in distance and direction, and it becomes 1.0 when the two shifts perfectly match. Although the CI was originally designed to reflect the degree of color constancy, we can also consider a metric that indicates the degree to which the shift of illuminant chromaticities predicted from computational models agrees with that of the observers’ white-point settings. Based on this idea, we quantified the degree to which our optimal color model and other computational models (mean LMS and mean chromaticity) predicted human observers’ settings. We introduced the model index (MI) as defined by
Equation 3, which replaces the × symbols in
Figure 7 with + symbols, indicating predictions from a model as shown in
Figure 9:
\begin{eqnarray}
{\rm{MI}} = \left| {\vec b} \right|\cos \varphi /\left| {\vec c} \right|\quad
\end{eqnarray}
Higher values indicate better model prediction, and 1.0 indicates perfect prediction.
Figure 10 shows the MIs in each condition. Note that in this experiment the optimal color model predicted illuminant chromaticity perfectly; therefore, its MI values match the CIs. The mean LMS model first averaged cone signals across the 60 surfaces, and the average cone signal was transformed to MB chromaticity coordinates. The mean chromaticity model estimates illuminant color based on the average chromaticity across the 60 surfaces.
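The relation between the MI and the CI can be illustrated with a short sketch. It assumes that the vector in the denominator of Equation 3 is the shift of a model's predicted illuminant chromaticity from the 6500 K condition to the test condition, and that φ is the angle between that vector and the shift of the observers’ settings; this is one reading of Equation 3 and Figure 9. All chromaticity values and names below are hypothetical.

```python
def shift(start, end):
    """Vector from one chromaticity point to another."""
    return (end[0] - start[0], end[1] - start[1])

def projection_ratio(observed_shift, reference_shift):
    """|b| cos(angle) / |ref|, computed as (b . ref) / |ref|^2."""
    dot = observed_shift[0] * reference_shift[0] + observed_shift[1] * reference_shift[1]
    return dot / (reference_shift[0] ** 2 + reference_shift[1] ** 2)

# Hypothetical points in a scaled MB chromaticity diagram.
illum_6500, illum_test = (0.0, 0.0), (2.0, -1.0)
setting_6500, setting_test = (0.1, 0.1), (1.1, -0.4)

b = shift(setting_6500, setting_test)
ci = projection_ratio(b, shift(illum_6500, illum_test))

# A model that recovers the illuminant exactly yields MI equal to CI.
mi_exact_model = projection_ratio(b, shift(illum_6500, illum_test))

# A hypothetical model that predicts a larger illuminant shift yields a lower MI.
mi_other_model = projection_ratio(b, shift((0.0, 0.0), (4.0, -2.0)))
```

In this toy case the exact model gives an MI equal to the CI, and the model that overestimates the illuminant shift gives an MI half as large.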
We can see that the MI values are high in the order of optimal color, mean LMS, and mean chromaticity models for all conditions. We ran the following analysis to confirm whether our model prediction was statistically better than that of the mean LMS and mean chromaticity models. We performed a three-way, repeated-measures ANOVA with model type (optimal color, mean LMS, and mean chromaticity), distribution condition (mountain, reverse, and flat), and illuminant condition (3000 K and 20,000 K) as the within-subject factors for the MIs. The main effects of the model type and distribution condition were significant, F(2, 6) = 52.8, p = 0.000155; F(2, 6) = 7.93, p = 0.0207, respectively, whereas the main effect of illuminant condition was not significant, F(1, 3) = 8.51, p > 0.05. The interactions between model type and distribution condition and between model type and illuminant condition were significant, F(4, 12) = 15.4, p = 0.000113; F(2, 6) = 220.9, p < 0.00001, respectively. In contrast, the interaction between distribution condition and illuminant condition was not significant, F(2, 6) = 0.57, p > 0.05. The interaction among three factors also was not significant, F(4, 12) = 3.11, p > 0.05.
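For F tests with two numerator degrees of freedom, the upper-tail probability of the F distribution has the closed form p = (1 + 2F/ν)^(−ν/2), where ν is the denominator degrees of freedom. The sketch below uses this identity to check reported p values of the form F(2, ν). It assumes uncorrected p values (no sphericity correction), which the text does not state, and the function name is hypothetical.

```python
def p_value_f_two_numerator_df(f_value, denom_df):
    """Upper-tail p value of an F(2, denom_df) statistic.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_value / denom_df) ** (-denom_df / 2.0)

# Main effects of model type and distribution condition reported above.
p_model_type = p_value_f_two_numerator_df(52.8, 6)
p_distribution = p_value_f_two_numerator_df(7.93, 6)
```

Under this assumption the two values reproduce the reported p = 0.000155 and p = 0.0207 to the precision given in the text.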
We then analyzed simple main effects for the interaction between model type and distribution condition. The simple main effect of model type was significant for all distribution conditions: F(2, 18) = 45.4, p < 0.00001 for the mountain condition; F(2, 18) = 42.9, p < 0.00001 for the reverse condition; and F(2, 18) = 62.8, p < 0.00001 for the flat condition. Also, the simple main effect of the distribution condition was significant for all models: F(2, 18) = 8.36, p = 0.00271; F(2, 18) = 7.74, p = 0.00375; and F(2, 18) = 7.69, p = 0.00386, respectively. Post hoc multiple comparisons using Bonferroni's correction (significance level, 0.05) revealed the following: (a) mountain = reverse, mountain < flat, and reverse < flat for the optimal color model; and (b) mountain = reverse, mountain < flat, and reverse = flat for the mean LMS and mean chromaticity models. We also found the following results: optimal color model > mean LMS model, optimal color model > mean chromaticity model, and mean LMS model > mean chromaticity model for all distribution conditions (i.e., mountain, reverse, and flat distributions).
Next, we analyzed simple main effects for the interaction between model type and illuminant condition. We found that model type showed significant simple main effects at both 3000 K and 20,000 K, F(2, 12) = 97.7, p < 0.00001; F(2, 12) = 19.8, p = 0.000158, respectively. Moreover, the simple main effects of illuminant condition were significant for the optimal color model and mean LMS model, F(1, 9) = 12.5, p = 0.00636; F(1, 9) = 8.78, p = 0.0159, respectively, but were not significant for the mean chromaticity model, F(1, 9) = 4.81, p > 0.05. Again, multiple comparisons using Bonferroni's correction (significance level, 0.05) provided the following results: (a) optimal color model > mean LMS model, optimal color model > mean chromaticity model, and mean LMS model > mean chromaticity model for 3000 K; and (b) optimal color model > mean LMS model, optimal color model > mean chromaticity model, and mean LMS model = mean chromaticity model for 20,000 K. Overall, these statistical tests suggest that our optimal color model predicted observers’ settings generally better than the mean LMS and mean chromaticity models, at least under the current experimental procedure. This observation held for the different distributions and test illuminants, as supported by the results from multiple comparisons reported above.
With regard to the degree of color constancy, it was shown that human observers can achieve roughly 50% constancy for each distribution condition, although the flat condition showed slightly higher CIs. MI-based analysis revealed that our optimal color model predicted the shift of observers’ settings in response to test illuminant changes better than other candidate models in general. With regard to raw chromaticity settings (
Figure 5a), our model was successful in revealing that the shape of distribution had little impact, but the mean LMS model and mean chromaticity model also predicted this observation (
Figure 5b). In
Experiment 2, to further pursue the applicability of our model, we tested three different distributions designed to separate predictions made by our model and the mean LMS model. We designed the shape of the color distributions so that the optimal color model would again perfectly predict the chromaticity of the illuminant, whereas the mean LMS model would predict a different illuminant for each distribution condition. Our intent was to determine the effect of the shape of the color distribution on human observers’ white-point settings, as predicted by the optimal color model.
The presentation of results follows the format in
Experiment 1.
Figure 12a shows the averaged observers’ settings across 10 trials. The shape of the symbol indicates the distribution condition (red-reduced, green-reduced, or blue-reduced), and the colors indicate the illuminant condition (3000 K, 6500 K, and 20,000 K).
Figure 12b shows predictions from the optimal color model, mean LMS model, and mean chromaticity model. As in
Experiment 1, the optimal color model provided perfect estimation of illumination for all conditions by design; thus, we only plotted predictions for the red-reduced condition. Predictions from the mean LMS model show some differences depending on distribution condition as expected. For example, for the red-reduced condition, surfaces with high L/(L+M) had lower luminance than the other surfaces, thus contributing less to the averaged LMS values. As a result, the prediction shifted toward the lower L/(L+M) direction. The mean chromaticity model predicted exactly the same chromaticity as
Experiment 1 because our experimental manipulation changed only the luminance profiles of the surrounding stimuli while keeping the chromaticities of the 60 surfaces constant.
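The contrast between the two conventional models under this manipulation can be sketched with a toy example. The sketch assumes that the mean LMS estimate equals the luminance-weighted mean of the MB chromaticities and that the red-reduced manipulation can be mimicked by scaling down the luminance of surfaces above a threshold in L/(L+M). The threshold, the scaling factor, the surface values, and all names are hypothetical and do not correspond to the actual stimuli.

```python
def luminance_weighted_chromaticity(surfaces):
    """Mean LMS estimate expressed as a luminance-weighted mean of (l, s)."""
    total = sum(lum for _, _, lum in surfaces)
    l = sum(l_i * lum for l_i, _, lum in surfaces) / total
    s = sum(s_i * lum for _, s_i, lum in surfaces) / total
    return (l, s)

def unweighted_chromaticity(surfaces):
    """Mean chromaticity estimate: every surface counts equally."""
    n = len(surfaces)
    return (sum(l_i for l_i, _, _ in surfaces) / n,
            sum(s_i for _, s_i, _ in surfaces) / n)

def reduce_reddish(surfaces, l_threshold, factor):
    """Lower the luminance of surfaces with high L/(L+M); chromaticities are untouched."""
    return [(l_i, s_i, lum * factor if l_i > l_threshold else lum)
            for l_i, s_i, lum in surfaces]

# Hypothetical surfaces given as (L/(L+M), S/(L+M), luminance).
baseline = [(0.60, 0.5, 10.0), (0.70, 1.0, 10.0), (0.80, 1.5, 10.0)]
red_reduced = reduce_reddish(baseline, l_threshold=0.75, factor=0.5)

mean_lms_before = luminance_weighted_chromaticity(baseline)
mean_lms_after = luminance_weighted_chromaticity(red_reduced)
mean_chroma_before = unweighted_chromaticity(baseline)
mean_chroma_after = unweighted_chromaticity(red_reduced)
```

In this toy case the luminance-weighted estimate moves toward lower L/(L+M) after the reduction, whereas the unweighted estimate is unchanged because only luminance was altered.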
For observers MS and TK, the settings for the three distribution conditions lie close to one another under every illuminant condition, which is consistent with the prediction from the optimal color model. However, some separation and a systematic pattern of settings were observed for KU and especially KF (lower L/(L+M) for the red-reduced condition, higher L/(L+M) for the green-reduced condition, and lower S/(L+M) for the blue-reduced condition), which seems to be somewhat consistent with the pattern predicted by the mean LMS model. Also, the observers’ settings were predictably influenced by the color temperature of the illuminant.
Figure 13 shows the averaged luminance settings across four observers for each condition. The predictions for mean luminance, highest luminance, and illuminant intensity are indicated by red, cyan, and blue cross symbols, respectively. We performed a two-way, repeated-measures ANOVA with the distribution condition (red-reduced, green-reduced, or blue-reduced) and illuminant condition (3000 K, 6500 K, or 20,000 K) as the within-subject factors for the luminance settings. The main effects of distribution condition and illuminant condition were not significant:
F(2, 6) = 0.10,
p > 0.05;
F(2, 6) = 1.65,
p > 0.05, respectively. The interaction between two factors also was not significant,
F(4, 12) = 0.55,
p > 0.05. Thus, there was no significant difference for any pair of conditions in
Experiment 2.
We again calculated correlation coefficients between the observers’ average settings and each luminance statistic across the nine conditions. The correlation coefficients were –0.356 (p > 0.05), –0.6273 (p > 0.05), and 0.0210 (p > 0.05) for mean luminance, highest luminance, and illuminant intensity (i.e., prediction from the optimal color model), respectively. Thus, these models failed to predict the human observers’ performance. This might be due to the observers’ use of a more complex strategy than the simple luminance statistics tested here.
Next, we quantified the degree of color constancy based on
Equation 2.
Figure 14 shows the CIs for all conditions. We conducted two-way ANOVAs with the distribution condition (red-reduced, green-reduced, or blue-reduced) and illuminant condition (3000 K or 20,000 K) as the within-subject factors for the CIs. The main effect of distribution condition was not significant:
F(2, 6) = 3.59,
p > 0.05. Also, the main effect of illuminant condition and the interaction between two factors were not significant:
F(1, 3) = 7.18,
p > 0.05;
F(2, 6) = 2.16,
p > 0.05, respectively. Consequently, the degree of color constancy was not influenced by the shape of the distribution or the color temperature of the test illuminants.
We again calculated MIs from
Equation 3 based on predictions from the optimal color, mean LMS, and mean chromaticity models as shown in
Figure 15. For the 3000 K condition (top nine bars), it was found that the MI values were generally higher for the optimal color model than for the other models, except for the blue-reduced condition, where the mean LMS model showed a slightly higher MI than that of the optimal color model. This observation also held for the 20,000 K condition. However, the error bars are large, thus making it difficult to make a conclusive statement about whether our optimal color model overall worked better than other models. For this reason, we performed the following statistical analyses.
A three-way, repeated-measures ANOVA was conducted with model type (optimal color, mean LMS, or mean chromaticity), distribution condition (red-reduced, green-reduced, or blue-reduced), and illuminant condition (3000 K or 20,000 K) as the within-subject factors for the MIs. The main effect of model type was significant, F(2, 6) = 68.1, p = 0.000075, whereas the main effects of distribution condition and illuminant condition were not significant: F(2, 6) = 1.27, p > 0.05; F(1, 3) = 6.82, p > 0.05, respectively. The interactions between model type and distribution condition and between model type and illuminant condition were significant: F(4, 12) = 48.1, p < 0.00001; F(2, 6) = 18.6, p = 0.00268, respectively. In contrast, the interaction between distribution condition and illuminant condition was not significant, F(2, 6) = 2.18, p > 0.05. The interaction among three factors also was not significant, F(4, 12) = 3.02, p > 0.05.
Regarding the significant interaction between model type and distribution condition, the simple main effect of model type was significant under the red-reduced, green-reduced, and blue-reduced conditions: F(2, 18) = 38.7, p < 0.00001; F(2, 18) = 75.6, p < 0.00001; F(2, 18) = 64.0, p < 0.00001, respectively. In addition, the simple main effect of distribution condition was significant for the optimal color model, F(2, 18) = 4.79, p = 0.0184, but was not significant for the mean LMS model or mean chromaticity model: F(2, 18) = 2.79, p > 0.05; F(2, 18) = 3.36, p > 0.05, respectively. Furthermore, multiple comparisons using Bonferroni's correction (significance level, 0.05) showed that red-reduced = green-reduced, red-reduced = blue-reduced, and green-reduced > blue-reduced for the optimal color model. The comparisons also showed optimal color > mean LMS, optimal color > mean chromaticity, and mean LMS > mean chromaticity for both the red-reduced and blue-reduced conditions. For the green-reduced condition, optimal color > mean LMS, optimal color > mean chromaticity, and mean LMS = mean chromaticity.
Next, we analyzed the simple main effects for the interaction between model type and illuminant condition. We found that model type showed significant simple main effects at 3000 K and 20,000 K: F(2, 12) = 79.7, p < 0.00001; F(2, 12) = 31.0, p = 0.000018, respectively. Moreover, the simple main effects of illuminant condition were significant for the optimal color model and mean LMS model, F(1, 9) = 8.27, p = 0.0183; F(1, 9) = 8.88, p = 0.0155, respectively, but they were not significant for the mean chromaticity model, F(1, 9) = 3.73, p > 0.05. Again, multiple comparisons using Bonferroni's correction (significance level, 0.05) showed that optimal color > mean LMS, optimal color > mean chromaticity, and mean LMS > mean chromaticity at 3000 K. At 20,000 K, optimal color > mean LMS, optimal color > mean chromaticity, and mean LMS = mean chromaticity.
Our main interest in this experiment was to evaluate whether our proposed model performs better than other candidate models. The calculated MIs and statistical analyses overall support this idea, which is consistent with the trends in
Experiment 1. However, predictions from the optimal color model are still limited in
Experiments 1 and
2, as MIs for the optimal color model were considerably smaller than 1.0, showing a lack of agreement. Thus, it is still inconclusive as to whether the optimal color model is a good candidate strategy to be adopted by human observers to infer the influence of illuminant.
It is worth noting that we used optimal surfaces for the surrounding stimuli that do not exist in the real world; thus, the observers never experienced luminance distributions formed by optimal colors. Instead, in the real world, the visual system encounters scenes containing natural objects that have a reflectance less than 1.0. In
Experiment 3, we used 60 reflectances selected from a database of 575 natural objects (
Brown, 2003). We manipulated the shape of the distribution to “deceive” the optimal color model, so that even when the test illuminant was held constant the model predicted different illuminant chromaticities influenced by the shape of the color distribution. We then measured the observers’ white points using the same procedure as in
Experiments 1 and
2. We tested whether the white points of human observers change in response to a change in the shape of the color distribution, and, if so, whether the optimal color model can predict those shift patterns.