Figure 3A shows the distributions obtained for each of our five metrics under the CC and D65 conditions. For the accuracies, we considered the distributions of values found for each Munsell class and illumination (each point of the distribution is thus computed over 10 images). For \(\Delta E\) and CCI, we plot the distributions of values found for individual images. Under the CC condition, we found median top-1, top-5, and Muns\(^3\) accuracies of 80%, 100%, and 100%, respectively, across Munsell classes. The first quartiles are at 60%, 90%, and 90%, respectively. This means that for at least half of the Munsell classes, DeepCC selects the correct Munsell class in four out of five images, and when wrong, it still selects a neighboring chip. This is confirmed by the distributions found for \(\Delta E\) and CCI, with median values of 0 and 1, respectively. As indicated by the whiskers, 85% of the images yielded an error below 5 \(\Delta E\); 93% yielded an error below 10 \(\Delta E\), and 99% below 19 \(\Delta E\). For comparison, the median \(\Delta E\) distance between adjacent chips is approximately 7.5. This means that when DeepCC instances selected the wrong chip, it tended to be a close neighbor of the correct one. This is confirmed by the Muns\(^3\) accuracy, according to which the model reached an accuracy equal to or above 90% for 95% of the Munsell classes. Similarly, DeepCC showed a CCI higher than 0.83 in 75% of cases. This CCI value of 0.83 is at the higher end of CCI values measured in human psychophysical experiments (cf. Foster, 2011; Witzel & Gegenfurtner, 2018, for reviews), thus indicating the supra-human performance of the model on this dataset. We also found a positive CCI value in more than 87% of cases, evidence that DeepCC not only learned to discriminate between Munsell colors with high accuracy but also learned to account for potential color shifts induced by the illumination.
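The summary statistics reported in this section (per-class top-k accuracy, median and first quartile of a distribution, share of images below a \(\Delta E\) threshold, and CCI) can be sketched in a few lines. This is a minimal Python illustration, not the authors' evaluation code. It assumes that \(\Delta E\) is the CIE 1976 Euclidean distance in CIELAB and that CCI takes the common form \(1 - b/a\), where \(b\) is the \(\Delta E\) between the selected and correct chips and \(a\) is the \(\Delta E\) between the correct chip rendered under the test illuminant and under the reference illuminant. All function names and toy values are hypothetical.

```python
import math
import statistics

# Minimal sketch of the evaluation metrics discussed above. Function names,
# the Delta E formula (CIE 1976) and the CCI definition are assumptions made
# for illustration; they are not taken from the DeepCC code.

Lab = tuple[float, float, float]


def delta_e(c1: Lab, c2: Lab) -> float:
    """Assumed CIE 1976 Delta E*ab: Euclidean distance between two CIELAB colors."""
    return math.dist(c1, c2)


def top_k_hit(scores: list[float], true_idx: int, k: int) -> bool:
    """True if the correct Munsell class is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_idx in ranked[:k]


def per_class_accuracy(hits_by_class: dict[int, list[bool]]) -> dict[int, float]:
    """Accuracy per Munsell class, e.g. over the 10 images of one class and illuminant."""
    return {cls: sum(hits) / len(hits) for cls, hits in hits_by_class.items()}


def cci(predicted: Lab, correct: Lab, correct_under_test_light: Lab) -> float:
    """Assumed color constancy index CCI = 1 - b / a, where
    b = Delta E between the selected chip and the correct chip, and
    a = Delta E between the correct chip under the test illuminant and
        the same chip under the reference illuminant.
    CCI = 1 when the correct chip is selected; CCI = 0 when the model picks
    the illuminant-shifted color, i.e. it did not compensate at all."""
    a = delta_e(correct_under_test_light, correct)
    if a == 0:
        raise ValueError("CCI is undefined when both illuminants give the same color")
    b = delta_e(predicted, correct)
    return 1.0 - b / a


def summarize(values: list[float]) -> dict[str, float]:
    """First quartile and median of a distribution (linear interpolation)."""
    q1, median, _ = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median}


def fraction_below(errors: list[float], threshold: float) -> float:
    """Share of images whose Delta E error is strictly below `threshold`."""
    return sum(e < threshold for e in errors) / len(errors)


if __name__ == "__main__":
    # Toy values only; these are not results from the paper.
    accs = per_class_accuracy({0: [True] * 8 + [False] * 2,
                               1: [True] * 6 + [False] * 4,
                               2: [True] * 10})
    print(summarize(list(accs.values())))
    print(fraction_below([0.0, 0.0, 3.2, 6.1, 12.5], 5.0))
    print(cci((52.0, 0.0, 0.0), (50.0, 0.0, 0.0), (50.0, 10.0, 0.0)))
```

Under this assumed definition, selecting the correct chip gives \(\Delta E = 0\) and a CCI of exactly 1, which is consistent with the two medians moving together; a positive CCI means the selected chip lies closer to the correct one than the uncompensated, illuminant-shifted color does.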