Abstract
Color constancy has been studied intensely during the past few decades in humans and machines, but evaluation and comparison are difficult due to a lack of large datasets with hyperspectral ground truth. We used the spectral renderer Mitsuba to render a training dataset of more than 100,000 images under various illuminations, using over 1700 3D mesh objects, 1270 reflectance functions of Munsell chips, and three materials (diffuse, plastic, metal). Each image contained multiple objects with varying colors and materials. The validation dataset was generated with the same illuminations as the training set, plus additional ones, and used 330 different reflectance functions (Munsell chips from the World Color Survey) as well as new objects. We modified a convolutional U-Net model to represent the hue, value (brightness), and chroma of each pixel in the output. A naïve model was trained with D65 illumination only, while the color constancy model (U-NET-CC) was trained with the full range of illuminations. The U-NET-CC model recovered surface reflectance at each pixel remarkably well: the average absolute error was 2.8 of 80 Munsell hue units for hue, 0.85 of 9.5 for value, and 1.3 of 16 for chroma. Substantial errors occurred only for the brightest and darkest surfaces, for which humans also find it difficult to estimate hue. Error increased with the size of the illumination change, but we found no difference between naturally occurring illuminants along the daylight axis and artificial ones orthogonal to it. Overall, our U-Net model assigns a constant reflectance to objects independent of shading and discounts the illuminant.
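A per-pixel Munsell output of the kind described above could, for instance, be realized as a small prediction head on top of the U-Net decoder. The sketch below is a minimal illustration assuming a PyTorch setup; the class name `MunsellHead`, the channel count, and the sigmoid scaling to the Munsell ranges quoted in the abstract are hypothetical choices, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a prediction head that maps decoder
# features to three per-pixel channels for Munsell hue, value, and chroma.
import torch
import torch.nn as nn

class MunsellHead(nn.Module):
    """Hypothetical output head for a U-Net decoder producing per-pixel
    Munsell hue, value, and chroma maps."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        # 1x1 convolution collapses decoder features into three channels.
        self.conv = nn.Conv2d(in_channels, 3, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        out = self.conv(features)                    # shape (N, 3, H, W)
        # Assumed scaling to the ranges quoted in the abstract:
        hue = torch.sigmoid(out[:, 0:1]) * 80.0      # 80 Munsell hue units
        value = torch.sigmoid(out[:, 1:2]) * 9.5     # Munsell value up to 9.5
        chroma = torch.sigmoid(out[:, 2:3]) * 16.0   # Munsell chroma up to 16
        return torch.cat([hue, value, chroma], dim=1)

# Usage example: attach the head to any backbone yielding 64-channel features.
head = MunsellHead(in_channels=64)
prediction = head(torch.randn(1, 64, 128, 128))      # per-pixel (hue, value, chroma)
```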