Thus, in addition to basic descriptive statistics, we used metrics computed from the structure of the specular reflection image (
Marlow, Kim, & Anderson, 2012;
Schmid, Barla, & Doerschner, 2021) to predict participants’ gloss settings. As shown in
Figure 11b, we first converted the original image to a luminance image in which each pixel is expressed in candelas per square meter, masked out the object region, and subtracted the diffuse component from the image, which yielded test images containing specular reflection alone. Then, we extracted pixels whose intensity was higher than
k% of the highest intensity in this specular image, where
k took the values 0, 1, 3, 5, 10, 20, and 40. This thresholding was intended to remove regions of specular reflection that stem from secondary and higher-order inter-reflections. Using this thresholded highlight image, we calculated the following three metrics. The first metric is coverage, defined as the proportion of the whole object area covered by the highlight, as depicted in the top left part of
Figure 11c. Second, we calculated sharpness. This metric uses a spatial convolution to emphasize regions where luminance changes rapidly, and sharpness is defined as the mean value of the convolved sharpness map (
Vu, Phan, & Chandler, 2012), as shown in the top right part of
Figure 11c. For coverage and sharpness, model predictions depended on the cut-off percentage used to threshold the highlight regions, and thus we selected the value of
k that produced the highest correlation with human settings. We note that, by searching for the optimal cut-off threshold in this way, we allowed for the possibility that a “low-light” region of the specular image contributes to the human gloss percept (
Kim, Marlow, & Anderson, 2012). Finally, the third metric was contrast, which essentially measures the spatial luminance variation over the surface. The standard approach would be to calculate contrast directly from the raw highlight image. However, considering a previous observation that perceived gloss is affected by the modulation of a specific frequency channel (
Boyadzhiev, Bala, Paris, & Adelson, 2015), we first decomposed the raw highlight image into eight sub-band images using a Gaussian band-pass filter (lower and upper cut-off frequencies: 1.5 to 3.0, 3.0 to 6.0, 6.0 to 12.0, 12.0 to 24.0, 24.0 to 48.0, 48.0 to 96.0, 96.0 to 192.0, and 192.0 to 384.0 cycles/image); a subset of the sub-band images is shown in the lower part of
Figure 11c. Here, the highest center frequency was 18.2 cpd, which would be comfortably resolved by participants with 20/20 visual acuity. We calculated the RMS contrast, equivalent to the standard deviation of the pixel intensities, for each sub-band image as well as for an aggregated image across all frequencies. This means that, unlike coverage and sharpness, which each have one free parameter, the sub-band contrast metric has two free parameters (i.e., the cut-off pixel intensity and the cut-off spatial frequency band), and the values producing the highest correlation with human settings were selected. For all three metrics, the search for the best parameters was performed separately for each type of lighting environment (natural, gamut-rotated, phase-scrambled).
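To make the thresholding and parameter search concrete, the following is a minimal sketch in Python (NumPy) of how coverage, RMS contrast, and the selection of k could be computed. It is an illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, pixels below the cut-off are assumed to be set to zero before the contrast is computed, and the correlation is assumed to be Pearson's r. The sharpness map and the sub-band decomposition are omitted.

```python
import numpy as np

# Cut-off percentages listed in the text.
K_VALUES = (0, 1, 3, 5, 10, 20, 40)


def threshold_highlights(specular, k):
    """Boolean mask of pixels brighter than k% of the image maximum."""
    return specular > (k / 100.0) * specular.max()


def coverage(specular, object_mask, k):
    """Proportion of the object area covered by thresholded highlights."""
    highlight = threshold_highlights(specular, k) & object_mask
    return highlight.sum() / object_mask.sum()


def rms_contrast(specular, object_mask, k):
    """Standard deviation of the thresholded highlight image over the object.

    Assumption: pixels below the cut-off are set to zero rather than excluded.
    """
    highlight = np.where(threshold_highlights(specular, k), specular, 0.0)
    return float(np.std(highlight[object_mask]))


def select_best_k(metric, speculars, object_masks, human_settings):
    """Return the (k, r) pair with the highest correlation to human settings.

    Assumption: Pearson correlation across stimuli. Values of k for which the
    metric is constant give an undefined (NaN) correlation and are skipped.
    """
    best_k, best_r = None, -np.inf
    for k in K_VALUES:
        values = [metric(s, m, k) for s, m in zip(speculars, object_masks)]
        r = np.corrcoef(values, human_settings)[0, 1]
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r
```

Under these assumptions, the search would be run once per metric and per lighting environment, for example `select_best_k(coverage, speculars, object_masks, human_settings)` with the stimuli and settings from one environment. The sub-band contrast metric would additionally loop over the eight frequency bands.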