From the responses in rounds A1–A3, we obtained a set of 170 renderings: 34 images in each of the 5 bins from perceived mirror to perceived glass, one-half of which were actually mirror and one-half actually glass. This balanced selection allowed us to decorrelate the category that observers consistently perceived from the true category of the images.
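The arithmetic behind this design can be checked with a short sketch. It assumes that the mirror/glass split is even within every bin (17 and 17), which is one reading of the description above; the function names `balanced_design` and `pearson` and the 0/1 coding of the true class are illustrative choices, not part of the original analysis.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def balanced_design(n_bins=5, per_bin=34):
    """Build perceived-bin labels and true-class labels for a balanced set.

    Assumption: each bin holds per_bin images, half truly mirror (0)
    and half truly glass (1).
    """
    half = per_bin // 2
    bins, truth = [], []
    for b in range(1, n_bins + 1):
        bins += [b] * per_bin                # perceived bin, 1 = mirror ... 5 = glass
        truth += [0] * half + [1] * half     # true class, balanced within the bin
    return bins, truth


bins, truth = balanced_design()
r = pearson(bins, truth)
print(len(bins), r)
```

Under this assumption the set has 5 × 34 = 170 images, and the correlation between perceived bin and true class is zero by construction, because the deviations of the true class from its mean cancel within each bin.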
To increase the number of images for further use, we also trained two GANs to synthesize images with many of the visual characteristics of the renderings from each class. We find that such images include many cases that are more ambiguous than the renderings, appearing somewhere between mirror and glass. In rounds B1 and B2, we identified 95 images (out of a set of 1,400) that were consistently rated by 10 observers as belonging to specific bins. Combining the selected GAN images with the renderings yielded a total of 265 diagnostic images, in which human judgments were perfectly decorrelated from ground truth, which for the GAN images was defined by the class of images on which the respective GAN was trained (Figure 5B). Some examples are shown in Figure 6. Interestingly, although it is possible to distinguish GAN images from renderings, those in the diagnostic image set (the bottom row in Figure 6) tend to appear to have a coherent object shape and appearance, somewhat similar to the renderings. This approach is not without limitations. All models, including GANs, have inductive biases, and the fact that observers can distinguish between GAN images and renderings implies that the GAN images may contain cues that renderings do not, or lack cues that renderings possess. Nevertheless, our goal is not to model ideal material discrimination, but rather human judgments. By selecting images that 10 out of 10 observers agreed upon, we ensure that the GAN images considered here yield consistent subjective impressions along the mirror–glass continuum even if they are not physically accurate renditions of objects. Although 265 images in total is small relative to the set of possible images of mirror or glass, these images are not used for training models, only for testing them.
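The 10-out-of-10 agreement criterion can be illustrated with a minimal sketch. The data layout (a dictionary mapping image identifiers to lists of bin ratings), the image identifiers, and the function name `unanimous_images` are hypothetical, and the sketch does not attempt to reproduce how rounds B1 and B2 were actually combined.

```python
def unanimous_images(ratings, n_observers=10):
    """Keep images that every observer assigned to the same bin.

    ratings: dict mapping an image id to the list of bin ratings (1-5)
    it received, one per observer (hypothetical layout).
    Returns a dict mapping each retained image id to its agreed bin.
    """
    selected = {}
    for image_id, bins in ratings.items():
        # Require a complete set of ratings and a single distinct value.
        if len(bins) == n_observers and len(set(bins)) == 1:
            selected[image_id] = bins[0]
    return selected


# Toy ratings, invented for illustration only.
ratings = {
    "gan_0001": [3] * 10,                        # unanimous: kept
    "gan_0002": [1, 1, 2, 1, 1, 1, 1, 1, 1, 1],  # one dissenting rating: dropped
    "gan_0003": [5] * 10,                        # unanimous: kept
    "gan_0004": [4] * 9,                         # incomplete ratings: dropped
}
selected = unanimous_images(ratings)
print(selected)
```

A strict unanimity filter of this kind discards most candidates, which is consistent with only 95 of the 1,400 GAN images being retained.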