Abstract
When viewing objects depicted in a frame, most of us prefer to see large objects like sofas at larger visual sizes and small objects like paperclips at smaller visual sizes. In general, the visual size at which an object “looks best” is linked to its typical physical size in the world (Konkle & Oliva, 2011). Why is this the case? One intuitive possibility is that these preferences are driven by semantic knowledge: for example, we recognize a sofa, access our knowledge of its real-world size, and this knowledge influences the size at which we prefer to view the sofa in a frame. However, visual processing itself might also play a role: do visual features associated with big and small objects look better at big and small visual sizes, respectively, even when observers have no explicit access to semantic knowledge about the objects? To test this possibility, we used a class of stimuli called “texforms” (Long et al., 2018), which are synthesized images that retain mid-level visual features related to the texture and coarse form of the original images but are unrecognizable at the basic level. In a two-interval forced-choice task, each texform was presented at the canonical visual size of its corresponding original image and at a visual size slightly larger or smaller. Twelve of 15 observers selected the texform shown at the canonical visual size as the more visually pleasing one at rates above chance (61.8%, SEM = 2.7%). This result suggests that the preferred visual size of an object is driven not only by explicit knowledge of its real-world size but also by mid-level visual features that systematically covary with real-world size.
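As a rough illustration of the above-chance comparison, the sketch below assumes per-observer binomial tests against the 50% chance level of a two-interval forced-choice design, followed by a group-level sign test over the 12-of-15 observer count; the trial counts are hypothetical placeholders, and this is not necessarily the analysis used in the study.

```python
from scipy.stats import binomtest

# Hypothetical per-observer data: total 2IFC trials and the number of trials
# on which the canonical-size texform was chosen as more pleasing.
n_trials = 200       # hypothetical trial count per observer
n_canonical = 124    # hypothetical canonical-size choices (62%)

# Test one observer's choice rate against the 50% chance level of a 2IFC task.
obs_test = binomtest(n_canonical, n_trials, p=0.5, alternative="greater")
print(f"observer accuracy = {n_canonical / n_trials:.1%}, p = {obs_test.pvalue:.4f}")

# Group-level sign test: probability of 12 or more of 15 observers exceeding
# chance if every observer were in fact guessing.
group_test = binomtest(12, 15, p=0.5, alternative="greater")
print(f"group-level p = {group_test.pvalue:.4f}")
```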