Abstract
We can readily appreciate whether a tower of blocks will topple or a stack of dishes will collapse. How? Recent work suggests that such physical properties of scenes are extracted rapidly and efficiently as part of automatic visual processing (Firestone & Scholl, VSS 2016, VSS 2017). However, physical reasoning can also operate in ways that seemingly differ from visual processing. For example, subjects who are explicitly told that some blocks within a tower are heavier than others can rapidly update their judgments of that tower's stability (Battaglia et al., 2013); by contrast, automatic visual processing is typically resistant to such explicit higher-level influence (Firestone & Scholl, 2016). Here, we resolve this apparent conflict by revealing how distinct flexible and inflexible processes support physical understanding. We showed subjects towers with differently colored blocks, where one color indicated a tenfold increase in mass. Subjects successfully incorporated this information into their judgments of stability, accurately identifying which towers would stand or fall by moving their cursors to corresponding buttons. However, analyses of these cursor trajectories revealed that some towers were processed differently than others. Specifically, towers that were "stable" but that would have been unstable had the blocks been equally heavy (i.e., towers with unstable geometries) yielded meandering cursor trajectories that drifted toward the incorrect stability judgment ("fall") before eventually arriving at the correct judgment ("stand"). By contrast, towers that were "stable" both in terms of their differentially heavy blocks and in terms of their superficial geometries produced considerably less drift. In other words, even when subjects accurately understood how a tower would behave given new information about mass, their behavior revealed an influence of more basic visual (geometric) cues to stability.
We suggest that physical scene understanding may not be a single process, but rather one with separable stages: a fast, reflexive, genuinely "perceptual" stage, and a slower, flexible "cognitive" stage.
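The abstract does not say how cursor drift was quantified. As an illustration only, one measure often used in mouse-tracking work is the maximum deviation of a trajectory from the straight line between its start and end points. The sketch below assumes a hypothetical layout (start at the origin, "stand" button at upper right, "fall" button at upper left, y increasing upward); the function name, coordinates, and sample paths are invented for the example and are not taken from the study.

```python
import math

def max_deviation(points):
    """Largest signed perpendicular distance of a cursor path from the
    straight line joining its first and last samples.

    Assumes y increases upward; a positive value then means the path
    bulged to the left of the direct route, a negative value to the right.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end points coincide")
    best = 0.0
    for x, y in points:
        # 2D cross product divided by the line length = signed distance
        d = (dx * (y - y0) - dy * (x - x0)) / length
        if abs(d) > abs(best):
            best = d
    return best

# Hypothetical trials that both end on the "stand" button at (1, 1).
direct = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]               # straight to "stand"
drift = [(0.0, 0.0), (-0.3, 0.4), (0.0, 0.8), (1.0, 1.0)]   # veers toward "fall" first

dev_direct = max_deviation(direct)
dev_drift = max_deviation(drift)
```

Under these assumptions, a larger positive deviation on trials that end at "stand" would indicate a stronger pull toward the "fall" response, which is the kind of pattern the abstract describes for towers with unstable geometries.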
Meeting abstract presented at VSS 2018