Abstract
We often think of perception in terms of relatively low-level properties (such as color and shape), but we can also perceive seemingly higher-level properties, such as physical stability, as when we see at a glance whether a tower of blocks will fall. Prior work has demonstrated that physical stability is extracted quickly and automatically, both by deep networks and during human visual processing, but it remains unclear just which properties are used to compute such percepts. In the current study, observers viewed computer-generated images of pseudorandomly constructed 3D block towers, such that the ground truth of each tower's stability could be simulated in a physics engine and compared with observers' percepts of whether each tower would fall. Critically, the towers were carefully constructed so that percepts of (in)stability could not be based on especially trivial properties, such as global asymmetry or the shape of a tower's boundary envelope. Our analyses demonstrate that observers are sensitive not only to whether a tower will fall, but also to continuous degrees of instability. In particular, the most powerful factor driving observers' percepts of instability was the sum of the distances that the blocks moved between the initial and post-fall tableaus, independent of the towers' initial heights (even though, of course, observers never actually saw the towers falling), a factor that was not as salient in past models. Variance in the blocks' initial horizontal positions was also a powerful predictor of perceived (in)stability, independent of global symmetry. These and other results, which combine psychophysics with physics-based simulation and computational modeling, help to reveal just how we can perceive physical (in)stability at a glance, a capacity that may be of great adaptive value given the importance for vision of predicting how our local environments may be about to change.
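To make the two strongest predictors concrete, here is a minimal sketch of how they might be computed; this is an illustration, not the authors' actual analysis pipeline. It assumes block center coordinates before and after a physics-engine simulation are available as arrays (rows are blocks; columns are x, y, z, with z as the vertical axis), and that horizontal-position variance is pooled across both horizontal axes. Those details, and the toy tower below, are illustrative assumptions.

```python
import numpy as np

def summed_displacement(initial_pos, settled_pos):
    """Sum of Euclidean distances each block moves between the
    initial tableau and the post-fall (settled) tableau.
    Both arrays have shape (n_blocks, 3): x, y, z block centers."""
    return np.linalg.norm(settled_pos - initial_pos, axis=1).sum()

def horizontal_position_variance(initial_pos):
    """Variance of the blocks' initial horizontal (x, y) positions,
    pooled across both horizontal axes (z assumed vertical)."""
    return initial_pos[:, :2].var(axis=0).sum()

# Hypothetical 3-block tower; positions in arbitrary length units.
before = np.array([[0.0, 0.0, 0.5],
                   [0.1, 0.0, 1.5],
                   [0.3, 0.1, 2.5]])
after = np.array([[0.0, 0.0, 0.5],   # bottom block stays put
                  [0.6, 0.2, 0.5],   # upper blocks topple to the ground
                  [1.2, 0.4, 0.5]])

print(summed_displacement(before, after))    # larger => more unstable
print(horizontal_position_variance(before))  # larger => more unstable
```

On this reading, the first measure is a continuous, simulation-derived degree of instability (zero for a tower that does not fall), while the second is an image-computable property of the initial configuration alone; whether variance should be pooled or taken per axis is left open by the summary above.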