Abstract
Visual material perception is computationally complex because physical properties such as rigidity or friction are not directly observable. In many cases, however, viewing a dynamic interaction between objects reveals their internal properties. If a couch cushion deforms dynamically under the weight of a box, we infer both the cushion’s stiffness and the weight of the box. This indicates that the brain jointly interprets the interplay of multiple objects in a physical scene. Can the brain infer this physical structure when only one of the interacting objects is visible, while all others are artificially rendered invisible? To answer this question, we leveraged computer graphics: First, we simulated short interactions of liquid, granular, and non-rigid materials with rigid objects of various shapes. Then, crucially, we rendered only the target material while the rest of the scene remained black. We presented the videos to 100 observers and asked them to identify which of two alternative interactions showed the same target material as the test video. Match and distractor varied in their material properties (e.g., cohesion), thus implicitly requiring inference of those parameters. Observers were as accurate in judging these videos as when presented with fully rendered versions. Strikingly, observers not only perceived the target material in rich detail; in most cases, they were also able to select which of two alternative 3D shapes underlay the observed interaction. This finding suggests that the brain imputes the hidden objects in a physically plausible manner. In comparison, a distance-based classifier operating on features from pretrained neural networks showed lower overall performance in both tasks, and its pattern of errors differed from that of human observers. Taken together, our results are consistent with the hypothesis that people use an internal generative physics model in online perception.
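The abstract does not specify how the distance-based classifier was built; the sketch below illustrates one common form such a baseline could take, assuming frame-wise features from an ImageNet-pretrained ResNet-50 (torchvision), temporal average pooling, and cosine distance for the two-alternative match. The backbone, pooling scheme, and distance metric are all assumptions for illustration, not the authors' reported method. Note that averaging frame features discards temporal order, one plausible reason a static-feature baseline might miss dynamic material cues that humans exploit.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Hypothetical baseline, not the paper's classifier: pretrained ResNet-50
# with the classification head removed, leaving 2048-d pooled features.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Standard ImageNet preprocessing applied to a stack of video frames.
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def video_embedding(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, H, W) uint8 tensor of a video clip.
    Returns one feature vector, averaged over frames (temporal pooling)."""
    feats = backbone(preprocess(frames))  # (T, 2048) frame-wise features
    return feats.mean(dim=0)              # (2048,) clip-level embedding

def match_2afc(test: torch.Tensor, alt_a: torch.Tensor, alt_b: torch.Tensor) -> str:
    """Two-alternative forced choice: pick whichever alternative video
    lies closer to the test video in feature space (cosine distance)."""
    e_test = video_embedding(test)
    d_a = 1 - torch.cosine_similarity(e_test, video_embedding(alt_a), dim=0)
    d_b = 1 - torch.cosine_similarity(e_test, video_embedding(alt_b), dim=0)
    return "A" if d_a < d_b else "B"
```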