Abstract
Whereas some actions are aimed at changing the world, others are aimed at learning about it. For example, someone might press on a door to open it, or to determine whether it’s locked; someone might place their toe into a pool to enter it, or to gauge its temperature; someone might shake a container to shuffle its contents, or to figure out what’s inside. The distinction between ‘pragmatic’ and ‘epistemic’ actions is recognized in other fields, but has only recently entered vision science: In previous work (Croom et al., 2023), we found that, when watching videos of someone shaking a box, observers can infer what information the shaker is trying to obtain (e.g., the number of objects inside vs. their shape). Here, we ask a broader question: Do epistemic actions share common visual features that distinguish them from pragmatic actions, even beyond particular action goals? We created a set of 216 videos, each showing a naive participant completing an epistemic action (determining the number, shape, or size of objects in a box) or a pragmatic action (shuffling the box’s contents, making the objects collide, or causing them to jump into the air). Then, 100 observers viewed these videos and were given a different task: to distinguish pragmatic actions from epistemic actions (i.e., who was acting to do something vs. to learn something). While some observers were given details about the specific actions they would see, other observers were simply told that some videos showed ‘learning’ and others showed ‘doing’. Regardless of whether they were informed (Experiment 1) or uninformed (Experiment 2) of the candidate actions, observers correctly distinguished pragmatic from epistemic actions, based purely on the box-shaking dynamics. Thus, learning looks different from doing: Beyond recognizing the particular goals of an action, observers can visually recognize epistemic vs. pragmatic intent.