Abstract
The world around us is visually complex, yet we can efficiently describe it by extracting the information that is most relevant to convey. How do the properties of a real-world scene help us decide where to look and what to say about it? Image salience has been the dominant explanation for what drives visual attention and language production as we describe what we see, but new evidence shows that scene meaning predicts attention better than image salience does. Another potentially important property is graspability, or the grasping interactions that objects in a scene afford, given that affordances have been implicated in both visual and language processing. We quantified image salience, meaning, and graspability for real-world scenes. In three eyetracking experiments (N = 30, 40, and 40), native speakers described possible actions that could be carried out in each scene. We hypothesized that graspability would be task-relevant and would therefore preferentially guide attention. In two experiments using stimuli from a previous study (Henderson & Hayes, 2017) that were not controlled for camera angle or reachability, meaning explained visual attention better than either graspability or image salience did, and graspability explained attention better than salience. In a third experiment, we quantified salience, meaning, graspability, and reachability for a new set of scenes that were explicitly controlled for reachability (i.e., reachable spaces containing graspable objects). In contrast with the results for the earlier stimuli, graspability and meaning explained attention equally well, and both explained attention better than image salience. We conclude that speakers use object graspability to allocate attention when planning descriptions of scenes that depict graspable objects within reach, and otherwise rely more on general meaning. Taken together, the three experiments shed light on which aspects of meaning guide attention during scene viewing in language production tasks.