Abstract
Predictions of attention control are typically formalized as priority maps, and many types have been proposed (e.g., saliency maps). We add to this list by introducing a reward map, a map of the reward expected from fixating different locations in an image. In the context of goal-directed attention, our premise is that achieving a goal is rewarding and that goal-directed fixations are controlled by computing expected reward and attempting to maximize its receipt. We obtained reward maps for the COCO-Search18 target-object categories using inverse reinforcement learning (IRL), which learns, from the search fixations observed in the training data, a reward function for predicting the scanpaths of fixations made by people searching scenes for categories of target objects. Reward maps are therefore priority maps that are reverse-engineered from the training fixations using IRL. Using this purely data-driven approach, we found that reward maps explained the combined variability of saliency maps, target maps, object maps, and meaning maps in predicting target-present and target-absent search behavior, thus supporting our hypothesis that the pursuit of expected reward is the common thread stitching together these other attention biases. Moreover, scanpath predictions from reward maps approached the noise ceiling imposed by agreement in participant behavior (meaning there is little room for improvement) and came close to state-of-the-art (SOTA) performance among scanpath-prediction models in computer vision. This SOTA, however, is a black box, whereas predictions from reward maps are highly interpretable as reward. We conclude that goal-directed attention control can be understood as seeking out expected goal-related reward, and that a reward map may be THE priority map: the common priority representation through which other bottom-up and top-down biases collectively exert their control over behavior.
Our work also enables scanpath-prediction researchers to weigh interpretability benefits against the (often negligible) performance costs incurred by making biologically plausible modeling decisions.