Abstract
A tremendous amount of work on visual attention has helped to characterize *what* we attend to, but has focused less on precisely *how* and *why* attention is allocated to dynamic scenes across time. Nowhere is this contrast more apparent than in multiple object tracking (MOT). Hundreds of papers have explored MOT as a paradigmatic example of selective attention, in part because it captures attention as a dynamic process so well. It is especially ironic, then, that nearly all of this work reduces each MOT trial to a single value (i.e. the number of targets successfully tracked), when in reality each MOT trial is an experiment unto itself, with attention constantly shifting over time. Here we seek to capture this dynamic ebb and flow of attention at a subsecond resolution, both empirically and computationally. Empirically, observers completed MOT trials during which they also had to detect sporadic momentary probes, providing a measure of the moment-by-moment degree of attention allocated to each object. Computationally, we characterize (for the first time, to our knowledge) an algorithmic architecture specifying just how and why such dynamic attentional shifts occur. To do so, we introduce a new 'hypothesis-driven adaptive computation' model. Whereas previous models employed many MOT-specific assumptions, this new approach generalizes to any task-driven context: it provides a unified account of attention as the dynamic allocation of computing resources, based on task-driven hypotheses about the properties (e.g. location, target status) of each object. Here, this framework explained the observed probe detection performance at a subsecond resolution, independent of general spatial factors (such as the proximity of each probe to the MOT targets' centroid). This case study provides a new way to think about attention, and how it interfaces with perception, in terms of rational resource allocation.
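To make the central idea concrete, below is a minimal sketch, not the paper's actual model, of what "hypothesis-driven adaptive computation" might look like: a fixed computational budget is divided among objects in proportion to the uncertainty of a task-relevant hypothesis, here each object's target status. The function names (`hypothesis_uncertainty`, `allocate_compute`), the binary-entropy measure, and the softmax weighting with a temperature parameter are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of hypothesis-driven adaptive computation.
# Assumption: attention corresponds to per-object computation steps,
# allocated where the task-relevant hypothesis is most uncertain.

def hypothesis_uncertainty(p_target: float) -> float:
    """Binary entropy of the 'is this object a target?' hypothesis."""
    eps = 1e-12
    p = np.clip(p_target, eps, 1 - eps)
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

def allocate_compute(p_targets: np.ndarray, budget: int, temp: float = 1.0) -> np.ndarray:
    """Split `budget` computation steps across objects via a softmax over
    per-object hypothesis uncertainty (higher uncertainty -> more compute)."""
    u = np.array([hypothesis_uncertainty(p) for p in p_targets])
    w = np.exp(u / temp)
    w /= w.sum()
    # Round down to integer step counts, then give the remainder to the
    # most uncertain object so the total budget is preserved.
    steps = np.floor(w * budget).astype(int)
    steps[np.argmax(w)] += budget - steps.sum()
    return steps

# Example: four objects; the two with ambiguous target status (p near 0.5)
# draw the most computation, mirroring a task-driven attentional shift.
p_targets = np.array([0.95, 0.55, 0.50, 0.05])
print(allocate_compute(p_targets, budget=100))
```

On this toy reading, momentary probe detection would be predicted to improve for objects currently receiving more computation steps, independent of purely spatial factors; the actual model's linking hypothesis may differ.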