Abstract
What neural mechanisms group visual features into coherent object percepts? Association fields, mediated by long-range horizontal connections, have been shown to dynamically configure neural responses in early visual areas to form objects from collinear line segments. We propose that such association fields also exist in higher visual areas and contribute to object-based grouping and attention. To test this hypothesis, we modeled the connection strengths of these association fields as the similarity between local image features extracted by a transformer-based vision model. We then tested the effectiveness of these object-based associations on a well-established grouping task, the two-dot paradigm. In this task, the model must determine whether a central dot and a peripheral dot lie on the same object or on different objects in a natural scene. Our model performs the task by gradually spreading attention, mediated by the association field, from the two dot locations to neighboring regions. Attention remained remarkably well confined to the object as it spread, showing for the first time that attention spreading through horizontal connections is a plausible object-grouping mechanism in natural scenes. The model reaches a 'same-object' decision when the feature representations of the two attended segments agree beyond a predefined threshold. The time the model takes to reach its decision correlated significantly with human reaction times on the same task (72 participants, 1,020 trials; r = 0.32, p < 0.001), substantially narrowing the gap between baseline models and inter-subject agreement (r = 0.42). In this work, we hypothesize, and provide evidence for, how object-based association fields can mediate the spread of attention to group objects in natural scenes, yielding novel hypotheses to be tested in neuroscience.
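The spread-and-threshold procedure described above can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions. Patch features are given as an (H, W, D) array that stands in for transformer patch embeddings. The association strength between neighboring patches is taken to be their cosine similarity. "Agreement" between the two attended segments is measured as the cosine similarity of their mean features. The number of spreading steps serves as a proxy for the model's reaction time. The names `spread_step`, `two_dot_decision`, `assoc_thresh`, and `decide_thresh` are hypothetical.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))


def spread_step(mask, feats, assoc_thresh):
    """Grow an attended region by one step.

    Hypothetical association-field rule: a 4-neighbor of an attended patch
    joins the region if its feature similarity to that patch exceeds
    assoc_thresh.
    """
    h, w, _ = feats.shape
    new_mask = mask.copy()
    for y, x in zip(*np.nonzero(mask)):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if cosine(feats[y, x], feats[ny, nx]) > assoc_thresh:
                    new_mask[ny, nx] = True
    return new_mask


def two_dot_decision(feats, dot_a, dot_b, assoc_thresh=0.8,
                     decide_thresh=0.9, max_steps=100):
    """Spread attention from both dots and return (decision, steps taken)."""
    h, w, _ = feats.shape
    mask_a = np.zeros((h, w), dtype=bool)
    mask_b = np.zeros((h, w), dtype=bool)
    mask_a[dot_a] = True
    mask_b[dot_b] = True
    for step in range(max_steps):
        # Agreement = similarity of the mean features of the two segments.
        agreement = cosine(feats[mask_a].mean(axis=0),
                           feats[mask_b].mean(axis=0))
        if agreement > decide_thresh:
            return "same", step
        grown_a = spread_step(mask_a, feats, assoc_thresh)
        grown_b = spread_step(mask_b, feats, assoc_thresh)
        if (grown_a == mask_a).all() and (grown_b == mask_b).all():
            break  # both regions stopped growing without reaching agreement
        mask_a, mask_b = grown_a, grown_b
    return "different", step
```

On a synthetic 8×8 "scene" whose left half is one object with smoothly varying features and whose right half is a second object with orthogonal features, two dots at opposite ends of the left object start out dissimilar. The sketch returns "same" only after their regions have spread across the object. Dots placed on different halves never spread across the boundary, so it returns "different". The two thresholds trade off how far attention leaks across object borders against how quickly a decision is reached.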