Abstract
Object information is transformed along the ventral visual stream from early formats into increasingly invariant representations. To gain insight into the nature of these representations, we measured different facets of similarity with three behavioral tasks and related them to neural responses.
First, we measured similarity implicitly with a visual search task, recording the time it took participants to find one target among distractors in a large-scale online experiment (72 images of inanimate objects, 2,556 pairs, n=1,272). Next, we measured similarity with an explicit shape-sorting task, asking participants to arrange items in a circular arena based on their shape similarity (n=25). Then, we measured similarity with the same arrangement task but without guidance on which object properties to focus on, so that participants simply placed more similar objects closer together (n=26). Finally, brain responses to each image were obtained using functional magnetic resonance imaging (n=10).
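The pair count follows directly from the number of images: 72 items give 72 × 71 / 2 = 2,556 unordered pairs. A minimal Python sketch, using hypothetical integer labels in place of the actual images, enumerates them:

```python
from itertools import combinations

N_IMAGES = 72  # number of object images reported in the abstract

# Hypothetical integer labels standing in for the actual images.
image_ids = range(N_IMAGES)

# Every unordered pair of distinct images (order within a pair is ignored).
pairs = list(combinations(image_ids, 2))

print(len(pairs))  # 72 * 71 // 2 -> 2556
```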
Our results show that (i) there are clear differences in the measured similarity space of objects across tasks, (ii) the implicit visual search similarity was most strongly correlated with the posterior half of object-selective cortex, (iii) the explicit shape-sorting similarity was most strongly correlated with the anterior half of object-selective cortex, and (iv) the similarity measured with the unguided sorting task was not well correlated with any responses along occipitotemporal cortex. These observations were confirmed quantitatively with linear mixed-effects modeling: the brain similarity structure along the hierarchy was better explained by a model that included the interaction between behavioral task and location along the hierarchy than by a model without the interaction (χ²(34)=178.73, p<0.001).
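As a sanity check on the reported model comparison, the p-value can be recomputed from the χ² value and its degrees of freedom. The sketch below assumes the statistic is referred to a χ² distribution with 34 degrees of freedom, the usual asymptotic approximation for comparing nested models; the abstract does not detail the procedure. It uses the closed-form upper-tail probability that exists for even degrees of freedom, and the function name `chi2_sf_even_df` is illustrative only.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x < 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # the i = 0 term: (x/2)**0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # update to (x/2)**i / i!
        total += term
    return math.exp(-half) * total


# Values reported in the abstract: chi-square = 178.73 with 34 degrees of freedom.
p_value = chi2_sf_even_df(178.73, 34)
print(p_value < 0.001)
```

Under this assumption the recomputed tail probability falls far below the 0.001 threshold quoted in the text.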
Broadly, these results reveal a clear dissociation in the way posterior and anterior occipitotemporal cortex represent the similarity of objects. We hypothesize that this relatively sharp transition along the ventral stream hierarchy may reflect a passage from more pictorial to more structural representations of inanimate object information.