Abstract
How do we combine complex multimodal information to form a coherent representation of “what” an object is? Existing literature has predominantly used visual stimuli to study the neural architecture of well-established object representations. Here, we studied how new multimodal object representations are formed in the first place, using a set of well-characterized 3D-printed shapes embedded with audio speakers. Applying multi-echo fMRI across a four-day learning paradigm, we examined how behavioral and neural measures changed from before to after shape and sound features were paired to form objects. To quantify learning, we developed a within-subject measure of representational geometry based on similarity ratings. Before shape and sound features were paired, representational geometry was driven by modality-specific information, providing direct evidence of feature-based representations. After shape and sound features were paired to form objects, representational geometry was additionally driven by information about the pairing, providing causal evidence for an integrated object representation distinct from its features. Complementing these behavioral results, we observed a robust learning-related change in pattern similarity for shape-sound pairings in the anterior temporal lobes. Intriguingly, we also observed greater pre-learning activity for visual than for auditory features in the ventral visual stream extending into perirhinal cortex, a visual bias that was attenuated in perirhinal cortex after the shape-sound relationships were learned. Collectively, these results provide causal evidence that forming new multimodal object representations relies on integrative coding in the anterior temporal lobes and perirhinal cortex.