Abstract
The organizational principles of the object space represented in the human ventral visual cortex are debated. Recently, Bao et al. (2020, Nature) proposed a characterization of object space in the occipitotemporal cortex (OTC) as a two-dimensional map of aspect ratio (stubby vs. spiky) and animacy. Here, we contrasted their model with another prominent hypothesis that, in addition to an organization in terms of animacy, proposes a representation related to the distinction between faces and bodies (independent of the aspect ratio of these categories). To dissociate the latter two categories from aspect ratio, we prepared a novel stimulus set that included images of animates (faces and bodies) and inanimates (natural and man-made objects), with a comparably wide range of aspect ratios in each of these four categories, and investigated responses with human fMRI and in a deep neural network (BigBiGAN). Representational similarity analyses showed that object space in BigBiGAN, in OTC, and in most category-selective regions of OTC was partially explained by animacy but not by aspect ratio. In addition, no category-selective region showed tuning to the aspect ratio of stimuli from its preferred category, with the exception of the right object-selective cortex. Brain decoding yielded much higher accuracy for animate-inanimate classification across changes in aspect ratio than for stubby-spiky classification across changes in animacy. We also observed transitions in representational content along the anatomical posterior-to-anterior axis of OTC. Data-driven approaches revealed clusters for face and body stimuli and an animate-inanimate separation in the representational spaces of both OTC and BigBiGAN (despite the network receiving no explicit training on such superordinate categories), but no arrangement related to aspect ratio. In sum, the two-dimensional animacy × aspect ratio model cannot explain object representations in either OTC or a state-of-the-art DNN, whereas a model combining an animacy dimension with strong selectivity for faces and bodies is more compatible with both neural and DNN representations.
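To make the cross-decoding claim concrete, the following is a minimal sketch of the general logic of decoding one dimension across changes in the orthogonal dimension (e.g., classifying animate vs. inanimate after training only on stubby stimuli and testing only on spiky ones). It uses synthetic data and an off-the-shelf linear classifier; all names, shapes, and the simulated signal structure are illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative sketch (not the paper's code): cross-decoding animacy across
# a change in aspect ratio, using synthetic voxel patterns.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_voxels = 200

def simulate_patterns(n_stimuli, animate, stubby):
    """Synthetic voxel patterns carrying a weak animacy signal.
    The `stubby` flag intentionally leaves the pattern unchanged here,
    mimicking a region that codes animacy but not aspect ratio."""
    base = rng.normal(size=(n_stimuli, n_voxels))
    base[:, :20] += 1.0 if animate else -1.0  # animacy signal in a voxel subset
    return base

# Train on stubby stimuli only: animate vs. inanimate
X_train = np.vstack([simulate_patterns(50, animate=True,  stubby=True),
                     simulate_patterns(50, animate=False, stubby=True)])
y_train = np.array([1] * 50 + [0] * 50)

# Test on spiky stimuli only: same animacy labels, other aspect ratio
X_test = np.vstack([simulate_patterns(50, animate=True,  stubby=False),
                    simulate_patterns(50, animate=False, stubby=False)])
y_test = np.array([1] * 50 + [0] * 50)

clf = LinearSVC().fit(X_train, y_train)
print("animacy decoding across aspect ratio:", clf.score(X_test, y_test))
```

Under these assumptions, above-chance test accuracy indicates that the animacy code generalizes across aspect ratio; the converse analysis (stubby vs. spiky trained on animates, tested on inanimates) would probe the aspect-ratio dimension in the same way.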