Abstract
Humans can recognize objects that are presented sequentially while translating behind a narrow slit (anorthoscopic perception), so that the whole object is never visible at once. This capability is strongly impaired if the individual slit views are presented in randomized temporal order. Electrophysiological data from area IT help to narrow down the possible underlying computations.
METHODS: We demonstrate that standard deep neural network models for visual processing fail at this task. We present a novel deep model that recognizes anorthoscopically presented body shapes and also reproduces properties of IT neurons during anorthoscopic perception (Bognar & Vogels, 2021). The initial levels of this model are taken from the VGG16 architecture, trained on ImageNet. The intermediate levels are formed by special local nonlinear recognition units, which assess the similarity of features that are highly visible in both training and test stimuli, followed by holistic fragment recognition units that integrate information within large receptive fields. Position invariance is accomplished by combining weight sharing with maximum pooling of the holistic detector responses. A winner-takes-all output layer integrates the information across all fragment unit outputs that belong to the same body shape.
RESULTS: The model recognizes body shapes that are presented sequentially through a slit. It also reproduces several properties of IT neurons: (i) shape-selective neural responses to the full figure as well as to presentations through a slit; (ii) invariance to forward vs. backward motion, but strong degradation for slit views presented in random temporal order; (iii) partial transfer between activation patterns for vertical and horizontal slit views.
CONCLUSION: While classical neural network models fail to account for anorthoscopic perception, integrating mechanisms that prevent interference between slit and object features allows the construction of physiologically plausible models of this visual function.
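
The readout stage described under METHODS (maximum pooling over positions, followed by a winner-takes-all decision across body shapes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `recognize`, the array layout, the toy numbers, and the use of a plain sum to integrate fragment outputs per shape are all assumptions made for illustration only.

```python
import numpy as np

def recognize(fragment_responses, shape_of_fragment, n_shapes):
    """Toy readout: max pooling over positions, then winner-takes-all.

    fragment_responses: (n_fragments, n_positions) responses of the
        holistic fragment detectors (hypothetical layout).
    shape_of_fragment: (n_fragments,) index of the body shape that each
        fragment detector belongs to.
    """
    # Weight sharing means the same detector is evaluated at every
    # position; max pooling keeps only its strongest response, which
    # makes the pooled value independent of where the fragment appeared.
    pooled = fragment_responses.max(axis=1)

    # Integrate evidence over all fragment units of the same shape.
    # ASSUMPTION: a plain sum; the abstract does not specify the rule.
    evidence = np.zeros(n_shapes)
    np.add.at(evidence, shape_of_fragment, pooled)

    # Winner-takes-all: the shape with the most evidence is reported.
    return int(np.argmax(evidence)), evidence

# Three fragment detectors, three positions; fragments 0 and 1 belong
# to shape 0, fragment 2 belongs to shape 1 (made-up numbers).
responses = np.array([[0.1, 0.9, 0.2],
                      [0.3, 0.2, 0.1],
                      [0.4, 0.5, 0.8]])
shape_of_fragment = np.array([0, 0, 1])

winner, evidence = recognize(responses, shape_of_fragment, n_shapes=2)

# Shifting every response map by one position leaves the decision unchanged.
shifted_winner, _ = recognize(np.roll(responses, 1, axis=1),
                              shape_of_fragment, n_shapes=2)
```

In this toy setting the pooled responses are identical for the original and the shifted input, which is the sense in which pooling over shared-weight detectors yields position invariance.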