Abstract
Primates must accurately estimate the size of objects in their environment to interact with them efficiently. Hong, Yamins, Majaj, and DiCarlo (2016) reported that an object's size within an image can be accurately approximated from population activity across the macaque inferior temporal (IT) cortex following brief (100 ms) image presentations. These neural predictions were consistent with human behavioral estimates of object size in the same images, suggesting a linear IT readout model as the leading neural decoding hypothesis for object size estimation in primates. However, perceived and image-based (i.e., retinal) object sizes were highly correlated in the Hong et al. (2016) stimulus set. Notably, two objects with identical retinal sizes can be perceived to differ in size when embedded at different locations along a linear-perspective background (the Ponzo illusion). This size illusion therefore enables a stronger test: does the IT-based linear readout model predict the perceived size or the retinal size? We created a set of image pairs by placing objects "near" or "far" with respect to a linear-perspective background. We performed large-scale neural recordings (2 Utah arrays; n=192 sites) across the macaque IT cortex while the monkey fixated each image for 100 ms. Extending the results of Hong et al. (2016), we observed that object-size estimates decoded from the IT responses (~190–205 ms) showed a significant bias ("far" − "near"; Δ=22%; p<0.001) that is qualitatively similar to that measured behaviorally in humans. Interestingly, however, most of the deep convolutional neural network (DCNN) models that currently best approximate primate IT responses failed to exhibit such a bias. Together, our results provide further support for the linear IT readout model of object size perception while exposing a significant explanatory gap in current DCNNs as models of primate vision.
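To make the linear-readout hypothesis concrete, the following is a minimal sketch, assuming synthetic data in place of the recorded responses: a ridge-regression decoder is fit to map a 192-site IT population response to object size and is then applied to matched "near"/"far" image pairs to compute the far-minus-near bias. All array shapes, the training-set size, and the RidgeCV regularization choice are illustrative assumptions, not the study's actual analysis pipeline.

```python
# Minimal sketch of a linear IT size readout and the far-vs.-near bias test.
# All data here are synthetic placeholders; shapes and names are assumptions.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)

# Hypothetical training data: trial-averaged responses (images x 192 IT sites)
# to objects whose true retinal size is known, used to fit the size decoder.
n_train, n_sites = 500, 192
X_train = rng.normal(size=(n_train, n_sites))   # IT population responses
y_train = rng.uniform(1.0, 8.0, size=n_train)   # object size labels (e.g., deg)

# Linear readout: ridge regression with cross-validated regularization.
decoder = RidgeCV(alphas=np.logspace(-3, 3, 13))
decoder.fit(X_train, y_train)

# Hypothetical test data: image pairs with identical retinal object size,
# placed "near" vs. "far" on a linear-perspective background.
n_pairs = 100
X_near = rng.normal(size=(n_pairs, n_sites))
X_far = rng.normal(size=(n_pairs, n_sites))

size_near = decoder.predict(X_near)
size_far = decoder.predict(X_far)

# Far-minus-near bias as a percentage of the "near" estimate; a positive
# value would mirror the Ponzo illusion reported in human observers.
bias_pct = 100.0 * (size_far.mean() - size_near.mean()) / size_near.mean()
print(f"decoded size bias (far - near): {bias_pct:.1f}%")
```

With random inputs the bias hovers near zero; the study's claim is that on real IT responses this decoded bias is significantly positive ("far" > "near"), matching human perception rather than retinal size.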