Abstract
In natural vision, objects are always encountered in a surrounding context, and these contexts can be highly consistent (e.g., most boats appear in aquatic scenes). Previous observations suggest that the human visual system leverages the statistical associations between objects and contexts to facilitate object representation. What are these statistical associations, and how much object information can be inferred from contextual associations alone? Here, we developed a computational approach to model the statistical associations between objects and their image contexts, and we used this approach to determine whether contextual information alone can explain key aspects of human object representation. Using large-scale scene datasets, we systematically occluded instances of target objects—leaving only the context intact—and passed the occluded images through an ImageNet-pretrained convolutional neural network (CNN) to obtain average “context-only” representations for a diverse set of object categories. These context-only representations reflect the information that can be learned about an object based solely on its natural image contexts. We also obtained object-only representations using a similar approach (with the context occluded rather than the object). We then examined two common measures of human object processing: behavioral similarity judgements and fMRI responses to images of isolated objects without contextual backgrounds. We found that both similarity judgements and fMRI responses in visual cortex were well predicted by our context-only representations and that these effects were competitive with those of the object-only representations. The fMRI effects for context-only representations were observed in both object-selective (LO, pFs) and scene-selective (PPA, OPA) regions. These findings are striking because the stimuli in both the behavioral and fMRI experiments were isolated objects without contextual backgrounds, and yet the responses to these objects could be explained by CNN representations of their contexts alone. Together, these findings suggest that object representations in the human brain are shaped by the statistical regularities of their natural image contexts.
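As an illustration of the approach summarized above, the following is a minimal sketch of how a context-only representation could be computed. It is not the authors' exact pipeline: the choice of CNN (ResNet-50), the feature layer (penultimate pooling), the occlusion method (filling the object's bounding box with mid-gray rather than using a segmentation mask), and the helper names are all illustrative assumptions.

```python
# Sketch: average "context-only" CNN features for one object category,
# computed from scene images in which the target object has been occluded.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ImageNet-pretrained CNN with the classification head removed,
# so the forward pass returns penultimate-layer features.
cnn = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
cnn.fc = torch.nn.Identity()
cnn.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def occlude_object(img: Image.Image, bbox):
    """Fill the object's bounding box (x0, y0, x1, y1) with mid-gray,
    leaving only the surrounding context visible (an assumed occlusion scheme)."""
    occluded = img.copy()
    occluded.paste((128, 128, 128), box=tuple(int(v) for v in bbox))
    return occluded

@torch.no_grad()
def context_only_representation(image_paths, bboxes):
    """Average CNN features over object-occluded images of one category."""
    feats = []
    for path, bbox in zip(image_paths, bboxes):
        img = Image.open(path).convert("RGB")
        x = preprocess(occlude_object(img, bbox)).unsqueeze(0)
        feats.append(cnn(x).squeeze(0))
    # One averaged "context-only" vector for this object category.
    return torch.stack(feats).mean(dim=0)
```

An analogous object-only representation would invert the occlusion (masking everything except the object), and the resulting category-level vectors could then be compared against behavioral similarity judgements or fMRI response patterns, for example via representational similarity analysis.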