Nai Chen Chang, Elissa Aminoff, John Pyles, Michael Tarr, Abhinav Gupta; Scaling Up Neural Datasets: A public fMRI dataset of 5000 scenes. Journal of Vision 2018;18(10):732. doi: https://doi.org/10.1167/18.10.732.
Vision science - and particularly machine vision - is being revolutionized by large-scale datasets. State-of-the-art artificial vision models critically depend on large-scale datasets to achieve high performance. In contrast, although large-scale learning models (e.g., deep learning models such as AlexNet) have been applied to human neuroimaging data, the image datasets used in neural studies typically contain far fewer images - often only a few hundred, due to time-constrained experimental procedures. The small size of these datasets also translates into limited coverage of image space. This lack of image feature diversity inherently limits the degree to which neural data can act as supervisory signals, and limits our ability to compare model and measured neural representations. Here we dramatically increase the image dataset size deployed in an fMRI study of visual scene processing, scaling the number of images by over an order of magnitude relative to most earlier studies: over 5,000 discrete image stimuli were presented to each of four participants. We believe this boost in dataset size will better connect the field of computer vision to human neuroscience. To further enhance this connection and increase image space overlap with computer vision datasets, we include images from two standard artificial learning datasets in our stimuli: 2,000 images from COCO, and 2 images per category from ImageNet (~2,000). Also included are 1,000 hand-curated indoor and outdoor scene images from 250 categories. These three image collections cover a wide variety of image types, thereby enabling fine-grained exploration into visual representations ranging from natural scenes to object categories to human interactions. The scale advantage of our dataset and the use of a slow event-related design enable, for the first time, joint computer vision and fMRI analyses that span a significant and diverse region of image space using high-performing models.
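The stimulus set described above combines fixed-size samples from several source collections (e.g., 2 images per ImageNet category). A minimal sketch of that sampling scheme is shown below; the `sample_per_category` helper and the toy category pools are illustrative assumptions, not part of the published dataset pipeline.

```python
import random

def sample_per_category(categories, n_per_category=2, seed=0):
    """Draw a fixed number of images from each category pool.

    `categories` maps category name -> list of image identifiers.
    Illustrative helper (assumed, not from the abstract) mirroring
    the "2 images per category from ImageNet" selection scheme.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible stimulus list
    stimuli = []
    for name in sorted(categories):  # sort for deterministic order
        pool = categories[name]
        stimuli.extend(rng.sample(pool, min(n_per_category, len(pool))))
    return stimuli

# Toy example: three hypothetical scene categories.
categories = {
    "kitchen": ["k1.jpg", "k2.jpg", "k3.jpg"],
    "beach": ["b1.jpg", "b2.jpg"],
    "forest": ["f1.jpg", "f2.jpg", "f3.jpg", "f4.jpg"],
}
stimuli = sample_per_category(categories)
print(len(stimuli))  # 2 images x 3 categories = 6
```

With ~1,000 ImageNet categories, the same call would yield the roughly 2,000 ImageNet stimuli mentioned in the abstract.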
Meeting abstract presented at VSS 2018