Abstract
Human vision is fundamentally a big-data process in terms of both image content and neural data. Yet MEG neuroimaging studies are typically conducted with small numbers of visual stimuli, which may not support generalization because they severely undersample the stimulus space. Here, we presented a set of 4,916 unique images comprising 1,000 images of indoor and outdoor scenes, 2,000 images of multiple objects, and 1,916 images of singular objects (https://bold5000.github.io/). We recorded MEG data while a participant viewed these images over the course of 10 sessions (with overlap of images across sessions). We divided the images along four categorical contrasts: faces vs. objects, large scenes vs. small scenes, multiple objects vs. a single object, and moving (action) vs. static. Each subcategory contained approximately 400-1,000 images. We then performed time-resolved decoding using a linear support vector machine classifier to estimate the time course with which categorical content emerges in the human brain. Decoding results were robust, reaching 100% accuracy as early as 100-130 ms after stimulus onset for all contrasts except action vs. static, which yielded relatively weaker decoding results. The decoding time series for most contrasts remained near 100% for an extended period, until about 700 ms after stimulus onset. Overall, our results indicate that decoding several visual categorical representations from MEG data is possible even with very large numbers of diverse naturalistic image stimuli. Our findings pave the way for future studies exploring critical dimensions of scene processing in the human brain (geometric layout, large vs. small, crowded vs. sparse, and visual associations in general) using the same diverse set of 4,916 images.
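To make the time-resolved decoding procedure concrete, the sketch below shows one common way to implement it: train and cross-validate a separate linear SVM on the MEG sensor pattern at each time point, producing decoding accuracy as a function of time from stimulus onset. This is a minimal illustration under stated assumptions, not the authors' actual pipeline; the array shapes, random data, cross-validation scheme, and classifier parameters are all hypothetical placeholders.

```python
# Minimal sketch of time-resolved MEG decoding, assuming preprocessed
# epochs of shape (n_trials, n_sensors, n_timepoints) and binary labels
# (e.g., faces vs. objects). All dimensions and settings are illustrative.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_sensors, n_times = 200, 306, 120   # hypothetical dimensions
X = rng.standard_normal((n_trials, n_sensors, n_times))  # stand-in MEG epochs
y = rng.integers(0, 2, n_trials)                         # binary category labels

# One classifier per time point yields a decoding time series:
# classification accuracy as a function of time from stimulus onset.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, dual=False))
accuracy = np.array([
    cross_val_score(clf, X[:, :, t], y, cv=5).mean()
    for t in range(n_times)
])
```

In practice, the resulting accuracy curve is what is inspected for the latency at which it rises above chance and the window over which it remains high, corresponding to the 100-130 ms onset and sustained decoding reported above.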