Satoshi Nishida, Shinji Nishimoto; A brain-mediated computational model to estimate perceptual experiences evoked by arbitrary naturalistic visual scenes. Journal of Vision 2018;18(10):423. doi: 10.1167/18.10.423.
Recent developments in decoding techniques using functional magnetic resonance imaging (fMRI) allow us to recover perceived visual and semantic contents from human brain activity [e.g., Nishida and Nishimoto, 2017, NeuroImage]. Such decoding techniques have many potential real-world applications (e.g., neuromarketing). However, the measurement cost of fMRI makes it difficult to realize many such applications. Here, we propose a new decoding framework that estimates naturalistic perceptual experiences with no additional fMRI measurement after model construction. Our framework involves two types of computational models: one is an encoding model that predicts brain activity evoked by arbitrary naturalistic scenes using internal representations of a convolutional neural network; the other is a decoding model that estimates perceptual experiences from arbitrary brain activity using a semantic vector space. Training these models for each experimental participant requires a set of fMRI data measured while the participant viewed naturalistic movies. Once training is complete, however, the encoding model predicts brain activity evoked by any novel scene, and the decoding model estimates perceptual experiences from the predicted brain activity. Accordingly, the combined model requires no additional fMRI measurements to estimate each participant's perceptual experiences of novel scenes. Our results showed that our model accurately estimated perceptual experiences evoked by novel scenes, and these estimates were consistent with the corresponding scene descriptions produced by human annotators. In addition, the estimated perceptual experiences varied across participants' models, and this variation was significantly correlated with the variation of scene descriptions across annotators, suggesting that the models capture individual variability in perception.
Importantly, our framework can use any pair of encoding and decoding models and can therefore potentially be applied to many types of decoding across various modalities. Thus, our framework may dramatically improve the real-world applicability of decoding techniques.
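The abstract does not specify the models' implementation; as a minimal sketch of the two-stage idea, the snippet below chains a ridge-regression encoding model (scene features to voxel responses) with a ridge-regression decoding model (voxel responses to semantic vectors) on synthetic data. All dimensions, the ridge estimator, and the data are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the abstract): CNN feature size,
# voxel count, semantic-vector size, and number of training samples.
n_feat, n_vox, n_sem, n_train = 50, 200, 20, 500

# Synthetic "training" data standing in for movie-viewing fMRI runs.
X_feat = rng.standard_normal((n_train, n_feat))   # CNN features of scenes
W_true = rng.standard_normal((n_feat, n_vox))
Y_brain = X_feat @ W_true + 0.1 * rng.standard_normal((n_train, n_vox))
S_true = rng.standard_normal((n_vox, n_sem))
Y_sem = Y_brain @ S_true                          # semantic annotations

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Encoding model: scene features -> predicted brain activity.
W_enc = ridge_fit(X_feat, Y_brain)
# Decoding model: (measured) brain activity -> semantic vectors.
W_dec = ridge_fit(Y_brain, Y_sem)

# For a novel scene, no new fMRI measurement is needed: the encoding
# model simulates the brain response, which the decoder then reads out.
x_new = rng.standard_normal((1, n_feat))          # features of a novel scene
brain_pred = x_new @ W_enc                        # predicted brain activity
sem_pred = brain_pred @ W_dec                     # estimated perceptual content
print(sem_pred.shape)  # (1, 20)
```

The key point the sketch illustrates is the composition: after both regressions are fit on one set of measured fMRI data, the decoder operates on *predicted* rather than measured activity for any novel input.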
Meeting abstract presented at VSS 2018