Thomas L. Botch, Erica L. Busch, Caroline E. Robertson; Application of Deep Neural Networks to Model Omnidirectional Gaze Behavior in Immersive VR. Journal of Vision 2020;20(11):1399. doi: https://doi.org/10.1167/jov.20.11.1399.
Convolutional neural networks (CNNs) are powerful computational tools for understanding visual cognition, including human gaze behavior (O'Connell et al., 2018). Traditionally, CNNs are designed to model visual images that represent a single field-of-view. Yet the naturalistic visual world extends 360° around us, and information often extends beyond one discrete field-of-view to surrounding areas of the environment. Recent advances in virtual reality (VR) head-mounted displays with integrated eyetrackers make it possible to emulate the natural 360° environment with greater stimulus control while simultaneously recording participants' gaze. Here, we illustrate a method for leveraging any existing, pre-trained CNN to provide inferences about 360° naturalistic scenes, and demonstrate the application of these omnidirectional CNN activation maps to understanding gaze behavior in immersive virtual environments. First, single fields-of-view are evenly sampled from the equirectangular projection of the omnidirectional image, and spherical distortions are removed through gnomonic projection. Next, CNN activation maps are generated for each field-of-view, and the CNN layer activations for these images are recombined into an aggregate equirectangular image. Finally, the aggregated equirectangular image is smoothed with a variable-width Gaussian kernel to account for projection distortions, producing the final omnidirectional activation maps. In a set of 20 individuals, we compare 360° gaze behavior with CNN layer activations reflecting image salience (SalNet; Pan et al., 2016), memorability (AMNet; Fajtl et al., 2018), and semantically meaningful image features (VGG; Simonyan & Zisserman, 2014). In sum, we outline a method for extending existing CNN activation maps to omnidirectional images, and show the utility of these maps for predicting gaze behavior in real-world, omnidirectional environments.
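The three-step pipeline described in the abstract (gnomonic sampling of fields-of-view, per-view activation mapping, and latitude-dependent smoothing of the recombined equirectangular map) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the grid of sampled view centers, the toy per-pixel `activation_fn` standing in for a real CNN layer, and the nearest-neighbor scatter-back are all assumptions made for clarity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def gnomonic_sample(equirect, lon0, lat0, fov_deg=90, size=64):
    """Sample one rectilinear field-of-view centered at (lon0, lat0) radians,
    using the inverse gnomonic projection to find source pixels in the
    equirectangular image. Returns the view and its source indices."""
    H, W = equirect.shape[:2]
    half = np.tan(np.radians(fov_deg) / 2)
    x = np.linspace(-half, half, size)
    xg, yg = np.meshgrid(x, x)
    rho = np.hypot(xg, yg)
    c = np.arctan(rho)                      # angular distance from view center
    rho = np.where(rho == 0, 1e-9, rho)     # guard the center pixel
    lat = np.arcsin(np.cos(c) * np.sin(lat0)
                    + yg * np.sin(c) * np.cos(lat0) / rho)
    lon = lon0 + np.arctan2(
        xg * np.sin(c),
        rho * np.cos(lat0) * np.cos(c) - yg * np.sin(lat0) * np.sin(c))
    # Map (lon, lat) back to equirectangular pixel indices (nearest neighbor).
    u = ((lon / (2 * np.pi) + 0.5) % 1.0 * W).astype(int) % W
    v = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return equirect[v, u], (v, u)


def omnidirectional_map(equirect, activation_fn, n_lon=8, n_lat=4):
    """Aggregate per-view activations into one equirectangular activation map,
    then smooth each row with a Gaussian whose width grows toward the poles,
    where the equirectangular projection stretches pixels horizontally."""
    H, W = equirect.shape[:2]
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for lat0 in np.linspace(-np.pi / 3, np.pi / 3, n_lat):
        for lon0 in np.linspace(-np.pi, np.pi, n_lon, endpoint=False):
            view, (v, u) = gnomonic_sample(equirect, lon0, lat0)
            act = activation_fn(view)       # stand-in for a CNN layer response
            np.add.at(acc, (v, u), act)     # scatter back; average overlaps
            np.add.at(cnt, (v, u), 1)
    agg = np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
    for row in range(H):                    # variable-width smoothing per row
        lat = (0.5 - (row + 0.5) / H) * np.pi
        sigma = 2.0 / max(np.cos(lat), 1e-3)
        agg[row] = gaussian_filter1d(agg[row], sigma, mode='wrap')
    return agg
```

In a real application, `activation_fn` would run each rectilinear view through a pre-trained network (e.g., SalNet, AMNet, or a VGG layer) and project the resulting activation map back through the same gnomonic correspondence; the sketch above uses a per-pixel function only to keep the geometry self-contained.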