October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract  |   October 2020
Application of Deep Neural Networks to Model Omnidirectional Gaze Behavior in Immersive VR
Author Affiliations
  • Thomas L. Botch
    Dartmouth College
  • Erica L. Busch
    Dartmouth College
  • Caroline E. Robertson
    Dartmouth College
Journal of Vision October 2020, Vol.20, 1399. doi:https://doi.org/10.1167/jov.20.11.1399
      © ARVO (1962-2015); The Authors (2016-present)


Convolutional neural networks (CNNs) are powerful computational tools for understanding visual cognition, including human gaze behavior (O’Connell et al., 2018). Traditionally, CNNs are designed to model images that represent a single field-of-view. Yet the naturalistic visual world extends 360° around us, and information often extends beyond one discrete field-of-view to surrounding areas of the environment. Recent advances in virtual reality (VR) head-mounted displays with integrated in-headset eye trackers make it possible to emulate the natural 360° environment with greater stimulus control while simultaneously recording participants’ gaze. Here, we illustrate a method for leveraging any existing, pre-trained CNN to provide inferences about 360° naturalistic scenes, and demonstrate how the resulting omnidirectional CNN activation maps provide insight into gaze behavior in immersive virtual environments. First, single fields-of-view are evenly sampled from the equirectangular projection of the omnidirectional image, and spherical distortions are removed through gnomonic projection. Next, CNN activation maps are generated for each field-of-view, and these activations are recombined into an aggregate equirectangular image. Finally, the aggregate equirectangular image is smoothed using a variable-width Gaussian kernel to account for projection distortions, producing the final omnidirectional activation maps. In a set of 20 individuals, we compare 360° gaze behavior with CNN layer activations for image salience (SalNet; Pan et al., 2016), memorability (AMNet; Fajtl et al., 2018), and semantically meaningful image features (VGG; Simonyan et al., 2014). In sum, we outline a method for extending existing CNN activation maps to omnidirectional images, and show the utility of these maps for predicting gaze behavior in real-world, omnidirectional environments.
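
The first step of the pipeline, extracting an undistorted field-of-view from an equirectangular image through gnomonic projection, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `gnomonic_view`, the 90° default field-of-view, the output size, and the use of nearest-neighbor sampling are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def gnomonic_view(equirect, lon0_deg, lat0_deg, fov_deg=90.0, out_size=64):
    """Sample a square rectilinear view from an equirectangular image.

    Hypothetical helper: the view is centred on (lon0_deg, lat0_deg), and
    each output pixel is filled by nearest-neighbor lookup (an assumption;
    a real pipeline might interpolate instead).
    """
    height, width = equirect.shape[:2]
    lon0 = np.radians(lon0_deg)
    lat0 = np.radians(lat0_deg)

    # Coordinates on the tangent plane; the plane's half-width is set by
    # the requested field-of-view.
    half = np.tan(np.radians(fov_deg) / 2.0)
    coords = np.linspace(-half, half, out_size)
    x, y = np.meshgrid(coords, -coords)  # first row is the top of the view

    # Inverse gnomonic projection: tangent-plane point -> (lon, lat).
    lat = np.arcsin(
        (np.sin(lat0) + y * np.cos(lat0)) / np.sqrt(1.0 + x**2 + y**2)
    )
    lon = lon0 + np.arctan2(x, np.cos(lat0) - y * np.sin(lat0))

    # Map (lon, lat) to equirectangular pixel indices. Longitude wraps
    # around the image; latitude is clipped at the poles.
    cols = np.floor((lon / (2.0 * np.pi) + 0.5) * width).astype(int) % width
    rows = np.floor((0.5 - lat / np.pi) * height).astype(int)
    rows = np.clip(rows, 0, height - 1)

    return equirect[rows, cols]


# Example: a synthetic panorama whose pixel values equal their column index.
pano = np.tile(np.arange(360), (180, 1))
view = gnomonic_view(pano, lon0_deg=10.5, lat0_deg=0.0, out_size=65)
```

In a full pipeline along the lines described above, each such view would be passed through the pre-trained CNN, and the resulting activation maps projected back onto the equirectangular grid before smoothing; those steps are not shown here.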

