September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract
Enhancing simulated prosthetic vision with deep learning–based scene simplification strategies
Author Affiliations
  • Aiwen Xu
    University of California - Santa Barbara
  • Nicole Han
    University of California - Santa Barbara
  • Sudhanshu Srivastava
    University of California - Santa Barbara
  • Devi Klein
    University of California - Santa Barbara
  • Michael Beyeler
    University of California - Santa Barbara
Journal of Vision September 2021, Vol. 21, 2308.
      © ARVO (1962-2015); The Authors (2016-present)


Introduction. Retinal prostheses have the potential to restore vision to individuals blinded by retinal degenerative diseases. However, the quality of current prosthetic vision is still rudimentary. In this study, we combined several computer vision models with a psychophysically validated computational model of the retina (Beyeler et al., 2019) to generate simulated prosthetic vision (SPV), and investigated their effects on perceptual performance in scene understanding.

Methods. 45 sighted subjects (31 female, 14 male) acted as virtual patients by watching SPV videos depicting 16 different outdoor scenes. Subjects were asked to identify whether there were people and/or cars in the scene. Perceptual performance was measured as a function of four deep learning-based scene simplification strategies (highlighting visually salient information, highlighting closer pixels, segmenting relevant objects, and a combination of all three), three retinal implant resolutions (8x8, 16x16, 32x32), and nine combinations of phosphene size and elongation.

Results. Subjects were best at identifying people and cars with the segmentation algorithm (d'=1.13, sd=1.02) compared to saliency (d'=0.07, sd=0.66, p<0.001), depth (d'=0.29, sd=0.77, p<0.001), and the combination (d'=1.01, sd=0.91, p<0.05). Higher implant resolutions (16x16: d'=0.72, sd=0.93; 32x32: d'=0.72, sd=1.06) also improved performance compared to the lowest resolution (8x8: d'=0.46, sd=0.87, p<0.001). Performance with the smallest phosphene size (100 μm; d'=0.81, sd=1.02) was significantly better than with phosphene sizes of 300 μm (d'=0.60, sd=0.89, p<0.05) and 500 μm (d'=0.52, sd=0.96, p<0.05).

Discussion. Our results underscore the importance of using realistic retinal models to predict prosthetic vision. Critically, highlighting relevant objects and increasing implant resolution can improve patients' scene understanding.
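To illustrate the kind of rendering pipeline the Methods describe, here is a minimal SPV sketch under a deliberately simplified model: each electrode in an N x N grid samples the (already simplified) input frame and is drawn as an isotropic Gaussian phosphene. The grid size, phosphene radius (`sigma`), and image dimensions are illustrative assumptions only; the study itself used the psychophysically validated axon map model of Beyeler et al. (2019), which additionally elongates phosphenes along retinal axon pathways.

```python
import numpy as np

def render_spv(frame, grid=16, sigma=4.0):
    """Render a 2-D grayscale array as a grid x grid phosphene image.

    Toy model: brightness of each phosphene is the image intensity
    sampled at the electrode location; each phosphene is an isotropic
    Gaussian blob (no axonal elongation).
    """
    h, w = frame.shape
    out = np.zeros((h, w))
    ys = np.linspace(0, h - 1, grid)  # electrode centers (rows)
    xs = np.linspace(0, w - 1, grid)  # electrode centers (cols)
    yy, xx = np.mgrid[0:h, 0:w]
    for cy in ys:
        for cx in xs:
            val = frame[int(cy), int(cx)]
            out += val * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2)
                                / (2 * sigma ** 2))
    return np.clip(out, 0.0, 1.0)
```

Varying `grid` (8, 16, 32) and `sigma` mimics the implant-resolution and phosphene-size manipulations reported above, though only in this crude isotropic approximation.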

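The d' values in the Results are the standard sensitivity index from signal detection theory for a yes/no detection task: d' = z(hit rate) - z(false-alarm rate). Below is a minimal sketch of that computation; the log-linear correction for extreme rates is an assumption on my part, since the abstract does not state how (or whether) rates of 0 or 1 were handled.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    # Log-linear correction: add 0.5 to each count so that rates of
    # exactly 0 or 1 (which have infinite z-scores) cannot occur.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)
```

For example, a subject with equal hit and false-alarm rates has d'=0 (chance performance), matching the near-zero d' reported for the saliency condition.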
