Abstract
Introduction. Retinal prostheses have the potential to restore vision to individuals blinded by retinal degenerative diseases. However, the quality of current prosthetic vision remains rudimentary. In this study, we combined various computer vision models with a psychophysically validated computational model of the retina (Beyeler et al., 2019) to generate simulated prosthetic vision (SPV) and investigated how different scene simplification strategies affect perceptual performance in scene understanding.

Methods. Forty-five sighted subjects (31 females, 14 males) acted as virtual patients by watching SPV videos depicting 16 different outdoor scenes. Subjects were asked to report whether people and/or cars were present in each scene. Perceptual performance was measured as a function of four deep learning-based scene simplification strategies (highlighting visually salient information, highlighting closer pixels, segmenting relevant objects, and a combination of all three), three retinal implant resolutions (8×8, 16×16, 32×32), and nine combinations of phosphene size and elongation.

Results. Subjects identified people and cars best with the segmentation strategy (d' = 1.13, SD = 1.02), compared with saliency (d' = 0.07, SD = 0.66, p < 0.001), depth (d' = 0.29, SD = 0.77, p < 0.001), and the combination of all three (d' = 1.01, SD = 0.91, p < 0.05). Higher implant resolutions (16×16: d' = 0.72, SD = 0.93; 32×32: d' = 0.72, SD = 1.06) also improved performance compared with the lowest resolution (8×8: d' = 0.46, SD = 0.87, p < 0.001). Performance with the smallest phosphene size (100 μm) was significantly better (d' = 0.81, SD = 1.02) than with the larger phosphene sizes of 300 μm (d' = 0.60, SD = 0.89, p < 0.05) and 500 μm (d' = 0.52, SD = 0.96, p < 0.05).

Discussion. Our results underscore the importance of using psychophysically validated retinal models to predict realistic prosthetic vision. Critically, highlighting relevant objects and increasing implant resolution can improve patients' scene understanding.
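For readers who want to prototype the SPV pipeline, the following is a minimal NumPy sketch of the rendering step only: each electrode takes its brightness from the (already simplified) video frame and is drawn as an anisotropic Gaussian blob whose width and aspect ratio stand in for phosphene size and elongation. This toy renderer is not the psychophysically validated axon map model of Beyeler et al. (2019) used in the study (that model is available in the pulse2percept library); the function name, grid, and parameter values below are illustrative assumptions.

```python
# Toy SPV renderer, NOT the Beyeler et al. (2019) axon map model: downsample
# the frame to one activation per electrode, then draw each phosphene as an
# anisotropic Gaussian whose sigma/aspect ratio mimic size and elongation.
import numpy as np

def render_spv(frame, grid=(16, 16), size_px=6.0, elongation=1.0, out_hw=(240, 240)):
    """Downsample `frame` to one value per electrode and render Gaussian phosphenes."""
    h, w = out_hw
    gh, gw = grid
    # Average-pool the input frame to one activation per electrode.
    rows = np.array_split(np.arange(frame.shape[0]), gh)
    cols = np.array_split(np.arange(frame.shape[1]), gw)
    activations = np.array([[frame[np.ix_(r, c)].mean() for c in cols] for r in rows])

    # Electrode centers in output-image coordinates.
    cy = (np.arange(gh) + 0.5) * h / gh
    cx = (np.arange(gw) + 0.5) * w / gw
    yy, xx = np.mgrid[0:h, 0:w]

    percept = np.zeros((h, w))
    for i in range(gh):
        for j in range(gw):
            # Stretch the blob along y to mimic axon-aligned elongation.
            blob = np.exp(-((xx - cx[j]) ** 2 / (2 * size_px ** 2)
                            + (yy - cy[i]) ** 2 / (2 * (size_px * elongation) ** 2)))
            percept += activations[i, j] * blob
    return percept / percept.max()

# Example: a random stand-in frame rendered through a 16x16 "implant"
# with small phosphenes elongated 2:1.
spv = render_spv(np.random.rand(480, 640), grid=(16, 16), size_px=5.0, elongation=2.0)
```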
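The best-performing strategy, segmenting relevant objects, can be sketched with an off-the-shelf semantic segmentation network. The snippet below uses a pretrained DeepLabV3 from torchvision as a stand-in; the abstract does not name the study's actual network, so the model choice, the PASCAL VOC class indices (car = 7, person = 15), and the preprocessing are assumptions, and the `weights` argument assumes a recent torchvision release.

```python
# Sketch of the "segment relevant objects" simplification: keep only pixels
# labeled person or car. DeepLabV3 is a stand-in for the study's (unnamed)
# segmentation network; class indices follow PASCAL VOC labeling.
import torch
import torchvision
from torchvision import transforms

model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def people_and_cars_mask(frame_rgb):
    """HxWx3 uint8 RGB frame -> HxW float mask of pixels labeled person or car."""
    x = preprocess(frame_rgb).unsqueeze(0)            # (1, 3, H, W)
    with torch.no_grad():
        labels = model(x)["out"].argmax(dim=1)[0]     # (H, W) per-pixel class ids
    return ((labels == 15) | (labels == 7)).float().numpy()  # person=15, car=7
```

The resulting binary mask can then be downsampled to the electrode grid and rendered as phosphenes in place of the raw frame.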
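Perceptual performance is reported as the sensitivity index d' from signal detection theory, i.e., d' = z(hit rate) − z(false-alarm rate). The sketch below computes it with SciPy; the 1/(2N) correction for hit or false-alarm rates of exactly 0 or 1 is a common convention and an assumption here, not necessarily the study's exact procedure.

```python
# d' = z(hit rate) - z(false-alarm rate), the sensitivity measure reported above.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    hit_rate = hits / n_signal
    fa_rate = false_alarms / n_noise
    # Clamp rates away from 0 and 1 to avoid infinite z-scores (1/(2N) rule).
    hit_rate = min(max(hit_rate, 1 / (2 * n_signal)), 1 - 1 / (2 * n_signal))
    fa_rate = min(max(fa_rate, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Example: 12 hits out of 16 "target present" trials and 3 false alarms out of
# 16 "target absent" trials give d' of about 1.56.
print(d_prime(hits=12, misses=4, false_alarms=3, correct_rejections=13))
```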