October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract  |   October 2020
Comparing Word Recognition by Humans and Deep Neural Networks
Author Affiliations & Notes
  • Elena Sizikova
  • Carol Long
  • Omkar Kumbhar
  • Najib Majaj
  • Denis Pelli
  • Footnotes
    Acknowledgements: Moore Sloan Foundation; NIH grant R01 EY027964 to DGP
Journal of Vision October 2020, Vol.20, 1489. doi:https://doi.org/10.1167/jov.20.11.1489
      Elena Sizikova, Carol Long, Omkar Kumbhar, Najib Majaj, Denis Pelli; Comparing Word Recognition by Humans and Deep Neural Networks. Journal of Vision 2020;20(11):1489. https://doi.org/10.1167/jov.20.11.1489.

      © ARVO (1962-2015); The Authors (2016-present)

We compare word recognition by deep neural networks (DNNs) and humans, asking whether increased pooling in a network can model crowding in human vision. We focus on the Convolutional Recurrent Neural Network (CRNN) [Shi et al. 2016], a popular model for word recognition, and study its efficiency and crowding. To measure efficiency, we assess the network's performance in recognizing random 4-letter words in a monospace font at various contrast levels on a white noise background. The network is less efficient than human observers: in our experiments, it attains roughly one tenth of the 3% efficiency that humans attain [Pelli et al., 2003]. Letter crowding in human vision results in a minimum threshold spacing, independent of letter size. Crowding is usually explained as pooling that is inappropriately large for the task at hand. We studied how the network's size and spacing thresholds are affected by varying its pooling from 2 to 32. Each network with modified pooling was trained as specified by the original authors. We measured word recognition accuracy as a function of letter size and spacing. For humans tested at any given eccentricity, there are two regimes, one limited by crowding and one limited by acuity [Song et al. 2014]. In the crowding regime, the threshold size is inversely proportional to the spacing ratio. In the acuity regime, the threshold size is independent of the spacing ratio. In the network, our manipulation revealed only one regime for all pooling values: a slope of -0.3 in a log-log plot of threshold size vs. spacing ratio, unlike the human data, which have slopes of -1 (crowding limited) and 0 (acuity limited). Based on these results, we believe that there are important limitations in how well this network models human reading.
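
The slopes quoted in the abstract can be illustrated with a minimal sketch. It assumes a simple two-regime model in which the human threshold size is the larger of an acuity limit and a crowding limit (critical spacing divided by spacing ratio). The function names and all numeric values (`acuity_size`, `critical_spacing`, the spacing ratios, and the constant 0.3 in the power law) are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
import math

def threshold_size(spacing_ratio, acuity_size, critical_spacing):
    """Hypothetical two-regime model of the human threshold size.

    Crowding limit: spacing = spacing_ratio * size must reach the critical
    spacing, so size >= critical_spacing / spacing_ratio.
    Acuity limit: size >= acuity_size, whatever the spacing ratio.
    """
    return max(acuity_size, critical_spacing / spacing_ratio)

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Illustrative values only (degrees of visual angle); not from the study.
acuity_size = 0.1
critical_spacing = 0.5  # the two regimes meet at spacing ratio 0.5 / 0.1 = 5

crowding_ratios = [1.1, 1.4, 2.0, 3.0]  # below 5: crowding-limited
acuity_ratios = [6.0, 8.0, 12.0]        # above 5: acuity-limited

crowding_slope = loglog_slope(
    crowding_ratios,
    [threshold_size(r, acuity_size, critical_spacing) for r in crowding_ratios],
)
acuity_slope = loglog_slope(
    acuity_ratios,
    [threshold_size(r, acuity_size, critical_spacing) for r in acuity_ratios],
)

# A single power law with exponent -0.3, as a stand-in for the one regime
# reported for the network (the constant 0.3 is arbitrary).
all_ratios = crowding_ratios + acuity_ratios
network_slope = loglog_slope(all_ratios, [0.3 * r ** -0.3 for r in all_ratios])

print(f"crowding-limited slope: {crowding_slope:.2f}")
print(f"acuity-limited slope:   {acuity_slope:.2f}")
print(f"single power-law slope: {network_slope:.2f}")
```

Under these assumptions the crowding-limited points fall on a line of slope -1 and the acuity-limited points on a line of slope 0, whereas a single power law yields one intermediate slope across the whole range of spacing ratios.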

