Vision Sciences Society Annual Meeting Abstract  |   August 2014
The bottleneck in human letter recognition: A computational model
Author Affiliations
  • Avi Ziskind
    Psychology Department, New York University
  • Olivier Hénaff
    Center for Neural Science, New York University
  • Yann LeCun
    Center for Neural Science, New York University
  • Denis Pelli
    Psychology Department, New York University
Journal of Vision August 2014, Vol.14, 1311. doi:https://doi.org/10.1167/14.10.1311
Abstract

Limitations of human letter recognition indicate a bottleneck in the combining of visual information to recognize a letter. Signal Detection Theory shows that the ideal observer for a signal in white noise does template matching, and its performance depends solely on the signal-to-noise ratio (SNR), independent of signal complexity. Surprisingly, human threshold SNR is proportional to letter complexity (Pelli et al. 2006), suggesting that only a limited number of features can be combined for identification. To better understand this limitation of human observers, we trained an artificial neural network to identify letters, hoping to discover which network design characteristics would make its threshold depend on complexity. We used a convolutional neural network (ConvNet), a popular multi-layer neural network architecture for object and letter recognition (LeCun et al. 1998). We created multiple sets of images, each consisting of a letter added to a Gaussian white-noise background, varying the noise level across sets. We used seven fonts, spanning a tenfold range of complexity. For each font, we trained the network to identify letters, training on all noise levels together and then testing accuracy at each noise level to determine the threshold SNR required for 64% correct identification of new test images. With extensive resources, the ConvNet has a much lower threshold than humans and exhibits only a weak dependence on font complexity (a log-log slope of 0.5). With restricted resources (two convolutional layers containing 6 and 12 convolutional filters, respectively, followed by 60 fully connected units), the ConvNet's threshold rises to human levels (0.10 RMS error in log SNR threshold across the 7 fonts), with a log-log slope of 1 (i.e., proportional to complexity). Thus a ConvNet with restricted resources closely matches human thresholds for letter identification in seven fonts spanning a tenfold range of complexity.
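
The abstract fixes only the restricted network's size (two convolutional layers with 6 and 12 filters, then 60 fully connected units) and the task (identify a letter added to Gaussian white noise, with threshold defined at 64% correct). The sketch below is a minimal, hypothetical reconstruction of such a setup in PyTorch; the 32x32 image size, 5x5 kernels, ReLU/max-pooling nonlinearities, noise levels, optimizer settings, and the log-SNR interpolation used to read off the threshold are illustrative assumptions, not details taken from the abstract (the original study would have used its own toolkit and parameters).

```python
import string

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class RestrictedConvNet(nn.Module):
    """LeNet-style network with the abstract's restricted resources:
    two conv layers (6 and 12 filters) followed by 60 fully connected units."""

    def __init__(self, n_classes: int = 26):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)   # 6 convolutional filters
        self.conv2 = nn.Conv2d(6, 12, kernel_size=5)  # 12 convolutional filters
        self.fc1 = nn.Linear(12 * 5 * 5, 60)          # 60 fully connected units
        self.fc2 = nn.Linear(60, n_classes)           # one output per letter

    def forward(self, x):                             # x: (batch, 1, 32, 32)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)    # -> (batch, 6, 14, 14)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)    # -> (batch, 12, 5, 5)
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)                            # letter-class scores (logits)


def make_stimulus(letter_image: torch.Tensor, noise_sd: float) -> torch.Tensor:
    """Add Gaussian white noise of standard deviation noise_sd to a letter image."""
    return letter_image + noise_sd * torch.randn_like(letter_image)


def threshold_log_snr(accuracy_by_log_snr: dict, criterion: float = 0.64) -> float:
    """Interpolate the log SNR at which accuracy reaches the 64%-correct criterion,
    assuming accuracy rises monotonically with log SNR."""
    log_snrs, accuracies = zip(*sorted(accuracy_by_log_snr.items()))
    return float(np.interp(criterion, accuracies, log_snrs))


# Training sketch: letters at all noise levels are mixed into one training stream,
# as described in the abstract. The letter images and noise levels below are
# placeholders; real templates would be rendered from each of the seven fonts.
letters = string.ascii_uppercase
letter_templates = {c: torch.zeros(1, 32, 32) for c in letters}  # placeholder 32x32 images
noise_sds = [0.1, 0.2, 0.4, 0.8, 1.6]                            # assumed noise levels

model = RestrictedConvNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for step in range(10_000):
    c = letters[np.random.randint(26)]
    sd = noise_sds[np.random.randint(len(noise_sds))]
    x = make_stimulus(letter_templates[c], sd).unsqueeze(0)  # batch of one stimulus
    y = torch.tensor([letters.index(c)])
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training, accuracy would be measured on fresh noisy test images at each noise level, and threshold_log_snr would interpolate the log SNR giving 64% correct; repeating this per font yields the threshold-versus-complexity slope reported in the abstract.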

Meeting abstract presented at VSS 2014
