Vision Sciences Society Annual Meeting Abstract
Journal of Vision, December 2022, Volume 22, Issue 14
Open Access
Benchmarking dynamic neural-network models of the human speed-accuracy tradeoff
Author Affiliations
  • Ajay Subramanian
    New York University
  • Elena Sizikova
    New York University
  • Omkar Kumbhar
    New York University
  • Najib Majaj
    New York University
  • Denis G. Pelli
    New York University
Journal of Vision, December 2022, Vol. 22, 4359. https://doi.org/10.1167/jov.22.14.4359
Abstract

People take a variable amount of time (0.1-10 s) to recognize an object and can trade speed for accuracy. Various time-constrained tasks demand a wide range of accuracy and latency. Previous work (Spoerer'20) modeled only modest speed-accuracy tradeoffs (SATs), with a min-to-max range of merely 6% accuracy and 200 ms reaction time, a tiny fraction of the human range. Here, we collect and present a public human benchmark in which we use image perturbations to adjust task difficulty and increase the accuracy range to more than 50%. Furthermore, we show that dynamic neural networks are a promising model of the SAT and capture the behavior without needing recurrence. A total of 142 online participants categorized CIFAR-10 images with controlled reaction time, defined as the elapsed time between stimulus presentation and a keypress response. We ran 5 blocks of 300 trials, each block enforcing a different reaction time between 200 and 1000 ms, and repeated the experiment with four viewing conditions: color, grayscale, noise, and blur. We trained three networks on CIFAR-10 image classification: MSDNet (Huang'17), SCAN (Zhang'19), and ConvRNN (Spoerer'20). Using FLOPs as an analogue of human reaction time, we tested these networks by forcing them to "respond" after different amounts of computation, across all viewing conditions. We compared the three networks and humans using two metrics: accuracy range (the difference between maximum and minimum accuracy as reaction time is varied) and the correlation between speed-accuracy tradeoff curves. MSDNet gives a better account than previous attempts without needing recurrence. When trained with noise, it shows a high correlation (0.93) with the human SAT. However, humans are much more flexible, with a large 51% accuracy range, while the best network, MSDNet trained with noise, shows only 19%. Thus, our benchmark presents a challenging goal for future work that aims to model the SAT.
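As a concrete illustration of the two comparison metrics, the sketch below computes accuracy range and SAT-curve correlation from sampled speed-accuracy curves. The arrays human_acc and model_acc are hypothetical placeholders chosen for illustration, not data or code released with the benchmark.

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical SAT curves: accuracy sampled at matched points along the
    # "time" axis (enforced reaction time for humans, cumulative FLOPs for a
    # dynamic network). Values are illustrative, not the benchmark's data.
    human_acc = np.array([0.32, 0.55, 0.70, 0.79, 0.83])  # 200-1000 ms
    model_acc = np.array([0.58, 0.64, 0.70, 0.74, 0.77])  # increasing FLOPs

    # Metric 1: accuracy range = max minus min accuracy over the curve.
    human_range = human_acc.max() - human_acc.min()  # 0.51 here
    model_range = model_acc.max() - model_acc.min()  # 0.19 here

    # Metric 2: Pearson correlation between the two SAT curves.
    r, _ = pearsonr(human_acc, model_acc)

    print(f"human accuracy range: {human_range:.2f}")
    print(f"model accuracy range: {model_range:.2f}")
    print(f"SAT curve correlation: {r:.2f}")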
