Abstract
Feedforward deep neural networks have become the standard class of models in computer vision. Yet they differ strikingly from their biological counterparts, which predominantly perform “recurrent” computations. Why have biological neurons evolved to employ recurrence so pervasively? In this work, we show that on challenging visual tasks requiring the integration of long-range spatial dependencies, a recurrent network can flexibly adapt its computational budget during inference and generalize within-task across difficulties. We contribute a recurrent module we call LocRNN, based on a prior computational model of biological vision that uses local recurrent intracortical connections with interneurons (Li, Z., 1998. A neural model of contour integration in the primary visual cortex. Neural Computation, 10(4), pp. 903–940). LocRNN learns highly accurate solutions to the challenging visual context integration problems studied here, Mazes and PathFinder, achieving the best overall performance across the two tasks with three difficulty levels each. More importantly, it can flexibly use fewer or more recurrent iterations during inference to zero-shot generalize to easier and harder instantiations of each task without requiring extra training data. By varying the number of recurrent iterations, we observe extrapolation gains of 15% to 40% on Mazes and PathFinder, a potential functional advantage of recurrence that biological visual systems capitalize on. Our ablation study of LocRNN highlights the fundamental importance of interneurons, piecewise linear activation functions, and recurrent gating. Our work encourages further study of the role of recurrence as an important biological mechanism underlying domain generalization and task extrapolation.
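To make the kind of computation described above concrete, the sketch below shows a gated, convolutional recurrent cell with a separate interneuron state whose unrolling depth can be chosen at inference time. This is a minimal, assumption-laden illustration in a PyTorch style; the class and function names (`LocalRecurrentCell`, `run_recurrence`) and all hyperparameters are hypothetical and are not the authors' LocRNN implementation.

```python
# Minimal sketch (assumed, not the paper's code): a gated local recurrent cell
# with an explicit interneuron (inhibitory) state and ReLU activations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalRecurrentCell(nn.Module):
    """Excitatory state `e` interacts with an inhibitory interneuron state `i`
    through local (convolutional) recurrent connections, with multiplicative
    gates and piecewise linear (ReLU) activations."""

    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        pad = kernel_size // 2
        # Local recurrent connections (an intracortical analogue).
        self.exc = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        self.inh = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        # Gates controlling how much of the new state overwrites the old one.
        self.gate_e = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        self.gate_i = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)

    def forward(self, x, e, i):
        # Interneurons are driven by the feedforward input and excitatory state.
        i_new = F.relu(self.inh(torch.cat([x, e], dim=1)))
        g_i = torch.sigmoid(self.gate_i(torch.cat([x, e], dim=1)))
        i = g_i * i_new + (1 - g_i) * i
        # Excitatory units are driven by the input and suppressed by interneurons.
        e_new = F.relu(self.exc(torch.cat([x, -i], dim=1)))
        g_e = torch.sigmoid(self.gate_e(torch.cat([x, i], dim=1)))
        e = g_e * e_new + (1 - g_e) * e
        return e, i


def run_recurrence(cell, x, num_iterations: int):
    """Unroll the same cell for a chosen number of iterations: fewer for easy
    inputs, more for hard ones, without retraining."""
    e = torch.zeros_like(x)
    i = torch.zeros_like(x)
    for _ in range(num_iterations):
        e, i = cell(x, e, i)
    return e


if __name__ == "__main__":
    cell = LocalRecurrentCell(channels=16)
    x = torch.randn(1, 16, 32, 32)                      # feedforward drive from an encoder
    easy = run_recurrence(cell, x, num_iterations=10)   # smaller compute budget
    hard = run_recurrence(cell, x, num_iterations=30)   # more compute at test time
    print(easy.shape, hard.shape)
```

Because the recurrent weights are shared across iterations, the unrolling depth is a free parameter at inference, which is the mechanism the abstract invokes for zero-shot extrapolation to easier and harder task instances.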