Abstract
Artificial neural networks are proving useful for understanding hierarchical visual processing, because they give direct access to all of a model’s internal workings: not only the activation of every unit, but also the weights connecting these units. A new frontier of ‘mechanistic interpretability’ has emerged in machine learning, which seeks to understand not only what features a network represents, but how complex features are built from simpler ones through these connections. As in biological vision research, these approaches start by identifying interpretable features in the latent layers based on maximally activating images, such as ‘edge-detectors’, ‘curve-detectors’, ‘object-parts’, etc. Considering the weights between layers then yields a compositional explanation of feature construction: for example, a rabbit detector is built from leg, ear and eye detectors; an eye from curves; a curve from line segments; and so on. Here, however, we argue that this ‘composition-by-parts’ account is fundamentally incomplete, because it fails to incorporate the role of inhibitory operations. Negative weights constitute half of the learned weights in a typical deep neural network. Whereas inhibition is conventionally treated as analogous to excitation (for example, by considering maximally inhibiting images), we show that the asymmetry introduced by the non-linear activation function (ReLU) necessitates distinct computational roles for excitation and inhibition. We put forward a theory of the ‘inhibitory feature surround’, in which inhibition enables the construction of diverse features with selective excitatory responses. We validate our account with a series of ‘virtual lesioning’ experiments on inhibitory connections. Lastly, we introduce a feature visualization technique designed specifically to target a feature’s inhibitory surround, to help researchers understand the role of inhibition in particular cases.
Broadly, these results clarify the functional role of inhibition in deep neural network models, and offer a framework for empirical tests of inhibitory function in biological visual systems.
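
The ReLU asymmetry referred to above can be illustrated with a toy example. The sketch below is not the method used in this work; it assumes a single hypothetical ReLU unit with one excitatory and one inhibitory input weight, and the function names (`relu`, `unit`, `lesion_inhibition`) and input patterns are invented for illustration. It shows that inhibition alone cannot push the output below zero, so its effect only becomes visible when excitation is present, and that zeroing the negative weights (a toy form of 'virtual lesioning') makes the unit less selective.

```python
def relu(x):
    """Rectified linear activation: negative pre-activations are clipped to 0."""
    return max(0.0, x)


def unit(inputs, weights):
    """Response of a single ReLU unit to a vector of input feature activations."""
    return relu(sum(w * x for w, x in zip(weights, inputs)))


def lesion_inhibition(weights):
    """Toy 'virtual lesion': set every negative (inhibitory) weight to zero."""
    return [max(0.0, w) for w in weights]


# Hypothetical weights: input 0 is excitatory, input 1 is inhibitory.
weights = [1.0, -1.0]

# Hypothetical input patterns (activations of the two upstream features).
patterns = {
    "preferred": [1.0, 0.0],        # excitatory feature only
    "distractor": [1.0, 1.0],       # excitatory and inhibitory features together
    "inhibitory_only": [0.0, 1.0],  # inhibitory feature only
}

intact = {name: unit(x, weights) for name, x in patterns.items()}
lesioned = {name: unit(x, lesion_inhibition(weights)) for name, x in patterns.items()}

for name in patterns:
    print(f"{name:16s} intact={intact[name]:.1f} lesioned={lesioned[name]:.1f}")
```

In this toy setting the inhibitory-only pattern produces the same zero output as a blank input, while removing the inhibitory weight lets the distractor drive the unit as strongly as the preferred pattern.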