Abstract
Visual feature detectors that are useful for high-level semantic tasks must often be invariant to differences in the input space, but how such invariant feature detectors are constructed through image-computable operations is a fundamental and poorly understood challenge. Deep convolutional neural networks have the potential to provide insight into this puzzle, as invariant feature tuning often emerges in the latent spaces of such networks, but how? Here we present a novel pruning method we call 'feature splitting', which splits a single CNN feature into multiple sparse subnetworks, each of which preserves the feature's tuning response only to a chosen subset of inputs. As a case study in splitting a feature across its invariance structure, we focus on polysemantic units, which respond strongly and selectively to seemingly unrelated semantic categories (e.g., monkey faces and written text). While a few examples of polysemantic units have been characterized in DNNs, here we develop a data-driven method for identifying such units throughout a network. We then extract multiple sparse subnetworks, each of which preserves the feature's response only to a targeted subset of image patches (e.g., to monkey faces, or to written text). In these cases, we find that our feature-splitting algorithm returns highly separable subnetworks, with few shared weights between them. These findings indicate that the tuning of polysemantic units draws largely on highly distinct image-filtering processes, with the unit acting as an 'or' gate that sums the outputs of these processes. Broadly, these feature-splitting methods introduce a principled approach for dissecting the wide range of invariance structures necessary for high-level feature detection (e.g., units that respond to both profile and frontal views of faces, or to objects presented at different scales in the image), isolating the separable and shared computations underlying invariance.
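
As a purely illustrative aid (not drawn from the paper itself), the sketch below shows one way a feature split of this kind could be formulated in PyTorch: a learnable soft mask over a convolutional layer's weights is optimized so that the masked, sparse subnetwork preserves a chosen unit's response to one image subset (e.g., faces) while suppressing its response to another (e.g., text). The toy model, loss terms, and hyperparameters are assumptions for illustration, not the authors' algorithm.

# Hypothetical sketch of the feature-splitting idea described above -- not the
# authors' method. Assumes the split is found by learning a soft mask over one
# conv layer's weights that preserves a unit's tuning to subset A and removes
# it for subset B, with a sparsity penalty encouraging a small subnetwork.
import torch
import torch.nn as nn
from torch.nn.utils import parametrize


class WeightMask(nn.Module):
    """Parametrization that multiplies a weight tensor by a soft [0, 1] mask."""

    def __init__(self, shape):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(shape))

    def forward(self, weight):
        return weight * torch.sigmoid(self.logits)


def unit_response(model, unit_idx, images):
    """Spatially averaged activation of one channel ('feature') at the model output."""
    feats = model(images)                        # (N, C, H, W) activations
    return feats[:, unit_idx].mean(dim=(1, 2))   # (N,) per-image unit response


def split_feature(model, conv, unit_idx, patches_keep, patches_drop,
                  steps=300, lr=0.05, sparsity_weight=1e-3):
    """Learn a sparse mask over `conv.weight` that keeps the unit's tuning to
    `patches_keep` (e.g., monkey faces) while removing it for `patches_drop`."""
    with torch.no_grad():                        # tuning target under the full weights
        target = unit_response(model, unit_idx, patches_keep)

    parametrize.register_parametrization(conv, "weight", WeightMask(conv.weight.shape))
    mask_module = conv.parametrizations.weight[0]
    opt = torch.optim.Adam(mask_module.parameters(), lr=lr)  # optimize the mask only

    for _ in range(steps):
        mask = torch.sigmoid(mask_module.logits)
        keep = unit_response(model, unit_idx, patches_keep)
        drop = unit_response(model, unit_idx, patches_drop)
        loss = ((keep - target).pow(2).mean()    # preserve response to subset A
                + drop.clamp(min=0).mean()       # suppress response to subset B
                + sparsity_weight * mask.mean()) # prefer a sparse subnetwork
        opt.zero_grad()
        loss.backward()
        opt.step()

    return (torch.sigmoid(mask_module.logits) > 0.5).float()  # binary subnetwork mask


# Toy usage: a tiny two-layer CNN standing in for a trained network, with
# random tensors standing in for the two image-patch subsets.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
faces, text = torch.randn(32, 3, 32, 32), torch.randn(32, 3, 32, 32)
mask = split_feature(model, model[2], unit_idx=5, patches_keep=faces, patches_drop=text)
print(f"subnetwork keeps {mask.mean().item():.1%} of the layer's weights")

Running the procedure twice with the roles of the two subsets swapped would, under these assumptions, yield the two candidate subnetworks whose weight overlap can then be compared.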