Abstract
Vision researchers have long sought to understand a ‘neural code’ that facilitates object recognition. The typical framework supposes that visual processing transforms a pixel representation of an image into a more pragmatic feature space, whose directions encode high-level visual concepts. In this framework, populations of neurons represent images in a ‘complex’ way, as points on a high-dimensional manifold, while single neurons (features) encode images in a ‘simple’ way, as scalars along a single axis. The scalar encoding of single features is often intuitively puzzling: suppose we find a feature highly selective for faces; is it then sufficient to say an individual image simply has ‘2’ faciness, or ‘30’, or ‘-10’? This account obscures the multiplicity of reasons a given image might excite or inhibit the face feature. In the present work, we introduce a novel method for elucidating the reasons behind deep neural network features’ responses, opening single features to analyses typically reserved for populations. Leveraging access to the gradients of a given feature, we construct a feature-wise ‘image trajectory manifold’. Images lie close to each other on this manifold if they drive the feature for the same reason, i.e., roughly the same sequence of computations leads to the feature’s activation. Images lie farther apart if they take distinct hierarchical routes to the feature, even if they elicit a similar level of activation. For example, we find that many features in an ImageNet-trained model have a distinct route for dog faces, adjacent to the route for human faces and far from the route for text. We provide a plotting tool for exploring where different images land on a feature’s trajectory manifold, as well as a feature visualization tool that leverages image trajectories to accentuate the aspects of an image that drive an individual feature’s response.
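
The abstract describes the trajectory manifold only at a high level. As a reading aid, the sketch below shows one plausible way such a manifold could be built from per-image gradients of a single feature: compute each image’s gradient of the feature’s activation, then compare gradients across images, so that images with aligned gradients count as driving the feature ‘for the same reason’. The choice of model, layer, pooled-channel feature, and cosine distance are all illustrative assumptions, not the paper’s reported implementation.

```python
# A minimal sketch, assuming the trajectory manifold is derived from
# per-image gradients of a feature's activation w.r.t. the input image.
# Model, layer, channel, and distance metric are hypothetical choices.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

def feature_gradient(image, layer, channel):
    """Gradient of one feature's (spatially pooled) activation w.r.t. the image."""
    image = image.clone().requires_grad_(True)
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.setdefault("a", o))
    model(image.unsqueeze(0))
    handle.remove()
    # Pooled activation of one channel serves as the scalar 'feature' here.
    acts["a"][0, channel].mean().backward()
    return image.grad.flatten()

def trajectory_distances(images, layer, channel):
    """Pairwise cosine distances between feature gradients: images whose
    gradients align drove the feature 'for the same reason' under this
    sketch's assumption."""
    grads = torch.stack([feature_gradient(im, layer, channel) for im in images])
    grads = F.normalize(grads, dim=1)
    return 1.0 - grads @ grads.T  # (N, N) distance matrix

# A low-dimensional embedding of this distance matrix (e.g., via MDS or
# UMAP) would then serve as the plotted manifold on which nearby images
# share a 'route' to the feature.
```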