Abstract
Understanding the decisions of an artificial neural network is increasingly important as these models are applied to robotics, autonomous driving, and other safety-critical domains where an incorrect inference can lead to costly outcomes. Recent work in this area disentangles learned representations to identify the key concepts a model has learned and quantifies how strongly individual classes depend on specific feature directions. Most of these approaches focus on what the models are learning. We propose a method to understand why the models learn these concepts and feature dependencies. We extend the ideas of feature selectivity and network dissection to identify learned class relationships and to build quantifiable feature-class associations from learned network parameters. This approach allows us not only to understand what a model is learning but also to gain insight into why it learns these concepts and how it uses them. We apply these methods to multiple network architectures trained for different tasks and explore ways to regularize feature dependency to improve generalization.