September 2018
Volume 18, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2018
Interpreting Visual Representations of Neural Networks via Network Dissection
Author Affiliations
  • Bolei Zhou
    CSAIL, MIT
  • David Bau
    CSAIL, MIT
  • Aude Oliva
    CSAIL, MIT
  • Antonio Torralba
    CSAIL, MIT
Journal of Vision September 2018, Vol.18, 1244. doi:10.1167/18.10.1244
  • Views
  • Share
  • Tools
    • Alerts
      ×
      This feature is available to authenticated users only.
      Sign In or Create an Account ×
    • Get Citation

      Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba; Interpreting Visual Representations of Neural Networks via Network Dissection. Journal of Vision 2018;18(10):1244. doi: 10.1167/18.10.1244.

      Download citation file:


      © ARVO (1962-2015); The Authors (2016-present)

      ×
  • Supplements
Abstract

The success of recent deep convolutional neural networks (CNNs) depends on learning hidden representations that can summarize the important factors of variation behind the data. However, CNNs are often criticized as being black boxes that lack interpretability, since they have millions of unexplained model parameters. In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations. The proposed method quantifies the interpretability of CNN representations by evaluating the alignment between individual hidden units and a set of visual semantic concepts. By identifying the best alignments, units are given human interpretable labels across a range of objects, parts, scenes, textures, materials, and colors. The method reveals that deep representations are more transparent and interpretable than expected: we find that representations are significantly more interpretable than they would be under a random equivalently powerful basis. We apply the method to interpret and compare the latent representations of various network architectures trained to solve different supervised and self-supervised training tasks. We then examine factors affecting the network interpretability such as the number of the training iterations, regularizations, different initializations, and the network depth and width. Finally, we show that the interpreted units can be used to provide explicit explanations of a prediction given by a CNN for an image. Our results highlight that interpretability is an important property of deep neural networks that provides new insights into their hierarchical structure.

Meeting abstract presented at VSS 2018

×
×

This PDF is available to Subscribers Only

Sign in or purchase a subscription to access this content. ×

You must be signed into an individual account to use this feature.

×