Vision Sciences Society Annual Meeting Abstract | December 2022
An interpretable alternative to convolutional neural networks: the scattering transform
Author Affiliations
  • Shi Pui Li
    Johns Hopkins University
  • Michael Bonner
    Johns Hopkins University
Journal of Vision December 2022, Vol.22, 3762. doi:https://doi.org/10.1167/jov.22.14.3762
Abstract

Neural networks trained on large image datasets have been shown to successfully model the ventral visual stream. However, the features captured by these models remain poorly understood. Here, we investigate an interpretable alternative to deep learning models called the scattering transform. Similar to convolutional neural networks, scattering transforms have a hierarchical structure with multiple layers implementing convolutions, non-linear activations, and pooling. However, instead of using learned convolutional kernels, these models use pre-defined Morlet wavelets at different orientations and spatial scales. In a forward pass, an image is passed through a convolutional layer, followed by a modulus activation function, and this process is repeated across multiple layers in a hierarchical manner. During read-out, scattering coefficients are obtained by average pooling of the activations for each wavelet in a layer. Each scattering coefficient reflects a specific combination of orientations and spatial scales across layers, and the scope of the pooling operation determines how much local spatial information is preserved. In our analysis, we focus on global scattering coefficients from the first two layers (S1 and S2), which reflect summary statistics of contours and contour co-occurrences. We fit voxelwise encoding models of category-selective areas using these scattering coefficients. First, we find that the cross-validated performance of the S2 encoding model is on par with that of AlexNet and outperforms S1 in both scene- and object-selective areas, suggesting that S2 encodes important image statistics captured by high-level visual areas. Second, we find that perpendicular contour co-occurrences in S2 outperform parallel contour co-occurrences in explaining both scene- and object-selective areas. This suggests that these areas are sensitive to local changes in orientation that define basic elements of shape. Our findings suggest that the scattering transform may be a powerful and interpretable alternative to deep learning models of feature representation in high-level visual cortex.
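
For readers who want a concrete picture of the computation described above, the following is a minimal NumPy sketch of a two-layer scattering transform: Morlet wavelet convolutions at several orientations and scales, modulus activations, and global average pooling to obtain S1 and S2 coefficients. The filter parameters (envelope width, carrier frequency, numbers of orientations and scales) and the random test image are illustrative placeholders, not the configuration used in this study; optimized implementations such as the Kymatio library would normally be used in practice.

```python
import numpy as np

def morlet_2d(size, sigma, theta, xi):
    """Complex 2D Morlet wavelet: an oriented plane wave under a Gaussian envelope,
    with a correction term (beta) so the filter has approximately zero mean.
    Assumes a square support of shape (size, size)."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]            # spatial grid
    rot_x = x * np.cos(theta) + y * np.sin(theta)       # coordinate along orientation theta
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    wave = np.exp(1j * xi * rot_x)
    beta = (envelope * wave).sum() / envelope.sum()      # zero-mean correction
    psi = envelope * (wave - beta)
    return psi / np.abs(psi).sum()

def scattering_s1_s2(image, n_orientations=8, scales=(1, 2)):
    """Global first- and second-order scattering coefficients:
    S1 = mean(|image * psi_{j1,t1}|) and S2 = mean(||image * psi_{j1,t1}| * psi_{j2,t2}|),
    keeping only j2 > j1 as in the standard scattering transform."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    filters = {}
    for j in scales:
        for t, theta in enumerate(thetas):
            sigma = 0.8 * 2**j          # envelope width grows with scale (illustrative choice)
            xi = 3 * np.pi / 4 / 2**j   # carrier frequency shrinks with scale
            filters[(j, t)] = morlet_2d(image.shape[0], sigma, theta, xi)

    def conv(x, psi):
        # circular convolution via the FFT (filter centered at the origin)
        return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(np.fft.ifftshift(psi)))

    s1, s2 = {}, {}
    for (j1, t1), psi1 in filters.items():
        u1 = np.abs(conv(image, psi1))                  # first-layer modulus activation
        s1[(j1, t1)] = u1.mean()                        # global average pooling -> S1
        for (j2, t2), psi2 in filters.items():
            if j2 > j1:                                 # only coarser second-layer scales carry energy
                u2 = np.abs(conv(u1, psi2))             # second-layer modulus activation
                s2[(j1, t1, j2, t2)] = u2.mean()        # global average pooling -> S2
    return s1, s2

# Example: coefficients for a random 64x64 "image"
rng = np.random.default_rng(0)
s1, s2 = scattering_s1_s2(rng.standard_normal((64, 64)))
print(len(s1), "S1 coefficients,", len(s2), "S2 coefficients")
```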

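The voxelwise encoding analysis follows the standard approach of regularized linear regression from model features to voxel responses, scored by cross-validated prediction accuracy. Below is a generic scikit-learn sketch of that procedure; the feature matrix, voxel data, regularization grid, and scoring choice (Pearson correlation) are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def fit_voxelwise_encoding(features, voxel_responses, n_splits=5):
    """Cross-validated voxelwise encoding model: predict each voxel's response to
    held-out images from image features (e.g. scattering coefficients or CNN units)
    via ridge regression, scoring each voxel by the correlation between predicted
    and observed responses, averaged over folds."""
    n_images, n_voxels = voxel_responses.shape
    scores = np.zeros(n_voxels)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train, test in cv.split(features):
        model = RidgeCV(alphas=np.logspace(-2, 4, 7))   # inner CV selects regularization strength
        model.fit(features[train], voxel_responses[train])
        pred = model.predict(features[test])
        # correlation between predicted and observed held-out responses, per voxel
        for v in range(n_voxels):
            scores[v] += np.corrcoef(pred[:, v], voxel_responses[test, v])[0, 1] / n_splits
    return scores  # one cross-validated score per voxel

# Example with random placeholders: 200 images, 64 global S2 features, 100 voxels
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))    # e.g. global S2 scattering coefficients per image
Y = rng.standard_normal((200, 100))   # voxel responses in a category-selective ROI
print(fit_voxelwise_encoding(X, Y).shape)
```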