The differences between overcomplete ICA and non-negative sparse coding may be attributed to the two models' different computational strategies: overcomplete ICA has an explicit linear forward transform, whereas non-negative sparse coding has an implicit nonlinear forward transform. The strategy of ICA is to find a set of filters whose responses to images are as independent as possible under a linear transform (full independence is not guaranteed, both because higher-order couplings cannot be removed by a linear transformation and because of overcompleteness). Non-negative sparse coding, by contrast, learns not filters but a dictionary of basis functions that optimally reconstructs images as a linear combination of a few (depending on the regularization coefficient) of those basis functions.

Neither objective, however, was designed for the tasks considered here: overcomplete ICA was not designed to classify images, nor non-negative sparse coding to infer unseen image information. Roughly independent filter responses are not obviously better than sparse coding responses for image classification, and the advantage over the best non-negative sparse coding configuration was not large. For the image inference task, the input reconstruction error grew as the regularization coefficient increased, yet the reconstruction error measured against the original, unmodified image decreased. In other words, the V1 complex stage's reconstruction error for the image without the deletion was lower for a regularization coefficient of 0.5 than for 4.0, but when the deletion was present the opposite was true. The better reconstruction of the original image can be seen qualitatively in
Figures 15 and
16, in which 1×1 or 2×2 input regions were deleted. For example, consider the back bumper of the car in row 6 of
Figure 15; if the model were simply performing reconstruction, the missing bumper region of the car would have been reproduced as blank space, but instead the model introduces new information into the image representation via its basis functions. The difference in performance on the two image tasks was not an obvious consequence of the difference in loss functions, and previous patch-completion (in-painting) results such as that of
Mairal et al. (2009) only investigated inference in a single-layer model with smaller receptive fields and a single level of sparsity. Here, the result of inference could be seen in single patches rather than by reconstructing an entire image from its constituent patches, and, more importantly, the result was demonstrated across various levels of sparsity. Another difference between sparse coding and ICA is that, while ICA may be thought of as similar to sparse coding in the complete case, ICA tends to maximize coherence (redundancy) in its filter matrix when extended to the overcomplete case (
Livezey et al., 2019). In maximum-likelihood-inspired ICA models, this is usually addressed by adding a coherence-control term to the loss function. Score matching ICA was incorporated in this model, similar to Hosoya and Hyvärinen (2015), and although coherence control was not explicitly enforced, score matching provides an implicit, albeit data-dependent, form of coherence control.
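The coherence control mentioned above can be illustrated with a minimal sketch. The penalty below, a sum of squared off-diagonal entries of the Gram matrix of row-normalized filters, is one simple form of coherence control chosen purely for illustration; it is an assumption, not the specific term used by Livezey et al. (2019). It makes the complete-versus-overcomplete distinction concrete: an orthogonal complete filter set incurs no penalty, while an overcomplete set cannot be mutually orthogonal and is therefore always penalized.

```python
import numpy as np

def coherence_penalty(W):
    """Sum of squared off-diagonal entries of the Gram matrix of
    row-normalized filters; larger values mean more redundant filters."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    G = Wn @ Wn.T                       # pairwise cosine similarities
    return np.sum((G - np.eye(len(G))) ** 2)

rng = np.random.default_rng(0)

# Complete case: an orthogonal filter matrix has zero coherence ...
W_complete, _ = np.linalg.qr(rng.standard_normal((16, 16)))

# ... but 32 filters in a 16-dimensional space cannot all be orthogonal,
# so the penalty is strictly positive; in a maximum-likelihood model it
# would be added to the loss, e.g. (hypothetical names):
#   total_loss = ica_loss(W_over) + lam * coherence_penalty(W_over)
W_over = rng.standard_normal((32, 16))
```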
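The contrast drawn in this section, an explicit linear forward transform for ICA versus an implicit nonlinear forward transform for non-negative sparse coding, can also be sketched concretely. The dictionary, signal, and projected-gradient (non-negative ISTA) solver below are illustrative assumptions, not the implementation used in the model; they show only that the sparse code is defined as the solution of an optimization problem rather than a matrix multiplication, and that a larger regularization coefficient trades input reconstruction fidelity for sparsity.

```python
import numpy as np

def ica_responses(W, x):
    # ICA: explicit linear forward transform, a single matrix multiply.
    return W @ x

def nnsc_responses(D, x, lam, n_iters=500):
    # Non-negative sparse coding: the code is defined implicitly as
    #   argmin_a 0.5*||x - D a||^2 + lam*sum(a)  subject to a >= 0,
    # solved here by projected gradient descent (non-negative ISTA).
    a = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2    # 1 / Lipschitz constant
    for _ in range(n_iters):
        grad = D.T @ (D @ a - x) + lam
        a = np.maximum(a - step * grad, 0.0)  # gradient step, then project
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))       # overcomplete dictionary
D /= np.linalg.norm(D, axis=0)           # unit-norm basis functions
a_true = np.zeros(128)
a_true[[3, 40, 77]] = 5.0                # a signal built from three atoms
x = D @ a_true

a_low = nnsc_responses(D, x, lam=0.5)    # weak regularization
a_high = nnsc_responses(D, x, lam=4.0)   # strong regularization
# Stronger regularization shrinks the code and increases the input
# reconstruction error, mirroring the trade-off discussed above.
```

Because the non-negative code is the output of a thresholded optimization rather than a linear map, doubling the input does not simply double the response, which is the sense in which the forward transform is nonlinear.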