Abstract
Several recent optimal statistical analyses of natural images (e.g. PCA, ICA, sparse coding) have extracted filters that resemble the receptive fields of cells in the striate cortex. We examined whether optimal statistical methods could also extract higher-level filters matched to the orientation flows and frequency gradients that constitute monocular cues to 3-D shape.
As test images we used perspective projections of upright surfaces that convey 3-D percepts of concave, convex, left-slanted, right-slanted, and flat fronto-parallel objects folded from 16 patterns out of the Brodatz set of textures. In pixel-based statistical analyses, sets of images of the same 3-D shape but different textures could not be reduced to lower-dimensional linear sub-spaces, nor could sets of images of the same texture but different 3-D shapes. Hence a bilinear decomposition of the pixel images into shape and texture components is not possible.
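The pixel-based test described above can be illustrated with a minimal sketch, assuming that "reducible to a lower-dimensional linear sub-space" is judged by how many principal components are needed to explain most of the variance in an image set. The function name `components_for_variance` and the 95% threshold are hypothetical choices for illustration, not details taken from the study.

```python
import numpy as np

def components_for_variance(images, target=0.95):
    """Return how many principal components are needed to explain
    `target` of the variance in a stack of equally sized images.
    (Hypothetical helper; the 0.95 default is an assumption.)"""
    # Flatten each image into one row of the data matrix.
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    # Center each pixel across the image set.
    X = X - X.mean(axis=0)
    # Squared singular values are proportional to per-component variance.
    s = np.linalg.svd(X, compute_uv=False)
    var = s ** 2
    frac = np.cumsum(var) / var.sum()
    # First index whose cumulative fraction reaches the target, plus one.
    return int(np.searchsorted(frac, target) + 1)
```

A count that stays close to the number of images in the set, as opposed to a small handful of components, would be the signature of a set that does not collapse onto a low-dimensional linear sub-space.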
As an alternative, we explored kernel-based methods applied to orientation and frequency decompositions of the images, such as could be provided by the outputs of striate cortex neurons. The decompositions were based on convolutions with filters at four harmonic frequencies covering the texture variations and at 16 to 64 orientations. Linear reductions (e.g. PCA) of the high-dimensional space of convolution images failed to approximate the desired features. A major problem is extracting orientation maps that capture the subtle changes that constitute diagnostic orientation-flow patterns. Non-linear reductions of dimensionality were more successful at producing filters that can be used for shape classification, although different reduction rules lead to different filters. Each non-linear reduction represents a different learning process, and we will present our results in these terms. Our results demonstrate one advantage of expanding the dimensionality of a problem in reaching an optimal solution.
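An orientation and frequency decomposition of this general kind can be sketched as follows. This is an illustration under stated assumptions, not the filters used in the study: it assumes complex Gabor filters, an envelope width tied to the carrier wavelength, and circular convolution via the FFT. The names `gabor_kernel` and `filter_bank_energy`, the kernel size, and any frequency values passed in are all hypothetical.

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma):
    """Complex Gabor filter: a unit-sum Gaussian envelope times a complex
    sinusoid at `freq` cycles/pixel varying along direction `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    envelope /= envelope.sum()  # equal peak gain across filters
    return envelope * np.exp(2j * np.pi * freq * xr)

def filter_bank_energy(image, freqs, n_orient, size=21):
    """Return response magnitudes with shape (len(freqs), n_orient, H, W).
    Convolution is circular (FFT-based); image must be at least size x size."""
    image = np.asarray(image, dtype=float)
    H, W = image.shape
    half = size // 2
    F = np.fft.fft2(image)
    out = np.empty((len(freqs), n_orient, H, W))
    for i, f in enumerate(freqs):
        sigma = 0.5 / f  # assumption: envelope scales with wavelength
        for j in range(n_orient):
            theta = np.pi * j / n_orient  # orientations span [0, pi)
            K = np.fft.fft2(gabor_kernel(size, f, theta, sigma), s=(H, W))
            resp = np.fft.ifft2(F * K)
            # Undo the shift caused by the kernel origin sitting at (half, half).
            resp = np.roll(resp, (-half, -half), axis=(0, 1))
            out[i, j] = np.abs(resp)
    return out
```

With four frequencies and 16 orientations, each image expands into 64 response maps, which is the high-dimensional space of convolution images that a subsequent linear or non-linear reduction would operate on.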