Abstract
Studies of natural signals often focus on measuring second order statistics – the covariation between pairs of feature properties. Efforts to move beyond second order statistics typically assume particular forms of signal invariance in order to simplify the statistical measurements. However, restricting measurements to second order statistics or assuming particular forms of invariance might miss fundamental statistical structure that is crucial for characterizing natural signals. We show that it is practical to directly measure higher order statistics using the simple strategy of estimating moments along single dimensions, conditional on the values along other dimensions. Although this conditional moments approach is only practical for distributions of modest dimension, it has some unique advantages. First, univariate conditional distributions for local image properties are frequently unimodal and simple in shape, and thus the first few moments capture much of the shape information. Second, estimating conditional moments only requires keeping a single running sum for each moment, making it practical to use essentially arbitrarily large numbers of training signals (in our case over 10^10) and hence to measure higher order statistics with higher precision. Third, it is relatively straightforward to specify Bayesian optimal estimators from conditional moments (the MMSE estimator is the first conditional moment). Fourth, conditional moments can be measured recursively in a hierarchical fashion, allowing the approach to be extended to higher numbers of dimensions than would otherwise be practical. Third, fourth and fifth order statistics (and recursive statistics) were measured for nearby points in a large collection of calibrated natural images. These measurements reveal highly systematic statistical regularities not reported previously.
The importance of this higher order structure is demonstrated by showing how it can be exploited to substantially improve image interpolation (>50% reduction in MSE relative to bilinear interpolation), a fundamental task in retinal decoding and in image processing.
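
The running-sum strategy described above can be illustrated with a minimal sketch. It is not the implementation used for the measurements; it assumes a hypothetical setting in which a center value is predicted from two neighboring values that have already been quantized to integer levels, and the names `new_sums`, `accumulate`, and `conditional_moments` are introduced here for illustration only. For each pair of context values it keeps one running sum per power of the center value, so training data can be streamed in batches and discarded. The first conditional moment recovered from the sums is the MMSE estimate for that context.

```python
import numpy as np

def new_sums(n_levels, n_moments=2):
    """One table of running sums per power k = 0..n_moments.

    sums[k][a, b] holds the sum of center**k over all samples whose
    (left, right) context equals (a, b); sums[0] is the sample count.
    """
    return np.zeros((n_moments + 1, n_levels, n_levels), dtype=np.float64)

def accumulate(sums, left, right, center):
    """Add one batch of samples to the running sums, in place."""
    center = np.asarray(center, dtype=np.float64)
    for k in range(sums.shape[0]):
        # np.add.at handles repeated (left, right) pairs correctly,
        # unlike plain fancy-index assignment.
        np.add.at(sums[k], (left, right), center ** k)

def conditional_moments(sums):
    """Return the conditional mean and variance for every context.

    Contexts that were never observed come back as NaN.
    """
    count = sums[0]
    safe = np.where(count > 0, count, np.nan)
    mean = sums[1] / safe                 # first moment: MMSE estimate
    var = sums[2] / safe - mean ** 2      # second central moment
    return mean, var

# Usage on synthetic data (an assumption, not natural-image data):
rng = np.random.default_rng(0)
sums = new_sums(n_levels=8)
for _ in range(10):  # batches stream through; only the sums are kept
    left = rng.integers(0, 8, size=1000)
    right = rng.integers(0, 8, size=1000)
    center = 0.5 * (left + right) + rng.normal(0.0, 0.1, size=1000)
    accumulate(sums, left, right, center)
mean, var = conditional_moments(sums)
```

Because memory use depends only on the number of context bins and not on the number of samples, the same tables can absorb arbitrarily many training signals. Higher moments follow by raising `n_moments` and converting the raw moments to central ones in the same way.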