Perceptual systems are tasked with extracting meaningful information from patterns of sensory inputs. In naturalistic environments, these patterns may have complex distributions with features such as heavy or asymmetric tails. It has previously been shown that the perception of static surfaces uses some of this complexity to infer texture and material properties (Kingdom et al., 2001; Motoyoshi et al., 2007; Okazawa et al., 2015; Portilla & Simoncelli, 2000). Here, we sought to determine whether higher-order moments are also used in the analysis of dynamic stimuli. Using well-controlled artificial stimuli, random dot kinematograms with different dot displacement distributions, we found that human observers were functionally blind to large changes in skewness or kurtosis when the mean and variance were identical to the background. This shows that, in at least some cases, the visual system discards useful sensory information to produce a compact internal representation that is limited to mean and variance. These results constrain theories of information processing in sensory systems.
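To make concrete what it means for two displacement distributions to share a mean and variance while differing in a higher-order moment, the following minimal sketch compares a Gaussian set of dot directions with a Laplace set scaled to the same variance. The choice of distributions, the 20° spread, the sample size, and all function names are illustrative assumptions and are not the stimulus parameters used in the experiment. Directions are treated as linear values in degrees, ignoring circular wrap-around.

```python
import math
import random


def sample_moments(xs):
    """Return (mean, variance, skewness, excess kurtosis) of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    var = sum(d * d for d in dev) / n
    sd = math.sqrt(var)
    skew = sum(d ** 3 for d in dev) / (n * sd ** 3)
    excess_kurt = sum(d ** 4 for d in dev) / (n * var ** 2) - 3.0
    return mean, var, skew, excess_kurt


def gaussian_directions(n, mean_deg, sd_deg, rng):
    """Hypothetical 'background' dots: Gaussian direction distribution."""
    return [rng.gauss(mean_deg, sd_deg) for _ in range(n)]


def laplace_directions(n, mean_deg, sd_deg, rng):
    """Hypothetical 'odd patch' dots: Laplace distribution, same variance.

    A Laplace distribution with scale b has variance 2 * b**2, so
    b = sd / sqrt(2) matches the Gaussian variance. The difference of two
    independent exponentials with mean b is Laplace-distributed with scale b.
    """
    b = sd_deg / math.sqrt(2.0)
    return [
        mean_deg + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        for _ in range(n)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
bg_stats = sample_moments(gaussian_directions(100_000, 0.0, 20.0, rng))
odd_stats = sample_moments(laplace_directions(100_000, 0.0, 20.0, rng))

for name, stats in (("background", bg_stats), ("odd patch", odd_stats)):
    print(name, ["%.2f" % s for s in stats])
```

Under these assumptions, the two samples agree closely in mean and variance, while the Laplace sample has an excess kurtosis near 3 and the Gaussian sample has an excess kurtosis near 0.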
Previous investigations of the relationship between dynamic stimulus statistics and perceptual experience have focused on higher-order correlations of spatiotemporal luminance patterns. These higher-order correlations arise in natural scenes, and modeling them is necessary to account for human motion perception (Hu & Victor, 2010; Nitzany & Victor, 2014). The present work differs from these previous efforts by considering the marginal statistics of individual elements that make up the random dot kinematogram. The perception of coherent motion in these stimuli is based on a representation of summary statistics (Watamaniuk et al., 1989), which is similar to the perception of static artificial textures (Victor, 1994), peripheral areas of natural scenes (Freeman & Simoncelli, 2011), and natural soundscapes (McDermott, Schemitsch, & Simoncelli, 2013). We found that humans were not sensitive to higher-order moments in artificial dynamic “textures.” Nevertheless, it is interesting to speculate about whether textures with such statistics exist in the natural world, perhaps in the motion defined by flocks of birds, herds of livestock, or crowds of humans.
By documenting a case where the visual system ignores higher-order moments, we have shown that the brain does not, as a rule, faithfully represent the distributional characteristics of its inputs. Nevertheless, there are domains in which higher-order moments are used for perception. Therefore, the task going forward is to determine when and why that is the case. Two alternative perspectives can guide such an investigation. From one, perceptual systems typically represent the higher moments of stimulus distributions, but in some cases, as with random dot motion, those statistics are discarded. Alternatively, it may be that stimulus distributions are typically reduced to a Gaussian representation and processed in terms of their mean and variance, except in special cases where dedicated mechanisms for representing higher-order moments convey particularly useful additional information. Supporting the latter perspective, we note that Motoyoshi et al. (2007) propose a specific mechanism for the representation of luminance skewness, which may have developed because higher-order luminance statistics provide information about surface texture. Within the domain of dynamic information, it is important to note that we do not know whether our result will generalize to distributions over other aspects of translational stimuli, such as speed, or to other forms of motion, such as the dynamic textures that define optic flow.
Because we report a null result, it is natural to wonder whether larger changes in skewness or kurtosis might have been detected. While possible, we think this is unlikely because observers typically show high sensitivity for other aspects of the random dot stimulus. In our experiment, observers were above chance on trials with only a 5° change in mean direction, and direction discrimination thresholds in other paradigms can be as low as 2° (Watamaniuk et al., 1989). There is a long history of studying motion perception using random dot kinematograms, and their mean direction, speed, and coherence have close correspondence in neuronal responses (Britten et al., 1993). We cannot rule out the theoretical possibility that heavy-tailed motion could be detected in a stochastic stimulus created using a different parameterization or in a different class of spatiotemporal stimuli altogether. Yet even if such a stimulus could be created, it is striking that our observers were unable to detect large changes in skewness or kurtosis of the stimulus that we did use given their high sensitivity to changes in the first two moments. Therefore, our experiment represents a strong test, and refutation, of the hypothesis that the visual system always represents higher-order moments of sensory stimuli.
One challenge in using behavioral measurements to make inferences about information processing limitations is that it is often unclear at what stage potentially useful information may be lost. When behavior appears insensitive to some aspect of sensory input, it could be that sensory systems fail to preserve a representation of it or that inferential processes make poor use of that representation. In the present case, we are able to overcome this challenge by considering a formal model of motion perception (Adelson & Bergen, 1985). This model has close correspondence to the responses of direction-selective neurons in striate and extrastriate visual cortex (Albright, 1984; Maunsell & Van Essen, 1983; Movshon et al., 1988; Rust, Mante, Simoncelli, & Movshon, 2006). When using direction-selective spatiotemporal filters that matched the spatiotemporal tuning of the primate visual system, we found that the resulting motion energy profiles contained essentially no information about the skewness or kurtosis of the dot displacement distributions. This suggests that the visual system discards information about higher-order moments early in the processing hierarchy. Therefore, we can conclude that performance was limited at the sensory representation stage rather than by suboptimal inference. This sensory limitation arises from how inputs to direction-selective cells are combined to generate a representation of spatiotemporal energy.
More elaborate models of motion processing can account for other physiological phenomena, such as neural selectivity for pattern motion (Rust et al., 2006; Simoncelli & Heeger, 1998). These models build more complex representations from component elements that are equivalent to the motion energy profiles we estimated. Therefore, they should not be able to recover higher-order moments that are absent from the simpler representation. Nevertheless, considering models of later-stage motion processing may help to explain why performance on variance-manipulated trials exceeded what would have been expected from the motion energy profiles alone. One important factor to consider is that we examined the distribution of motion energy across an array of filters tuned to the speed of coherent motion in the background, yet increasing the variance of individual dot displacement angles will also decrease the average displacement in the mean direction. Therefore, a more complete mechanistic model of our task would likely need to represent the joint distribution of motion energy across a population of filters with varying direction and speed tuning.
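The link between direction variance and net displacement along the mean direction can be illustrated numerically. The sketch below assumes, purely for illustration, that each dot moves a unit step in a direction drawn from a Gaussian centered on the mean direction; in that case the expected projection onto the mean direction is exp(-σ²/2), with σ in radians. The Gaussian assumption, the specific spreads, and the function names are hypothetical and do not describe the stimulus actually used.

```python
import math
import random


def mean_projection(sd_deg, n=100_000, seed=0):
    """Monte Carlo estimate of the average unit-step displacement along
    the mean direction when dot directions are Gaussian with the given SD."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        theta = math.radians(rng.gauss(0.0, sd_deg))
        total += math.cos(theta)  # component along the mean direction
    return total / n


def analytic_projection(sd_deg):
    """E[cos(theta)] for theta ~ N(0, sigma^2), with sigma in radians."""
    sigma = math.radians(sd_deg)
    return math.exp(-sigma ** 2 / 2.0)


# Map each direction SD (degrees) to (simulated, analytic) mean projection.
projections = {
    sd: (mean_projection(sd), analytic_projection(sd)) for sd in (10, 30, 60)
}

for sd, (simulated, analytic) in projections.items():
    print(f"SD {sd:>2} deg: simulated {simulated:.3f}, analytic {analytic:.3f}")
```

Under this assumption, a wider direction distribution lowers the effective speed along the mean direction, which is why filters tuned to a single speed may give an incomplete account of variance-manipulated trials.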
In building a more complete model, it will also be necessary to consider how the observer represents and decodes momentary evidence to form a decision about the location of the odd patch. Our motion energy analysis focused on time-averaged differences in the motion energy profiles, but the observers were performing an RT task, and RTs were fastest in conditions with the largest divergence from the background motion. It is likely that the dependence of RT on the odd motion statistics can be attributed to multiple sources. Specifying a mechanistic model of odd motion detection will require answering several currently open questions. First, does performance arise from a single strategy or from a mixture of condition-dependent strategies? Perhaps bottom-up recognition supports detection when odd motion differs strongly from the background, but top-down search must be engaged when the difference is more subtle. Second, are candidate odd patches subjected to an evidence accumulation process that integrates multiple samples of odd motion before committing to a decision? We are actively pursuing these questions to more fully understand the mechanisms that link stimulus statistics to behavior in the odd patch detection task.
If higher-order moments do exist in natural dynamic textures, the evolution of the visual system may have sacrificed the chance to see them because computing with compact representations of Gaussian statistics confers several benefits. An observer of motion is usually concerned with tracking the path of rigid bodies; indeed, assuming Gaussianity may help the visual system individuate multiple sources of motion that otherwise generate a platykurtic response in a population with broad direction tuning curves (Treue, Hol, & Rauber, 2000). More generally, Gaussian assumptions reduce processing demands for probabilistic computations because they require operations only on scalar representations of a distribution's location and width. And because probabilistic computations on Gaussian inputs produce Gaussian outputs, this architecture can simplify inferential procedures. Nevertheless, a representation that is limited to lower-order moments poses a challenge to fully Bayesian theories of neural computation, which would require more complete representations of probability distributions that include detailed information about their tails. Therefore, our results demonstrate a constraint that can inform future theories of how sensory information is encoded and decoded when using perception to guide behavior.
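One familiar instance of this closure property is the combination of two independent Gaussian estimates of the same quantity, as in a Bayesian update with a Gaussian prior and a Gaussian likelihood. The result is again Gaussian and is fully specified by two scalars. The sketch below is offered only as an illustration of that point; the function name and the example values are hypothetical, and nothing here is claimed to be the computation performed by the visual system.

```python
def combine_gaussians(mu_a, var_a, mu_b, var_b):
    """Combine two independent Gaussian estimates of the same quantity.

    The normalized product of two Gaussian densities is Gaussian, with
    precision (inverse variance) equal to the sum of the input precisions
    and a mean equal to the precision-weighted average of the input means.
    """
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    mu = var * (mu_a / var_a + mu_b / var_b)
    return mu, var


# Illustrative values only: a broad estimate centered on 0 deg (variance 4)
# combined with a sharper estimate centered on 10 deg (variance 1).
posterior = combine_gaussians(0.0, 4.0, 10.0, 1.0)
print(posterior)  # approximately (8.0, 0.8)
```

The whole update operates on a location and a width. No information about tails is carried forward, which is the economy, and the limitation, described above.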