Abstract
In previous work we showed that specular rotating superellipsoids of varying corner roundedness have characteristic optic-flow patterns that predict observers' shininess ratings: namely, more-rounded shapes are perceived as less shiny than cuboidal shapes. However, previous behavioral results also show a strong covariation between percepts of shape and material: shiny objects judged matte also appeared non-rigid. This suggests that material perception involves the simultaneous inference of shape and material, where material properties include both reflectivity and elasticity. In this work we investigate the computations underlying the perception of shape and material from motion.
Previous work in computer vision provides theory for estimating shape given known material properties (e.g., structure-from-motion and shape-from-specular-flow). We incorporate these results into an “analysis by synthesis” framework that postulates that the visual system has high-level models for inferring the shape of objects in matte-rigid motion sequences (e.g., structure-from-motion), as well as in matte-elastic, shiny-rigid, and possibly shiny-elastic sequences. We show that errors in the model fits can be used to infer the most likely material type for a sequence. In particular, using novel measures of the consistency and error of reconstructed shapes across time, we show that the pattern of fit errors for a model assuming rigid matte objects can be used to predict whether the object is shiny or matte, and whether it is rigid or non-rigid.
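The core logic of this classification step can be illustrated with a toy sketch. The snippet below is not the model described here but a minimal, hypothetical analogue: a sequence of tracked 3D points is fit by a rigid-motion model (a least-squares rotation-plus-translation alignment, via the Kabsch/Procrustes method), and the residual fit error distinguishes a rigid from a deforming sequence. All function names (`fit_rigid`, `rigid_fit_error`) and the synthetic deformation are illustrative assumptions.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares rigid transform (R, t) mapping src to dst (Kabsch method)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def rigid_fit_error(frames):
    """Mean residual after best rigid alignment of the first frame to each later frame."""
    ref = frames[0]
    errs = []
    for f in frames[1:]:
        R, t = fit_rigid(ref, f)
        errs.append(np.linalg.norm(ref @ R.T + t - f, axis=1).mean())
    return float(np.mean(errs))

rng = np.random.default_rng(0)
pts = rng.standard_normal((50, 3))                # toy "object": 50 tracked points

def rotate(p, angle):
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return p @ R.T

# Rigid sequence: pure rotation. Non-rigid: rotation plus a nonlinear deformation.
rigid_seq = [rotate(pts, 0.1 * k) for k in range(5)]
nonrigid_seq = [rotate(pts, 0.1 * k) + 0.05 * np.sin(k) * pts**2 for k in range(5)]

print(rigid_fit_error(rigid_seq))      # near zero: the rigid model explains the motion
print(rigid_fit_error(nonrigid_seq))   # clearly larger: residuals signal non-rigidity
```

In this caricature, a large rigid-model residual is ambiguous on its own: it could arise from true non-rigidity or, as the results below suggest, from specular image motion that violates the matte assumption.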
For example, an object's material and rigidity can be accurately estimated for slowly deforming matte surfaces. Interestingly, however, low-curvature shiny objects generate structure-from-motion model fit errors that are more similar to those of non-rigid matte objects. From these results, we hypothesize that human observers may use a similar analysis-by-synthesis strategy to compute shape and material from motion. The hypothesis predicts perceptual errors on a range of motion stimuli, which we compare to human judgments.
This work was supported by NIH grant EY015261. Partial support has been provided by the Center for Cognitive Sciences, University of Minnesota.