Abstract
Many objects in the world are non-rigid when they move. To identify such objects, a visual system has to separate shape changes from movements. Standard structure-from-motion schemes rely on rigidity assumptions, so they are not applicable to these shapes. Computational solutions proposed for non-rigid shapes require additional constraints on form and motion. Despite an enormous literature on human perception of structure-from-motion, the ability of observers to correctly infer non-rigid 3-D shapes from motion cues has not been examined.
We examined whether the human visual system could make metric judgments about simple 3-D shapes using only motion cues, and whether performance differed between rigid and non-rigid shapes.
Stimuli consisted of white dots randomly placed on an opaque black horizontal cylinder on a black background. The cylinder underwent simultaneous rotation about the vertical and depth axes (it did not spin on its own axis). The elliptical cross-section of the cylinder was varied from trial to trial, and observers reported whether the cross-section was deeper or shallower than a perfect circle. The cylinder was either rigid or flexed non-rigidly in depth or in the fronto-parallel plane. The rigid central portion of the cylinder was occluded.
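As a rough illustration of the stimulus geometry described above, the following Python sketch places dots on a horizontal cylinder with an elliptical cross-section and projects them while the cylinder rotates about the vertical and depth axes. It is not the authors' stimulus code. The function names, the parameter `depth_ratio`, the use of orthographic projection, and sampling uniformly in angle are all assumptions made for illustration; removal of dots hidden by the opaque surface, the non-rigid flexing, and the central occluder are omitted.

```python
import numpy as np

def make_cylinder_dots(n_dots, length, radius, depth_ratio, rng):
    """Sample dots on a horizontal cylinder with an elliptical cross-section.

    The cylinder axis lies along x. The cross-section lies in the y-z plane,
    with half-height `radius` (y) and half-depth `radius * depth_ratio` (z).
    depth_ratio == 1 gives a circular cross-section (hypothetical parameter).
    """
    x = rng.uniform(-length / 2, length / 2, n_dots)
    theta = rng.uniform(0, 2 * np.pi, n_dots)  # uniform in angle, not in arc length
    y = radius * np.sin(theta)
    z = radius * depth_ratio * np.cos(theta)
    return np.column_stack([x, y, z])

def rotation_y(angle):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rotation_z(angle):
    """Rotation matrix about the depth (z) axis, i.e. in the image plane."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def project_frames(points, angles_y, angles_z):
    """Rotate the dots for each frame and project them orthographically.

    Returns an array of shape (n_frames, n_dots, 2) holding image (x, y).
    Orthographic projection is an assumption of this sketch.
    """
    frames = []
    for ay, az in zip(angles_y, angles_z):
        R = rotation_z(az) @ rotation_y(ay)
        rotated = points @ R.T
        frames.append(rotated[:, :2])  # drop depth
    return np.stack(frames)
```

Varying `depth_ratio` above or below 1 would correspond to cross-sections deeper or shallower than a circle, which is the dimension observers judged.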
Observers' judgments of cross-section circularity were generally slightly shallower than veridical. The non-rigid cylinders were judged as deeper than rigid cylinders; however, the psychometric functions had similar slopes. Rotation in depth (about the vertical axis) was critical, as 3-D shape was not perceived with rotation only in the fronto-parallel plane. We compared human performance with existing computational models. Akhter et al.'s trajectory basis extension (2008) of Tomasi and Kanade's factorization method (1992) yielded cylindrical shapes similar to human judgments. Koenderink's def-based motion-flow analysis (1986) yielded slants and tilts that were consistent with cylindrical shapes. The human visual system thus does not require rigidity assumptions to extract veridical 3-D shapes from motion.
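For readers unfamiliar with the factorization approach mentioned above, the sketch below shows only the core rank-3 step of a rigid, orthographic factorization in the spirit of Tomasi and Kanade: the centered matrix of tracked image coordinates is split by SVD into a motion factor and a shape factor. It is a minimal illustration under stated assumptions, not the analysis used in this study. It does not implement the trajectory basis extension for non-rigid shapes, and it omits the metric upgrade, so the recovered shape is defined only up to an invertible 3x3 transformation. The function name `factorize_rank3` and the input layout are hypothetical.

```python
import numpy as np

def factorize_rank3(tracks):
    """Rank-3 factorization of tracked 2-D points (rigid, orthographic case).

    tracks: array of shape (n_frames, n_points, 2) with image coordinates.
    Returns (motion, shape, singular_values), where motion is (2F x 3),
    shape is (3 x P), and motion @ shape approximates the centered
    measurement matrix. No metric (orthonormality) constraints are applied.
    """
    # Stack all x rows, then all y rows, into the 2F x P measurement matrix.
    W = np.concatenate([tracks[:, :, 0], tracks[:, :, 1]], axis=0)
    # Subtract each row's mean so coordinates are relative to the centroid.
    W = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the three largest singular values and split them between factors.
    root = np.sqrt(s[:3])
    motion = U[:, :3] * root
    shape = root[:, None] * Vt[:3]
    return motion, shape, s
```

For noise-free tracks of a rigid object, the singular values beyond the third should be close to zero; non-rigid deformation raises the rank, which is what extensions such as trajectory bases are designed to handle.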
NIH Grants: EY13312 & EY07556.