Abstract
How humans recognize objects remains a contentious issue in current research on high-level vision. Here, I test the proposal of Wallis and Bülthoff (1999, Trends in Cognitive Sciences) that object representations can be learned through the temporal association of multiple views of the same object. In a study block, participants first viewed image sequences of novel three-dimensional objects. On each trial, the images formed either an orderly sequence of depth-rotated views of a single object (SS), a scrambled sequence of those same views (SR), or a sequence of different objects (RR). Recognition memory was assessed in a subsequent test block. A within-object advantage was consistently observed: accuracy was greater in the SR than in the RR condition in all four experiments, and greater in the SS than in the RR condition in two of them. Furthermore, spatiotemporal coherence did not produce better recognition than temporal coherence alone, as accuracy in the SS condition was similar to or lower than that in the SR condition. These results suggest that the visual system can use temporal regularity to build invariant object representations through a temporal-association mechanism.