Abstract
Challenging the visual system with incongruent binocular information reveals the physiological constraints of visual perception. One type of incongruence is a temporal delay between corresponding binocular frames under stereoscopic viewing. With the recent spread of 3D technology, this question has also become of practical importance. Interocular delays also play a role in several eye diseases, causing perceptual distortions such as the Pulfrich effect. Although the tolerance of the visual system for interocular delays during stereo perception has been studied extensively, the results are highly diverse and almost incommensurable. While disparity-sensitive neurons in V1 display a time window of stereo fusion shorter than 20 ms, psychophysical studies have demonstrated a delay tolerance of 30–100 ms. Here we asked human observers to rate the 3D quality of their percept after watching 3D movies under various binocular delay and speed conditions. Delays were varied between 0 and 2 s in 33 ms increments, and the movies were shown at six different speeds in randomized order. We found that for natural scenes the visual system is able to merge frames across 500 ms (maximum 2 s), a much longer delay than previously reported. We computed various image statistics to verify that the average pixel lifetime was only a fraction of 500 ms; thus corresponding pixels changed their hue and luminance asynchronously within these intervals, yet 3D percepts were maintained. To explain the sustained 3D experience despite long delays, we considered a number of factors, such as top-down effects, anticipation, inhomogeneity of motion vectors in space and time, attention and eye dominance. The flexibility of the visual system in buffering parallel visual streams over 500 ms, and its ability to merge them despite the mismatch between simultaneous frames, call for a revision of bottom-up models of 3D perception and suggest a mental editing capacity yet to be explored.