Abstract
Two-dimensional retinal images are ambiguous with respect to the 3-D configurations of real-world surfaces and objects. For example, a square surface juxtaposed with an L-shaped target (L-coplanar condition) can be perceived either as a square and an L, or as two overlapping square surfaces with one partially occluding the other. The latter surface representation is often preferred because the visual system has learned from past experience to use T-junction information for image segmentation; the T-junctions render the L-shaped target in back, so it is represented as a partially occluded square (amodal completion). Additionally, if the L-shaped target is rendered in back with uncrossed binocular disparity (L-back condition), a bottom-up depth cue, the tendency to see it as a partially occluded square increases. Only if the L-shaped target is rendered in front with crossed disparity (L-front condition) is it unambiguously seen as an L. Using such stimuli in a visual search task, He & Nakayama (1992) found that observers took longer to find the "L" in the L-back condition than in the L-front condition. This is because the L-back search elements carried both T-junction and binocular disparity cues, leading the visual system to interpret the L-shapes as partially occluded squares according to its internal perceptual rule. Here, we investigated whether this perceptual rule (T-junction) could be modified by compelling the visual system to search for the L-shaped target (task-specific learning). To do so, we had observers perform roughly 15,000 visual search trials in the L-coplanar condition. We found that search performance improved, with reaction times (RTs) in the L-coplanar condition becoming similar to those in the L-front condition. However, search in the L-back condition remained slower. This suggests that, with task-specific learning, the visual system can learn to discount T-junction information (experiential knowledge) for image segmentation, but that it is harder to discount binocular disparity information (a bottom-up cue).
Meeting abstract presented at VSS 2017