The perception of an occluded object is part of the more general image parsing problem (Yuille & Kersten, 2006).
Figure A1 illustrates key computational steps. The left panel of Figure A1 shows an image that can be interpreted in one of two ways. On the one hand, it can be seen as a collection of three rectangular, possibly textured or shaded, patches superimposed on a uniform gray planar background. Alternatively, it can be seen as a shaded ellipsoid behind a gray plane with three rectangular apertures. If one cross-fuses the left and right images of Figure A1 (left and right images to the right and left eyes, respectively), the first interpretation becomes quite compelling because stereo information provides evidence for three rectangular patches floating above a gray background and indicates that the surface boundaries of the three patches are intrinsic to the patches, as shown in
red in panel B1. The perceptual “explanation” of the image data (panel C1) is that of three, possibly scrambled, patches floating in the foreground (blue outlines) above a gray plane (orange outline). On the other hand, if one cross-fuses the two images in
Figure A2, the second interpretation becomes compelling. In this latter case, the edges of the three rectangles become intrinsic to the gray planar surface (shown in red in panel B2), and one thus interprets the rectangular patches as holes. These rectangular edges are now extrinsic to the regions inside the apertures. Because the edges are no longer bound to the internal patch regions, the regional and partial edge information (curved object fragments within the rectangles) provides candidate data to be “fit” by a single closed object, in this case an ellipsoid. This object detection stage may involve both amodal completion (i.e., filling in the ellipsoid based on Gestalt principles such as good continuation) and access to high-level familiar models, such as “ellipsoid.” The perceptual “explanation” of the image data (panel C2) is that of an ellipsoid (blue outline) floating behind a gray plane (orange outline). Note that both the disparity data and a high-level hypothesis about the form of the occluder can be used to “explain away” those parts of the image that do not belong to the target object (Yuille & Kersten, 2006).