The visual system uses multiple complementary sources of information (cues) to estimate properties of interest. Since the errors in the estimates that different cues provide for the same property will generally differ, a weighted average of the cues can provide a better overall estimate than any single cue. The most precise estimate is obtained when each cue's weight is proportional to its reliability, that is, to the inverse of the variance of its estimate (Backus & Banks, 1999; Ernst & Bülthoff, 2004; Hillis, Watt, Landy, & Banks, 2004; Landy, Maloney, Johnston, & Young, 1995; van Beers, Sittig, & Denier van der Gon, 1996). But how is this reliability known? Is it based on experience or on the information in the image at that moment? Is it determined for regions of a scene or for separate items in the scene?
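The reliability-weighted average can be sketched in a few lines. This is an illustration, not the authors' model: it assumes each cue gives an unbiased slant estimate with independent Gaussian noise of known standard deviation, and it takes reliability to be the inverse variance. The function name `combine_cues` and the slant values are hypothetical.

```python
import math


def combine_cues(estimates, sigmas):
    """Reliability-weighted average of independent, unbiased cue estimates.

    Reliability is taken as 1 / sigma**2; each weight is a cue's reliability
    divided by the summed reliability, so the weights sum to 1.
    """
    reliabilities = [1.0 / s ** 2 for s in sigmas]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    # Under these assumptions the combined variance is 1 / (summed reliability),
    # which is never larger than the variance of the best single cue.
    combined_sigma = math.sqrt(1.0 / total)
    return combined, combined_sigma, weights


# Hypothetical slant estimates (deg) from a disparity cue and a texture cue.
slant, sigma, weights = combine_cues([30.0, 36.0], [2.0, 4.0])
print(f"{slant:.1f} {sigma:.2f}")  # prints "31.2 1.79"
```

With these made-up numbers the disparity cue, whose variance is four times smaller, receives a weight of 0.8, and the combined standard deviation (about 1.79 deg) is below that of either cue alone.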
Consider judging the slant of the surface of a textured rectangular table with a ring left on it by a glass of wine from which a little had been spilt the previous evening. In the images reaching our eyes, the outline of the table's surface, the shape of the ring, the texture gradient, and the gradient in binocular disparities all provide information about the surface's slant. When the whole surface is considered, the ring will contribute to the binocular disparity gradient, it may slightly disrupt the texture cue, and the shape of its image will provide independent information about the slant. If we are sure that all the cues, including the shape of the ring, relate to a single surface with a single slant, we can best estimate that slant by combining all the cues. The absence of discontinuities in the texture and disparity gradients may justify assuming that there is a single surface with a single slant. Combining all the cues to estimate the surface's slant, however, means that the ring's orientation may be judged differently when the ring is considered as part of the surface than when it is judged on its own.
It is well established that the weights given to different cues can depend on the task (e.g. Bradshaw, Parton, & Glennerster, 2000; Glennerster, Rogers, & Bradshaw, 1996; Koenderink, Kappers, Todd, Norman, & Phillips, 1996; Tittle, Norman, Perotti, & Phillips, 1998), but it is not evident that judging the slant of the ring and that of the table are fundamentally different tasks. Neither is it clear whether cue weights can differ for different structures within a confined region of the visual field, because to do so the structures would first have to be segregated. On the other hand, if the same slant cue weights are assigned to all structures within some region of space, these weights cannot be optimal for both the ring and the surface texture whenever the cues' reliabilities differ between the two.
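The cost of sharing weights across structures can be illustrated numerically. The sketch below assumes independent, unbiased cues and uses made-up noise levels: for the ring, the monocular cue (its outline shape) is assumed to be the more reliable one, while for the surface texture, disparity is assumed to be the more reliable one. The cue assignments, noise values, and function names are hypothetical and are not data from this study.

```python
import math


def optimal_weights(sigmas):
    """Inverse-variance weights for independent, unbiased cues."""
    rel = [1.0 / s ** 2 for s in sigmas]
    total = sum(rel)
    return [r / total for r in rel]


def combined_sd(weights, sigmas):
    """SD of a weighted average of independent cue estimates (weights sum to 1)."""
    return math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sigmas)))


# Hypothetical cue noise (deg), ordered [disparity, monocular cue].
ring_sigmas = [4.0, 2.0]     # ring: outline shape assumed more reliable
texture_sigmas = [2.0, 4.0]  # texture: disparity assumed more reliable

w_ring = optimal_weights(ring_sigmas)        # [0.2, 0.8]
w_texture = optimal_weights(texture_sigmas)  # [0.8, 0.2]

best_ring = combined_sd(w_ring, ring_sigmas)
shared_ring = combined_sd(w_texture, ring_sigmas)  # texture weights reused for the ring
print(f"{best_ring:.2f} {shared_ring:.2f}")  # prints "1.79 3.22"
```

Under these assumptions, reusing the texture's weights for the ring yields a ring estimate that is noisier than with the ring's own optimal weights, and even noisier than the ring's outline cue alone.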
Here, we examine how binocular and monocular cues are combined for the perception of surface slant. The reliability of slant cues depends on many factors, such as the slant angle, the viewing distance and the structure of the image (Jacobs, 2002; Knill, 1998; Muller, Brenner, & Smeets, 2007). There is some evidence that information about the reliability under the prevailing conditions is learnt from experience (Jacobs & Fine, 1999; Knill, 2007), although the reliability could also be estimated from the properties of the images at each moment (Deneve, Latham, & Pouget, 2001). In either case, the reliability could be estimated for regions in space or for items within that space. In the present study, we attempt to shed some light on the framework within which slant cue weights are assigned.