The question of what aspect of an object actually defines its apparent position has been of considerable interest for some time. There are a number of cues or ‘location tags’ which an observer can use to locate the relative position of objects within a visual scene. These include the peak of the object’s luminance or contrast distribution, points of inflexion or zero crossings in the luminance distribution, the position at which edges of the object reach threshold, and the weighted mean or centroid of the distribution. Previous studies have suggested that the most likely candidate is that of the centroid or ‘centre of gravity’ of the stimulus envelope for both luminance-defined and contrast-defined objects (
Westheimer & McKee, 1977;
Watt & Morgan, 1983;
Morgan & Aiba, 1985a;
Morgan & Glennerster, 1991;
Morgan, Ward & Cleary, 1994;
Whitaker et al., 1996). The results of the present study support this assertion: the perceived location of both asymmetric luminance-defined and texture-defined patches was found to agree very closely with the calculated centroid position of each distribution.
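As a concrete illustration, the centroid location tag is simply the amplitude-weighted mean of the stimulus profile. The skewed-Gaussian envelope below is a hypothetical sketch (the parameter values are not those of the actual stimuli); it shows how the centroid of an asymmetric profile separates from its peak:

```python
import numpy as np

# Hypothetical 1-D asymmetric envelope: a Gaussian with different
# standard deviations on each side of its peak at x = 0.
x = np.linspace(-4.0, 4.0, 2001)
sigma_left, sigma_right = 0.5, 1.5
profile = np.where(x < 0,
                   np.exp(-x**2 / (2 * sigma_left**2)),
                   np.exp(-x**2 / (2 * sigma_right**2)))

# Centroid ('centre of gravity'): positions weighted by amplitude.
centroid = np.sum(x * profile) / np.sum(profile)

# The peak remains at x = 0, but the centroid is pulled toward the
# broader (right-hand) side of the distribution.
```

For a symmetric profile the two location tags coincide; it is only for asymmetric envelopes such as this one that the centroid prediction can be distinguished from the peak prediction.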
When the luminance and texture components of the central patch provide conflicting positional cues, a modulation in amplitude of one type of information relative to that of the other produces a smooth change in the perceived location of the object as a whole. This indicates that a global estimate of object location is extracted after the visual system combines positional signals from both sources of visual information. In an elegant series of experiments, Rivest and Cavanagh (
1996) showed that a contour defined by one visual attribute (e.g. luminance, colour, texture or motion) could influence the perceived location of a contour defined by another attribute. Furthermore, they showed that combining different attributes at a common location improves the accuracy of localization. The results of
Rivest and Cavanagh’s study strongly suggest that information from different visual attributes is combined at a common neural site prior to the level at which a localization decision is reached. Inspection of
Figure 2D–F confirms that this is likely to be the case. The accuracy of relative localization for luminance or texture in isolation is very similar for both symmetric and asymmetric distributions. However, when observers are asked to locate patches composed of conflicting luminance and texture cues, localization thresholds are elevated, reaching a maximum near a luminance and texture contrast value of 0.5. What might be the reason for this threshold elevation? Internal noise is likely to affect the relative salience of the two components from one trial to the next. For the symmetric condition this has no effect, since the positional cues provided by the two components are in exact spatial registration. In the asymmetric condition, however, where each component provides a unique positional signal, trial-to-trial fluctuations in the relative strength of each signal constitute a significant additional source of variance. The localization noise for each individual component is no worse in the asymmetric condition than in the symmetric condition; the noise simply produces increased response variance when the components signal conflicting positional estimates. A model of positional analysis which employed an early nonlinear transformation, followed by the extraction of a single positional estimate, would not contain this additional source of variance. Furthermore, the results of both Rivest and Cavanagh (
1996) and Gray and Regan (
1997) suggest that the rule for combining positional signals derived from different visual attributes is consistent with probability summation between independent channels.
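The internal-noise argument above can be sketched numerically. In this hypothetical simulation (all values illustrative), the perceived patch location on each trial is a weighted combination of the two cue positions, with the weight fluctuating from trial to trial to mimic fluctuations in relative cue salience. The fluctuations add response variance only when the cues signal conflicting positions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

# Hypothetical positions signalled by each cue (arbitrary units).
pos_lum, pos_tex = -1.0, +1.0   # asymmetric condition: cues conflict
pos_sym = 0.0                   # symmetric condition: cues coincide

# Trial-to-trial fluctuation in the weight given to the luminance cue.
w = np.clip(rng.normal(0.5, 0.1, n_trials), 0.0, 1.0)

# Perceived location is the weighted combination of cue positions.
perceived_conflict = w * pos_lum + (1 - w) * pos_tex
perceived_sym = w * pos_sym + (1 - w) * pos_sym  # weights cancel out

# perceived_sym has zero variance; perceived_conflict inherits
# additional variance from the weight fluctuations alone.
```

Note that the per-cue localization noise is identical in both conditions; the extra variance arises purely from the spatial disagreement between the two positional signals.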
Alternative potential explanations exist for the elevation in localization thresholds for stimuli consisting of conflicting luminance and texture information. One possibility involves changes in the overall stimulus profile produced by combining two individual asymmetric profiles. Morgan and Aiba (
1985b) have demonstrated that the precision with which the mean of a distribution can be extracted is dependent upon both the width of the distribution and its area. Our methodology ensured that asymmetric patches consisting of either luminance or texture alone differed from their symmetric counterparts in neither width nor area, since increases in the standard deviation of the patches on one side were counterbalanced by decreases on the other. It is reassuring, therefore, that asymmetric patches of either luminance or texture can be located with the same precision as their symmetric counterparts (
Figure 2D–F). For stimuli consisting of an asymmetric combination of luminance and texture, however, it is important to eliminate potential changes in overall width and area as contributors to the threshold elevation in this specific region (
Figure 2D–F). We therefore performed a control experiment in which alignment thresholds were measured for a combination of two asymmetric luminance profiles (each of contrast = 0.5) skewed in opposite directions, and also two asymmetric texture profiles. This allows us to directly compare performance against that for the combination of asymmetric luminance and texture profiles (shown in
Figure 2D–F, Clum = 0.5). Results are shown in the table below.
For each observer, localization performance for combinations of the same type of information (i.e. luminance + luminance or texture + texture) is similar, and comparable with thresholds for the symmetric conditions (
Figure 2D–F). Thresholds for the combination of disparate sources of information (luminance + texture) are consistently higher, indicating that this threshold elevation reflects a true cost of disparate cue combination.
It might be argued that the threshold elevation is a result of a reduction in contrast of the individual luminance and texture components, i.e. at the extremes of
Figure 2D–F (luminance contrast of
Clum = 0 and 1) either the luminance or texture component is at maximum contrast, whilst in the region of greatest threshold elevation both components are present at half their maximum contrast levels. However, if threshold elevation were a result of reduced individual component contrast, one would expect the same threshold elevation for the symmetric condition. This proves not to be the case, and indicates that the elevation in localization thresholds is likely to be a direct result of combining disparate sources of visual information.
Perceived alignment for patches composed of competing luminance and textural cues was obtained when the physical contrast of the luminance component was approximately equivalent to that of the texture component. This might seem to suggest that both components play an equivalent role in dictating the perceived position of the overall patch. However, this would only be the case if the visual system were equally adept at detecting the presence of luminance and textural information. In order to examine the role of visibility, we measured detection thresholds for asymmetric luminance and texture patches presented alone. These thresholds were then used to express each component's contrast at the point of perceived alignment (i.e. where no offset is perceived) as a multiple of its respective detection threshold. The results are presented in
Table 1. A much larger multiple of luminance detection threshold is required to balance textural information when the two sources provide conflicting positional cues. It follows that, if both luminance and texture components were presented at an equal multiple of their detection thresholds, then the perceived location of the entire patch should appear offset in the direction of the textural component, which is indeed the case. An asymmetry in perceptual weights might be taken to indicate that the visual system does not treat all attributes equally but rather primacy is given to textural information over luminance information. This is in contrast to previous reports suggesting that luminance information was the dominant attribute in contour localization tasks (
Livingstone & Hubel, 1988;
Grossberg & Mingolla, 1985).
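The detection-threshold normalization described above amounts to a simple ratio. The numbers below are purely hypothetical placeholders chosen to illustrate the direction of the asymmetry (the measured values appear in Table 1):

```python
# Hypothetical values, illustrative only (contrast units).
detect_thresh_lum = 0.02   # luminance detection threshold
detect_thresh_tex = 0.10   # texture detection threshold
balance_lum = 0.40         # luminance contrast at perceived alignment
balance_tex = 0.50         # texture contrast at perceived alignment

# Express each balance contrast as a multiple of its own threshold.
mult_lum = balance_lum / detect_thresh_lum
mult_tex = balance_tex / detect_thresh_tex

# With these numbers, the luminance cue must sit at a far larger
# multiple of its detection threshold than the texture cue in order
# to null the perceived offset: the texture cue carries more weight.
```

Equating the two components at the same multiple of threshold would therefore leave the perceived location biased toward the textural component, as observed.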
Rivest and Cavanagh (
1996) reported that when different attributes are combined at a single location, each providing concordant information, localization thresholds improve by an equivalent and statistically predictable amount as each attribute is added. This implies an equal role for each visual attribute. On the other hand, evidence for the unequal weighting of visual attributes has been suggested previously (
Landy, Maloney, Johnston & Young, 1995;
Mather & Smith, 2000;
Landy & Kojima, 2001). For example, in the localization of texture-defined edges,
Landy (
1993) presents a model in which separate location estimates are made for each visual attribute; these estimates are then weighted, and the overall location is derived from the average of the weighted estimates. Within this framework, there is scope to assign larger weights to estimates derived from the particular visual cues that are most robust and thus provide the most reliable estimate of edge location. For example, in regions of a visual scene that contain little or no textural information, preference might be given to more abundant visual attributes. The final weighting of visual attributes is therefore likely to be a product of both the reliability of a particular cue and its availability. The quality of information provided by a visual attribute can vary not only from location to location but also over time, and the visual system needs to accommodate such dynamic changes. However, the question of
how the visual system weights different attributes remains.
Landy et al. (
1995) suggest that the weighting factors are derived from subsidiary cues which in isolation do not aid edge localization but do comment directly on the reliability of information provided by a particular attribute.
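A minimal sketch of such a weighted-average scheme is given below. The inverse-variance weighting is one plausible choice of weighting rule, not the rule the model commits to, and the noise values are hypothetical; in the model the weights would be supplied by the subsidiary cues described above:

```python
def combine(estimates, sigmas):
    """Combine per-attribute location estimates, weighting each by the
    inverse of its variance (an assumed reliability measure)."""
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, estimates)) / total

# A reliable texture cue (small sigma) dominates a noisy luminance cue:
# the combined estimate falls close to the texture cue's position (0.0)
# despite the luminance cue signalling a position at 1.0.
location = combine(estimates=[1.0, 0.0], sigmas=[0.5, 0.1])
```

Under this scheme, equal weights emerge only when the two attributes are equally reliable; any asymmetry in reliability, availability or (as the present results suggest) learned perceptual weighting shifts the combined estimate toward the favoured attribute.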
The results of the present study show that the visual cortex is able to effortlessly integrate disparate sources of visual information to form a global estimate of object position, although this conflation of visual attributes results in a modest loss of localization accuracy. Analogous effects have been reported in the motion domain, where the integration of luminance and chromatic information results in either enhancement or disruption of the motion percept depending on whether each attribute conflicts or concurs (
Cavanagh, Arguin & von Grünau, 1989;
Morgan & Ingle, 1994;
Edwards & Badcock, 1996). Mismatches between luminance and texture information are commonplace in the real world, where textured objects often vary in luminance across their surface as a result of shadows or changes in illuminant position. The results of the present study suggest that texture information may be a more potent indicator of object position, implying that the human visual system gives more weight to visual attributes that are reliably related to the contours of objects. It is likely that visual experience plays an important role in shaping the weighting map of visual attributes, and it is conceivable that this weighting might be modified in different visual environments. Consider, for example, the mottled illumination of the forest floor. Textural differences between foliage can be small, and local luminance can change dramatically due to shadows, introducing luminance ‘noise’ to the scene. In such an environment, chromatic differences or colour cues, which are not subject to the same variability, may be particularly important. The weighting map of visual attributes might therefore reflect the evolutionary pressures imposed by the visual environment.