Cue combination rules have often been applied to the perception of surface shape but not to judgements of object location. Here, we used immersive virtual reality to explore the relationship between different cues to distance. Participants viewed a virtual scene and judged the change in distance of an object presented in two intervals, where the scene changed in size between intervals (by a factor of between 0.25 and 4). We measured thresholds for detecting a change in object distance when there were only ‘physical’ (stereo and motion parallax) or ‘texture-based’ cues (independent of the scale of the scene) and used these to predict biases in a distance matching task. Under a range of conditions, in which the viewing distance and position of the target relative to other objects was varied, the ratio of ‘physical’ to ‘texture-based’ thresholds was a good predictor of biases in the distance matching task. The cue combination approach, which successfully accounts for our data, relies on quite different principles from those underlying traditional models of 3D reconstruction.

*T*_{0}, at which participants entered a small ‘trigger’ zone (a tall invisible cylinder of 10 cm radius positioned halfway between the side walls of the room). In the first interval, the reference square was placed directly in front of *T*_{0} at a distance of 1, 3 or 5 m on a line through *T*_{0} and perpendicular to the back wall. The comparison square was presented in the second interval at a distance assigned by a staircase procedure (see below). Additionally, the comparison square was given a random lateral jitter (0, ±3, ±6 or ±9 deg) to avoid the participant being able to solve the task from a single monocular view.

*T*_{0}. So, the participant had to pass through a small trigger zone to initiate the display of the square and then keep within a larger viewing zone for the square to remain visible. However, a table in front of the participant provided a physical restriction: the participant was asked to keep close to the table during experiments, so that in practice the range of forward and backward movement with respect to *T*_{0} was small (see below).
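The trigger-zone logic described above can be sketched as a simple point-in-cylinder test on the participant's floor position. This is our own illustration, not the authors' code: the function name and the viewing-zone radius are assumptions (the paper only says the viewing zone is ‘larger’ than the 10 cm trigger zone).

```python
import math

def in_zone(x, z, cx, cz, radius):
    """Return True if the floor position (x, z) lies within `radius`
    metres of the zone centre (cx, cz).  The zone is a tall cylinder,
    so height (y) is ignored."""
    return math.hypot(x - cx, z - cz) <= radius

# Trigger zone: 10 cm radius, halfway between the side walls (x = 0 here).
TRIGGER_RADIUS = 0.10   # m
VIEWING_RADIUS = 0.50   # m (illustrative; not specified in the paper)

# Entering the trigger zone initiates the display of the square ...
started = in_zone(0.05, 0.0, 0.0, 0.0, TRIGGER_RADIUS)
# ... and the square stays visible while inside the larger viewing zone.
visible = in_zone(0.30, 0.20, 0.0, 0.0, VIEWING_RADIUS)
```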

| Variable | Definition |
|---|---|
| *σ*_{P}, *σ*_{T} | Thresholds for detection of change in ‘physical’ cues (*σ*_{P}, Figure 4, column 1) or ‘texture-based’ cues (*σ*_{T}, Figure 4, column 2). The ‘physical’ cue signals the distance of the object independently of other objects in the scene (stereo and motion parallax). The ‘texture-based’ cue signals the distance of the object relative to others and is independent of the overall scale of the room. |
| *w*_{P}, *w*_{T} | Predicted weights for ‘physical’ and ‘texture-based’ cues, derived from *σ*_{P} and *σ*_{T} respectively (Equation 2). |
| *k*_{abs}, *k*_{rel} | Fitted parameters for the data on matching perceived absolute distance (*k*_{abs}, Figure 2) and perceived distance relative to the room (*k*_{rel}, Figure 3). |

*bias* = *k*f(*g*) + *c*, where *g* is the expansion factor, f(*g*) is the bias predicted for a pure texture-based match as shown by the dashed curve, and *k* and *c* are free parameters. A weighted linear least squares fit was used to find the values of *k* and *c* and their covariance. The standard deviation on *k* was taken to be the square root of the corresponding diagonal element of the covariance matrix. Figure 2 shows the fits derived in this way.
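A weighted fit of this form can be sketched with `scipy.optimize.curve_fit`. This is a generic illustration, not the study's analysis code: the data, the per-point standard deviations, and the stand-in for f(*g*) are all invented for demonstration.

```python
import numpy as np
from scipy.optimize import curve_fit

def f_texture(g):
    # Stand-in for f(g), the bias predicted for a pure texture-based
    # match (the dashed curve); log2 of the expansion factor is used
    # here purely for illustration.
    return np.log2(g)

def model(g, k, c):
    # bias = k * f(g) + c, with k and c as free parameters
    return k * f_texture(g) + c

g     = np.array([0.25, 0.5, 1.0, 2.0, 4.0])     # expansion factors
bias  = np.array([-1.1, -0.6, 0.0, 0.55, 1.05])  # illustrative biases
sigma = np.full_like(bias, 0.1)                  # per-point SDs (weights)

(k, c), cov = curve_fit(model, g, bias, sigma=sigma, absolute_sigma=True)
k_sd = np.sqrt(cov[0, 0])  # SD of k: square root of the diagonal element
```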

*k* for judgements of perceived distance relative to the room (*k*_{rel}) against values of *k* for judgements of perceived absolute distance (*k*_{abs}). For both axes, data close to zero imply the use of physical cues, and data close to 1 imply the use of texture-based cues. Data are clustered by viewing distance (1, 3 and 5 m), where *k*_{abs} is about 0.1, 0.5 and 0.9, respectively. In almost all cases, the data lie above the line of equality, meaning that there was a shift towards using texture-based cues for the ‘relative’ task, although for most participants the difference is small. Participant S1, whose data are shown by circles, is more able to make a texture-based match in the ‘relative’ task, as we saw in Figure 3a.

*k*_{abs} and *k*_{rel}), where a value of *k* = 0 corresponds to the ‘physical’ cue prediction and *k* = 1 corresponds to the ‘texture-based’ prediction. It shows values of *k* when the target was placed close to the wall (open symbols) and away from the wall (closed symbols). All participants shifted their biases toward a more texture-based match when the target was close to the wall, for both the absolute and relative matching tasks. The mean values of *k*_{abs} shifted from 0.08 ± 0.002 to 0.42 ± 0.005, while the mean values for *k*_{rel} shifted from 0.22 ± 0.005 to 0.83 ± 0.008.

*D*_{2} (see Figure 4), were scaled by the ratio of the distance to the reference square to the distance to the back wall. Figure 4, middle column, shows these thresholds for detecting changes in ‘texture-based’ cues. The results show the opposite effect to the ‘physical’ thresholds, with the poorest thresholds at 1 m. This is likely to be because the texture-based cues became more reliable as the square moved closer to the back wall (see the Discussion).
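One way to implement such a scaling is sketched below. The function name, the back-wall distance, and the example threshold are our own assumptions for illustration; the paper specifies only that thresholds on *D*_{2} were scaled by the ratio of the reference distance to the back-wall distance.

```python
# Hypothetical wall distance (m); the actual room geometry is in the paper.
WALL_DISTANCE = 6.0

def texture_threshold(raw_threshold_m, reference_distance_m,
                      wall_distance_m=WALL_DISTANCE):
    """Scale a raw threshold on the second-interval distance D2 by the
    ratio of reference distance to back-wall distance (illustrative)."""
    return raw_threshold_m * (reference_distance_m / wall_distance_m)

# The same raw threshold yields different scaled values at the three
# reference distances used in the study (1, 3 and 5 m):
scaled = [texture_threshold(0.3, ref) for ref in (1.0, 3.0, 5.0)]
```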

*worse* when the target was close to the wall (black symbols above the line), which might have been due to the lateral jitter having a detrimental influence in this case.

*D* is given by:

*D* = *w*_{P} *P* + *w*_{T} *T* (Equation 1)

where *P* is the estimate of target distance derived from stereo/motion parallax cues, *T* is the estimate derived from texture-based cues, and *w*_{P} and *w*_{T} are the weights applied to these estimates, respectively.

Assume that the two estimates, *P* and *T*, are independent and Gaussian with variances *σ*^{2}_{P} and *σ*^{2}_{T}. Then, the weights can be estimated by the following equations:

*w*_{P} = *σ*^{2}_{T} / (*σ*^{2}_{P} + *σ*^{2}_{T}), *w*_{T} = *σ*^{2}_{P} / (*σ*^{2}_{P} + *σ*^{2}_{T}) (Equation 2)

where *σ*^{2}_{P} and *σ*^{2}_{T} were estimated from the measured thresholds *σ*_{P} and *σ*_{T} respectively. Figure 4, right hand column, plots the weights, *w*_{P} and *w*_{T}, derived from these thresholds for each viewing distance. As we have noted, participants placed more weight on physical cues at 1 m and on texture-based cues at 5 m. At 3 m, most participants gave slightly greater weight to texture-based cues.
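The weighting scheme just described, in which each cue is weighted by the other cue's variance so that the less variable cue dominates, can be written as a minimal sketch (the numbers below are illustrative, not measured thresholds):

```python
def cue_weights(sigma_p, sigma_t):
    """Reliability-based weights: each cue is weighted by the OTHER
    cue's variance, so the more reliable (less variable) cue gets the
    larger weight.  The weights sum to 1."""
    var_p, var_t = sigma_p ** 2, sigma_t ** 2
    w_p = var_t / (var_p + var_t)
    w_t = var_p / (var_p + var_t)
    return w_p, w_t

def combined_distance(P, T, sigma_p, sigma_t):
    """Weighted average of the 'physical' estimate P and the
    'texture-based' estimate T."""
    w_p, w_t = cue_weights(sigma_p, sigma_t)
    return w_p * P + w_t * T

# At near distances physical cues are more reliable (small sigma_p),
# so the combined estimate sits close to P (here, close to 1.0 m):
d_near = combined_distance(P=1.0, T=1.4, sigma_p=0.05, sigma_t=0.20)
```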

*b*, by fitting the function *b* = *k*f(*g*) + *c*, with *k* and *c* as free parameters. We can now see how well the weights, *w*_{P} and *w*_{T}, predict the bias data. The equation for the predicted curve is *b* = *w*_{T}f(*g*). This can be done for both *k*_{abs}, which applies to the absolute task, and *k*_{rel}, which applies to the relative task. Figure 6 shows how *k*_{abs} and *k*_{rel} relate to the predicted values, *w*_{T}, for judgements made in the middle of the room. In both cases there is a strong correlation between the prediction, *w*_{T}, and the best fitting *k* value (*k*_{abs}: *r*(11) = 0.98, *p* < 0.001; *k*_{rel}: *r*(11) = 0.94, *p* < 0.001). The fits, *k*_{abs} and *k*_{rel}, span a narrower range than the predicted values, *w*_{T}, for which we have no clear explanation. When the target was close to the wall, participants reported using a variety of strategies to carry out the task, leading to variability in performance and a poor correlation between the predictions, *w*_{T}, and the fits, *k*_{abs} and *k*_{rel}. For comparison with Figure 3c, the mean value of *w*_{T} close to the wall was 0.45 ± 0.28 and away from the wall it was 0.019 ± 0.018.
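The comparison between predicted weights and fitted slopes can be sketched as a correlation across participants. The numbers here are invented for illustration (the study's actual values appear in Figure 6); the predicted bias curve for each participant needs no free parameters.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical predicted texture weights (w_T) and fitted slopes
# (k_abs) for six participants -- illustrative numbers only.
w_t   = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.85])
k_abs = np.array([0.12, 0.20, 0.35, 0.50, 0.60, 0.78])

# Strength of the prediction: Pearson correlation between w_T and k.
r, p = pearsonr(w_t, k_abs)

def predicted_bias(w_t_participant, f_g):
    # Zero-free-parameter prediction: b = w_T * f(g)
    return w_t_participant * f_g
```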

*a priori*, that cue combination rules should apply to the perception of distance, but here we have shown that they do. Using only measurements of (i) thresholds for judging the physical distance of a target (independent of neighboring objects) and (ii) thresholds for judging target distance relative to surrounding objects, it is possible to make quite accurate predictions, with no free parameters, about the biases that people will make when judging the perceived distance of an object (Figure 6). We have shown that the ability to predict biases in distance judgements holds over a range of conditions in which the reliability of physical and texture-based cues is varied, including the effect of viewing distance (Figure 4) and proximity to other objects (Figure 5).

*could* use feedback effectively if it indicated the location of the target object relative to the room. Intuitively, this result seems reasonable, i.e. observers should be able to judge the distance of an object relative to the room (ignoring its absolute distance). However, one of the interesting results from the experiments we report here is that, without feedback, participants do not report the relative location of the target object at all accurately. Indeed, they show almost as much ‘mandatory fusion’ of cues for this task as when they are asked to report the perceived absolute distance of the target (Figure 6b).