Figure 3 summarizes the stimulus types and the quantities to be measured in our experiments. There are eight stimulus types. Four of them are bioptic or 2D stimuli, in which the two eyes receive identical inputs; in these stimuli both textures lie at the depth of the display screen, with no depth difference between the relevant and irrelevant textures. They are called, respectively, baseline, 2D_0, 2D_a, and 2D_2a. In the baseline stimulus, the input image contains I_rel only. In the other 2D stimuli 2D_x, for x = 0, a, and 2a, the input image is a 2D offset image with an absolute offset of 0 (i.e., no offset, as in the original composite image I_com), a, and 2a, respectively (for a particular value of a given in the Methods section). The four other stimulus types, Figure_a, Ground_a, Figure_2a, and Ground_2a, are 3D or dichoptic stimuli, in which the two eyes receive different inputs. In these terms, Figure or Ground denotes whether the task-relevant texture I_rel is the figure (foreground) or the ground (background) surface, and the subscript a or 2a denotes whether the absolute positional offset is a or 2a, respectively, in the 2D offset image presented to one of the eyes. Note that a Figure_x stimulus becomes a Ground_x stimulus when the images to the two eyes are swapped. In a 3D stimulus, the relative disparity between the two textures is always the constant value 2a. It is created either by an absolute offset a in opposite directions in the two eyes, as in Figure_a and Ground_a, or by an absolute offset 2a in only one eye (and no offset in the other eye), as in Figure_2a and Ground_2a. The 3D contribution would be manifested in the following three RT differences

Δ(x) = RT(2D_x) − RT(Figure_x),
δ1(x) = RT(2D_x) − RT(Ground_x),
δ2(x) = RT(Ground_x) − RT(Figure_x),
where x = a or 2a, and each RT(stimulus type) denotes the RT averaged over trials belonging to that stimulus type (see Methods). If depth separation between I_rel and I_ir makes it easier to allocate attention to I_rel than otherwise, Δ(x) and δ1(x) should be positive. Meanwhile, since it is easier to direct attention to the foreground than to the background surface (Mazza, Turatto, & Umiltà, 2005), RT(Ground_x) is typically (or by default) longer than RT(Figure_x) when depth perception plays a role in the task. This figure–ground factor should make δ2(x) = RT(Ground_x) − RT(Figure_x) positive and make δ1(x) smaller than Δ(x). However, this default figure–ground factor can be reduced by top-down attentional control as follows. In our stimuli,
I_rel is always on the depth plane of the display screen, whether it is in front of or behind the irrelevant texture I_ir. Hence, subjects could force their attention to the display screen, so that when depth perception emerges from viewing the stimulus, their attention is more likely to be on the task-relevant I_rel even when I_rel is the background surface. This top-down attentional control should thus reduce RT(Ground_x), making δ1(x) larger and δ2(x) smaller at the same time. These opposite effects of the top-down control on δ1(x) and δ2(x) should make their sum δ1(x) + δ2(x) less sensitive to the degree of this top-down control; indeed, δ1(x) + δ2(x) = Δ(x) does not contain RT(Ground_x). Since different observers have characteristically different reaction speeds, i.e., different RT(2D_x), a Δ(x) = 100 milliseconds (ms) at RT(2D_x) = 700 ms quite likely indicates a more substantial 3D effect than the same Δ(x) at RT(2D_x) = 2000 ms. Hence, we use the normalized index

Δ(x)/RT(2D_x)

to assess the overall contribution of the 3D cue to attentional guidance. Later on, we will use δ1(x)/Δ(x) to assess the top-down dependent factors and use δ1(x)/RT(2D_x) as another assessment of the 3D contribution.
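The arithmetic relating these quantities can be illustrated with a small sketch. All reaction-time values below are invented for illustration only (the actual values come from the trials described in Methods); the sketch simply shows how Δ(x), δ1(x), δ2(x), and the normalized indices follow from the mean RTs of the three stimulus categories at a given offset x:

```python
# Hypothetical mean reaction times (ms) for one offset x.
# These numbers are invented for illustration, not measured data.
rt = {"2D": 700.0, "Figure": 580.0, "Ground": 650.0}

# The three RT differences defined in the text.
delta = rt["2D"] - rt["Figure"]       # Delta(x): overall 3D advantage
delta1 = rt["2D"] - rt["Ground"]      # delta1(x): 3D advantage even when I_rel is the ground
delta2 = rt["Ground"] - rt["Figure"]  # delta2(x): default figure-ground (foreground) advantage

# By construction delta1(x) + delta2(x) = Delta(x), so the sum
# does not depend on RT(Ground_x).
assert abs((delta1 + delta2) - delta) < 1e-9

# Normalized indices, which discount an observer's overall reaction speed.
overall_3d_index = delta / rt["2D"]  # Delta(x) / RT(2D_x)
top_down_index = delta1 / delta      # delta1(x) / Delta(x)

print(delta, delta1, delta2)       # 120.0 50.0 70.0
print(round(overall_3d_index, 3))  # 0.171
print(round(top_down_index, 3))    # 0.417
```

With these illustrative numbers, reducing RT(Ground_x) (the hypothesized effect of top-down control) raises delta1 and lowers delta2 by the same amount, leaving delta unchanged, which is exactly why Δ(x) is the control-insensitive measure.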