There is a well-established literature on the expected performance benefits of combining statistically independent visual cues based on signal detection theory (Green & Swets, 1966; Landy et al., 1995; Shimozaki et al., 2003; Trommershauser et al., 2011). For thoroughness, we outline the derivation of the relationships between single-cue and multiple-cue performance for our assessment of the combination of contextual cue information. We assume that each scene elicits an internal response in the observer, $x_{\mathrm{cue}}$, which is subject to internal noise and whose mean depends on whether the target is present in the image. The internal responses of an observer to target-absent (noise) and target-present (signal) scenes can be represented by Gaussian probability distributions. The observer's ability to discriminate between target-present and target-absent images is represented by the index of sensitivity, $d'$, which equals the difference between the means of the signal and noise response distributions divided by the standard deviation of the internal response, assumed to be equal for the signal and noise distributions (Equation A1).
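Written out from the verbal definition above, the sensitivity index would take the following form. This is a reconstruction from the text: the symbols $\langle x_s \rangle$, $\langle x_n \rangle$ (mean signal and noise responses) and $\sigma$ (their common standard deviation) are notation assumed here.

```latex
% Sensitivity index: separation of the signal and noise means in units of sigma
\[
d' = \frac{\langle x_s \rangle - \langle x_n \rangle}{\sigma}
\]
```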
For simplicity, we assume that the mean internal response to a target-absent image containing a given cue, $\langle x_{\mathrm{cue},n} \rangle$, is zero. We assume that the mean of the internal response distribution to a target varies with the presence of different cues (due to covert or overt attention mechanisms) and is given by $\langle x_{\mathrm{cue},s} \rangle$, with an associated target detectability, $d'_{\mathrm{cue}}$. When multiple cues are present in the image, we assume that the internal responses to the various cues are statistically independent and are combined optimally. In that case, the optimal linear integration of the internal responses, $y$, is a weighted linear sum of the internal responses to the individual cues (Equation A2).
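A weighted linear sum of this kind can be written as follows. This is a sketch reconstructed from the description, with the sum running over the cues present in the image and $w_{\mathrm{cue}}$ the weight given to each cue.

```latex
% Combined internal response as a weighted sum over the available cues
\[
y = \sum_{\mathrm{cue}} w_{\mathrm{cue}} \, x_{\mathrm{cue}}
\]
```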
Using Equation A1, we can derive the index of detectability for the combined cues, $d'_y$, given the mean internal responses to the signal and to noise only, $\langle y_s \rangle$ and $\langle y_n \rangle$, and the standard deviation, $\sigma_y$. We can calculate the expected values of the signal and noise distributions of $y$ using Equations A3 and A4.
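Taking expectations of the weighted sum under each hypothesis gives the two means. The step can be sketched as follows, assuming the weights are fixed constants so that expectation passes through the sum. The correspondence of each line to a specific equation number is not asserted here.

```latex
% Expected combined response on target-present (signal) trials
\[
\langle y_s \rangle = \sum_{\mathrm{cue}} w_{\mathrm{cue}} \, \langle x_{\mathrm{cue},s} \rangle
\]
% Expected combined response on target-absent (noise) trials
\[
\langle y_n \rangle = \sum_{\mathrm{cue}} w_{\mathrm{cue}} \, \langle x_{\mathrm{cue},n} \rangle
\]
```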
We assume that $\langle x_{\mathrm{cue},n} \rangle = 0$; therefore $\langle y_n \rangle = 0$. For statistically independent cues with unit variance, the optimal weighting is known to be $w_{\mathrm{cue}} = d'_{\mathrm{cue}}$, that is, proportional to the information provided by each cue as represented by its index of sensitivity (Green & Swets, 1966). Again assuming unit variance, $\langle x_{\mathrm{cue},s} \rangle = d'_{\mathrm{cue}}$. Replacing $w_{\mathrm{cue}}$ and $\langle x_{\mathrm{cue},s} \rangle$ in Equation A4 results in Equation A5.
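Carrying out that substitution in the signal mean, with both the weight and the mean signal response set to $d'_{\mathrm{cue}}$, would give the following (a reconstruction from the stated assumptions):

```latex
% Substituting w_cue = d'_cue and <x_cue,s> = d'_cue into the signal mean
\[
\langle y_s \rangle
  = \sum_{\mathrm{cue}} d'_{\mathrm{cue}} \, d'_{\mathrm{cue}}
  = \sum_{\mathrm{cue}} {d'_{\mathrm{cue}}}^{2}
\]
```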
Finally, the standard deviation of $y$ is derived using error propagation and partial derivatives in Equation A6 and solved in Equation A7, noting that we assume unit variance.
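For independent responses, standard error propagation through a linear sum proceeds as sketched here. The per-cue standard deviation $\sigma_{\mathrm{cue}}$ is notation assumed for this reconstruction, and it is set to 1 in the second step together with $w_{\mathrm{cue}} = d'_{\mathrm{cue}}$.

```latex
% Variance of a linear combination of independent variables
\[
\sigma_y^{2}
  = \sum_{\mathrm{cue}} \left( \frac{\partial y}{\partial x_{\mathrm{cue}}} \right)^{2} \sigma_{\mathrm{cue}}^{2}
  = \sum_{\mathrm{cue}} w_{\mathrm{cue}}^{2} \, \sigma_{\mathrm{cue}}^{2}
\]
% Unit variance and w_cue = d'_cue
\[
\sigma_y = \sqrt{\sum_{\mathrm{cue}} {d'_{\mathrm{cue}}}^{2}}
\]
```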
Returning to Equation A1 and using Equations A3, A4, and A7, we can derive the optimal linear additive combination of $d'$.
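Putting the pieces together, the combined sensitivity follows from dividing the difference of the means by the standard deviation of $y$. The chain can be sketched as follows, using $\langle y_n \rangle = 0$ and the unit-variance assumption:

```latex
% Quadratic summation of independent sensitivities
\[
d'_y
  = \frac{\langle y_s \rangle - \langle y_n \rangle}{\sigma_y}
  = \frac{\sum_{\mathrm{cue}} {d'_{\mathrm{cue}}}^{2}}{\sqrt{\sum_{\mathrm{cue}} {d'_{\mathrm{cue}}}^{2}}}
  = \sqrt{\sum_{\mathrm{cue}} {d'_{\mathrm{cue}}}^{2}}
\]
```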
Equation A8 is the known relation between the index of detectability of multiple combined statistically independent cues and the individual $d'$ of each cue. The $d'$ of each cue is the contribution of that cue to target detectability. In our experiment, we measure the $d'$ associated with scenes with a specific cue (object co-occurrence, multiple object configuration, or background category) and the presence of the target. To isolate the contribution of the cue to target detectability, we assume that the presence of a target is another statistically independent source of information that is combined with the presence of a contextual cue. Thus, target detectability in a scene with the target present and a contextual cue ($d'_{\mathrm{cue,none}}$) is given by the combination of the $d'$ in a scene with only the target and no contextual cue ($d'_{\mathrm{none}}$) and the $d'$ contribution of the cue ($d'_{\mathrm{cue}}$).
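Applying the same quadratic summation to the two sources, the target alone and one contextual cue, would give the following relation (reconstructed from the independence assumption stated above):

```latex
% Measured sensitivity with one cue = target-alone and cue contributions in quadrature
\[
d'_{\mathrm{cue,none}} = \sqrt{ {d'_{\mathrm{none}}}^{2} + {d'_{\mathrm{cue}}}^{2} }
\]
```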
Both $d'_{\mathrm{cue,none}}$ (target detectability from scenes with one cue and the target) and $d'_{\mathrm{none}}$ (target detectability from scenes with the target but no cues) are obtained from the experiments, and the contribution to $d'$ from the contextual cue ($d'_{\mathrm{cue}}$) can be estimated by solving Equation A9.
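Solving the quadrature relation for the cue term is a one-line rearrangement, sketched here. The expression is real-valued only when the measured $d'_{\mathrm{cue,none}}$ is at least as large as $d'_{\mathrm{none}}$.

```latex
% Cue contribution isolated from the two measured sensitivities
\[
d'_{\mathrm{cue}} = \sqrt{ {d'_{\mathrm{cue,none}}}^{2} - {d'_{\mathrm{none}}}^{2} }
\]
```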
A $d'_{\mathrm{cue}}$ can be estimated for each contextual cue from Equation A10. Predictions for the combination of multiple cues can then be calculated for each of the combined-cue experimental conditions (OM, OB, MB, and OMB), including the contribution of the target ($d'_{\mathrm{none}}$).
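For the condition with all three cues, the prediction under quadratic summation would read as follows. The subscripts $O$, $M$, and $B$ are labels assumed here for the object co-occurrence, multiple object configuration, and background category cues, inferred from the condition names.

```latex
% Predicted sensitivity with all three contextual cues plus the target itself
\[
d'_{OMB} = \sqrt{ {d'_{\mathrm{none}}}^{2} + {d'_{O}}^{2} + {d'_{M}}^{2} + {d'_{B}}^{2} }
\]
```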
Equation A11 shows the prediction for the scenes with all cues. Similarly, one can predict the effective $d'$ from the presence of two cues.
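The estimation and prediction steps can be illustrated numerically with a minimal sketch. The function names `cue_contribution` and `predict_combined` are hypothetical, the input values are made up for illustration and are not data from the experiments, and clamping a negative difference to zero is an assumption of this sketch rather than something the text specifies.

```python
import math


def cue_contribution(d_cue_none: float, d_none: float) -> float:
    """Isolate a cue's d' from d'_{cue,none}^2 = d'_none^2 + d'_cue^2."""
    diff = d_cue_none ** 2 - d_none ** 2
    # Assumption of this sketch: if the cue condition is no better than the
    # target-alone condition, treat the cue contribution as zero.
    return math.sqrt(max(diff, 0.0))


def predict_combined(d_none: float, cue_ds) -> float:
    """Predict d' for a set of independent cues plus the target itself."""
    return math.sqrt(d_none ** 2 + sum(d ** 2 for d in cue_ds))


# Illustrative (made-up) sensitivities, NOT experimental results.
d_none = 1.0
measured = {"O": 1.5, "M": 1.25, "B": 2.0}  # d'_{cue,none} per single-cue condition

# Per-cue contributions, then predictions for a two-cue and the three-cue condition.
contrib = {cue: cue_contribution(d, d_none) for cue, d in measured.items()}
d_om = predict_combined(d_none, [contrib["O"], contrib["M"]])
d_omb = predict_combined(d_none, contrib.values())

print(contrib)
print(d_om, d_omb)
```

The two-cue predictions (OM, OB, MB) use the same function with the relevant pair of contributions, so the target term $d'_{\mathrm{none}}$ is counted exactly once in every prediction.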