The target of another person's gaze is a strong cue for where that person is directing his or her visual attention, and therefore what may be on his or her mind moment to moment (Pärnamets, Johansson, Hall, Balkenius, & Spivey, 2015). Additionally, because people (and other animals) tend to direct their visual attention to the informative and behaviorally relevant areas of the environment (Mackworth & Morandi, 1967), the ability to infer another's attention (via gaze, as a proxy) also helps to reveal the important things that may be happening in a person's immediate vicinity (Byrne & Whiten, 1991).
The direction of another person's eye fixation is a robust and precise cue for tracking gaze (and therefore attention), so it is unsurprising that the human visual system has evolved to process this social signal with remarkable accuracy and efficiency (Cline, 1967; Gale & Monk, 2000; Symons, Lee, Cedrone, & Nishimura, 2004; Bock, Dicke, & Thier, 2008). However, no perceptual signal is perfectly noiseless in its extraction or unambiguous in its interpretation. As such, secondary cues like head position (Wollaston, 1824; Ken, 1990; Langton, 2000) or even facial expression (Martin & Rovira, 1982; Lobmaier, Tiddeman, & Perrett, 2008) concurrently inform the judgment of where another person is looking.
But additionally, if one had reliable intuitions about where in the visual scene another person would be likely to direct his or her gaze, prior to extracting the signal from his or her eyes, then this contextual information could potentially be integrated with the eye cue to improve the inference of gaze direction. Past experiments have indeed demonstrated the influence of context on human gaze perception, with people showing a bias toward judging that another person's gaze is directed at them (Ken, 1990; Mareschal, Calder, & Clifford, 2013) or at objects (Lobmaier, Fischer, & Schwaninger, 2006; Wiese, Zwickel, & Müller, 2013). Each of these empirical findings makes sense given basic intuitions about human nature: objects and faces would naturally be regions of interest in a counterpart's visual scene (Yarbus, 1967), and even the most mundane face is surely more interesting than, say, the empty space immediately to the left and right of it.
But in turn, it should be clear that all of the locations in the counterpart's visual environment (including one's own face) are salient to varying degrees, that is, a priori more or less likely to capture the other person's visual attention. We appeal to this more general case and predict that prior expectations about the presumed visual salience of the scene should systematically factor into human gaze perception. This basic approach, combining perceptual cues from the target person's eyes (or head position, etc.) with the visual salience of the scene, has been exploited to improve the accuracy of computer vision algorithms both in the discrimination of gaze direction (Hoffman, Grimes, Shon, & Rao, 2006; Yücel et al., 2013) and in the related task of identifying where another person is pointing (Schauerte, Richarz, & Fink, 2010). Here we test whether human gaze perception employs a similar mechanism, asking whether the performance of such a model is consistent with an observer's judgments of the most likely target of another individual's gaze (regardless of whether the observer's judgment is correct with respect to ground truth).
Our experimental subjects view photographs of a young man gazing at various locations on a partially transparent surface situated between him and the camera. The experimental task is to indicate where on this surface this “gazer” is looking, a task that we define computationally as the inference of the location [x, y] within the continuous two-dimensional (2-D) plane where the photographed individual is gazing (Gx,y), given the gaze-directional cue from the eyes of the person (D) and the image presented in that plane (I). Bayes' rule yields the posterior probability distribution, continuous over the 2-D hypothesis space:

p(Gx,y | D, I) ∝ p(D | Gx,y) · p(Gx,y)
In our treatment, the prior, p(Gx,y), is equivalent to the relative visual salience of location [x, y] within image I, where salience is some model of where people are a priori likely to direct their visual attention and fixation. This study explores whether a Bayesian model that incorporates a visual salience map as a prior can account for actual human subjects' gaze judgments better than a model that ignores this information and uses only the eye cues.
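To make the computation concrete, the sketch below implements this posterior on a discretized grid. It is a minimal illustration, not the study's actual model: the isotropic Gaussian likelihood for the eye cue, the noise parameter sigma, the grid resolution, and the toy salience map are all assumptions introduced here for demonstration.

```python
import numpy as np

def gaze_posterior(salience_map, eye_estimate, sigma=0.1):
    """Combine an eye-direction likelihood with a salience prior via Bayes' rule.

    salience_map : 2-D nonnegative array giving the relative salience of each
                   [x, y] location in the image plane (the prior, p(Gx,y)).
    eye_estimate : (x, y) location implied by the eye cue alone, in [0, 1]^2.
    sigma        : assumed noise (std. dev.) of the eye cue, in plane units.
    """
    h, w = salience_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs / (w - 1)  # normalize pixel coordinates to [0, 1]
    ys = ys / (h - 1)

    # Likelihood p(D | Gx,y): isotropic Gaussian centered on the eye-cue estimate.
    ex, ey = eye_estimate
    likelihood = np.exp(-((xs - ex) ** 2 + (ys - ey) ** 2) / (2 * sigma ** 2))

    # Prior p(Gx,y): the salience map, normalized to sum to 1.
    prior = salience_map / salience_map.sum()

    # Posterior p(Gx,y | D, I) ∝ p(D | Gx,y) * p(Gx,y), renormalized over the grid.
    posterior = likelihood * prior
    return posterior / posterior.sum()

# Example: a uniform salience map reduces the posterior to the eye cue alone;
# a hypothetical high-salience patch pulls the inferred gaze target toward it.
salience = np.ones((64, 64))
salience[30:40, 40:50] = 10.0  # illustrative salient region
post = gaze_posterior(salience, eye_estimate=(0.5, 0.5), sigma=0.15)
mode = np.unravel_index(post.argmax(), post.shape)  # MAP gaze target [row, col]
```

Under these assumptions, the qualitative prediction is clear: when the prior is flat, the judged gaze target follows the eye cue alone, whereas salient regions near the eye-cue estimate shift and sharpen the inferred target, which is the pattern the model would need to show in human observers' judgments.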