The inspection of a visual object is accompanied by complex gaze behavior that has proven difficult to predict (Jovancevic, Sullivan, & Hayhoe, 2006; Turano, Geruschat, & Baker, 2003). Aspects of this gaze behavior are driven by stimulus attributes, but task demands can also drastically influence it (Navalpakkam & Itti, 2005). Stimulus attributes include low-level features such as brightness, contrast, color, or spatial frequency, but may also include high-level information such as depth cues. The perception of depth can override some otherwise salient 2-D cues. For example, adding perspective information to a simple geometric capsule biases gaze toward the 3-D center of gravity of the object rather than the center of its 2-D projected form (Vishwanath & Kowler, 2004). In more complex tasks such as face perception, there is evidence that the addition of stereopsis can influence gaze behavior (Chelnokova & Laeng, 2011), but it is difficult to determine whether depth per se changed eye movement behavior or whether depth simplified the task, thereby changing the inspection strategy.
The addition of depth information can influence complex object recognition. For example, face recognition improves with the addition of binocular disparity information (Liu & Ward, 2006). We have shown previously that a number of depth cues, such as structure-from-motion (SFM) and texture (Farivar et al., 2009; Liu et al., 2005), can each, on their own, support complex object recognition. Interestingly, not all depth cues are processed by similar cortical mechanisms. For example, stereopsis and SFM are processed largely by the dorsal visual pathway (Backus, Fleet, Parker, & Heeger, 2001; Heeger, Boynton, Demb, Seidemann, & Newsome, 1999; Kamitani & Tong, 2006; Tootell et al., 1995; Tsao et al., 2003; Zeki et al., 1991), a pathway that begins in V1 and courses dorsally through area MT before terminating in the posterior parietal lobe, whereas texture and shading depend largely on the ventral visual pathway (Merigan, 2000; Merigan & Pham, 1998), a pathway that also begins in V1 but courses ventrally through V4 and terminates in the anterior temporal lobe.
In displays where a single depth cue is used to render an object, low-level cues such as luminance and contrast are unavailable, weak, or unrelated to the surface of the object. This means that saliency models such as that of Itti and Koch (2000) would poorly predict gaze behavior during 3-D object recognition, independent of task demands, simply because low-level features in such displays are unrelated to depth. Therefore, gaze behavior in such displays must be driven primarily by the perception of depth and not by the manner in which depth is rendered; the particular depth cue ought to be unimportant. We call this the depth-cue invariance hypothesis: gaze behavior is driven by the depth profile of an object, independent of low-level features or task demands. The key prediction of this view is that gaze behavior for a given object recognition task should remain constant across different depth cues. The alternative hypothesis is that gaze behavior is not invariant to depth cue, but rather that each depth cue places different demands on gaze, resulting in a distinct pattern of gaze behavior that depends on the depth cue defining the stimulus.
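To make this reasoning concrete, the sketch below is a deliberately minimal, hypothetical stand-in for a low-level saliency model (a single luminance center-surround channel, not the full published Itti-and-Koch implementation). Applied to a random-dot frame of the kind used in SFM displays, where dot luminance is statistically uniform across object and background, the resulting map carries no information about the depicted 3-D surface:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def luminance_saliency(image, center_sigma=1.0, surround_sigma=8.0):
    """Center-surround luminance contrast: a minimal low-level saliency map.

    `image` is a 2-D array of luminance values. Saliency is the absolute
    difference between a fine-scale (center) and a coarse-scale (surround)
    Gaussian blur, i.e., a difference-of-Gaussians feature map.
    """
    center = gaussian_filter(image.astype(float), center_sigma)
    surround = gaussian_filter(image.astype(float), surround_sigma)
    saliency = np.abs(center - surround)
    return saliency / (saliency.max() + 1e-12)  # normalize to [0, 1]

# A random-dot field, as in a single frame of an SFM display: dot luminance
# is unrelated to the rendered surface, so the saliency map is spatially
# uniform in its statistics and uninformative about the object's depth.
rng = np.random.default_rng(0)
frame = (rng.random((256, 256)) < 0.1).astype(float)
print(luminance_saliency(frame).mean())
```

Because depth in such a display exists only across frames (in the motion of the dots), any single-frame feature map of this kind is blind to the object, which is precisely why low-level saliency cannot explain gaze in these stimuli.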
Distinguishing between these two hypotheses is challenging because it requires comparing gaze behavior across different depth cues in a manner that separates out the influence of task demands. While objects can be defined by different depth cues, recognition performance is not consistent across depth cues. For example, face recognition performance with SFM on a 1:8 matching task is approximately 50% (four times the chance level), while the same task with shaded faces yields performance close to 85% (Farivar et al., 2009). This means that we cannot directly compare gaze behavior across depth cues for such a task: any difference in gaze behavior could be due to the difference in task difficulty, independent of the depth cue itself (Dehmoobadsharifabadi & Farivar, 2016).
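For reference, the chance level on the 1:8 matching task follows directly from the number of response alternatives:

\[
P_{\text{chance}} = \frac{1}{8} = 12.5\%, \qquad \frac{P_{\text{SFM}}}{P_{\text{chance}}} \approx \frac{50\%}{12.5\%} = 4,
\]

so SFM performance is four times chance yet still far below the roughly 85% obtained with shading, which is the difficulty gap at issue.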
In the studies described below, we sought to overcome this challenge and measure gaze behavior during face recognition in which faces are defined by different depth cues. We equated task difficulty across the different depth cues by measuring face identification thresholds: the amount of morph between a given facial surface and the “average” facial surface needed for correct identification. Equating difficulty in this way allowed us to directly compare how gaze behavior was influenced by each depth cue at similar levels of performance.
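One way to formalize such a threshold (the notation here is our illustrative assumption; the original morphing procedure may be parameterized differently) is as a linear interpolation between the average facial surface and an individual identity's surface:

\[
S(\alpha) = (1 - \alpha)\,\bar{S} + \alpha\,S_{\text{id}}, \qquad 0 \le \alpha \le 1,
\]

where \(\bar{S}\) is the average facial surface, \(S_{\text{id}}\) is the target identity's surface, and the identification threshold is the smallest morph level \(\alpha\) that supports criterion-level recognition. Because \(\alpha\) is defined on the same scale regardless of how the surface is rendered, thresholds measured separately for each depth cue equate task difficulty across cues.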