Abstract
Predictive models of eye movements often do not account for population differences. The task itself may play an important role in differentiating eye movements among discrete groups. For example, eye movement behavior is known to vary by gender in an emotion-perception task (Vassallo, 2009). We explore gender differences in eye movements by eye-tracking subjects during an audio-visual listening task and comparing them with a free-viewing task.
Thirty-four subjects, balanced by gender, are eye-tracked while watching eighty-five videos of different people answering conversational questions. Videos are filmed outdoors against a natural background of distractors, such as pedestrians and vehicles. After viewing each clip, subjects answer questions about the video to measure any attentional differences. To control for task effects, a separate group of ten control subjects is asked to free-view the clips. Interestingly, the main sequence of collected saccades differs significantly across gender (n=33806, peak velocity: p<1e-15, amplitude: p=0.0076). Saccades are scored by sampling the output of a saliency model (Itti, 2004) for the corresponding video at each saccade endpoint. Correlation to saliency is measured by comparing saccade scores to randomly sampled saliency scores with an AUC (area under the curve) metric. Saccades are also scored in the same manner for their correlation to the component features of saliency (color, orientation, intensity, flicker, and motion). Correlations to saliency are significantly greater for male viewers than for female viewers (p<1e-143) and are also significantly greater when the speaker is female (p<1e-143). Furthermore, there is a two-way interaction on saliency correlations between the gender of the viewer and that of the speaker (2-way ANOVA, df=1, F=15123.48). Gender differences persist across all features, suggesting a broad gender difference in attentional allocation during listening. We also investigate the interplay of gender and saliency with fixations to the viewer's eyes, face, and background objects.
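As an illustration of the AUC scoring described above, the following minimal sketch compares saliency values at saccade endpoints with saliency values at randomly sampled locations. It is not the analysis code used in the study. The function names `auc` and `saliency_auc`, the representation of a saliency map as a nested list for a single frame, and the uniform random sampling of comparison locations are all assumptions made for the example.

```python
import bisect
import random


def auc(positive_scores, negative_scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive score exceeds a randomly chosen negative score (ties count half)."""
    ordered = sorted(negative_scores)
    total = 0.0
    for score in positive_scores:
        below = bisect.bisect_left(ordered, score)           # negatives strictly lower
        tied = bisect.bisect_right(ordered, score) - below   # negatives equal to score
        total += below + 0.5 * tied
    return total / (len(positive_scores) * len(negative_scores))


def saliency_auc(saliency_map, endpoints, n_random=1000, seed=0):
    """Score saccade endpoints against random locations on one saliency map.

    saliency_map: nested list indexed as [row][col] (assumed layout).
    endpoints: list of (row, col) saccade landing positions.
    Random locations are drawn uniformly over the map (an assumption).
    """
    rng = random.Random(seed)
    rows, cols = len(saliency_map), len(saliency_map[0])
    at_saccades = [saliency_map[r][c] for r, c in endpoints]
    at_random = [
        saliency_map[rng.randrange(rows)][rng.randrange(cols)]
        for _ in range(n_random)
    ]
    return auc(at_saccades, at_random)
```

An AUC of 0.5 indicates that saccade endpoints are no more salient than random locations, while values approaching 1.0 indicate that saccades land preferentially on salient regions.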