Abstract
Introduction: When identifying a person from a video of a single individual, humans rely on the face but also on body information (e.g., Hahn et al., 2015). Here, we investigate search for a person in cluttered outdoor videos that also contain distractor people. We measured eye movements and the contributions of different features (bodies, heads, and faces) to search performance. Critically, we ask whether the human utilization of features reflects differences in discriminative information content across features or a strategy particular to humans.

Methods: Sixty observers searched for two target people (responses: person 1, person 2, or none) among distracting people in cluttered outdoor videos (6 s) in four conditions: intact, headless, faceless, and bodiless people. To objectively quantify the contributions of different features to person identification, we used a foveated ideal observer (FIO; Peterson & Eckstein, 2012) and a deep neural network (DNN; He et al., 2016). FIO and DNN identification performance were obtained for still images from the videos. DNN performance was also obtained for data that were pre-processed with a foveated system fixated on the faces (DNN-FOV).

Results: When the head was present in the videos, human fixations were heavily concentrated on the face. Observer identification performance degraded the most when faces or heads were removed (intact = 0.78 ± 0.05 vs. faceless = 0.55 ± 0.07 and headless = 0.52 ± 0.06) and much less when bodies were removed (bodiless = 0.73 ± 0.05). In contrast, FIO, DNN, and DNN-FOV performance degraded at least as much in the bodiless condition as in the faceless and headless conditions. In a separate experiment, forcing humans to fixate away from the faces degraded accuracy.

Conclusion: Eye movements to faces during person search serve to optimize accuracy. The face-centered human strategy does not reflect higher information content in faces; it likely arises as a byproduct of the importance of faces for other evolutionarily critical tasks.
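The comparison of human performance across conditions can be made explicit by computing each condition's drop from the intact baseline. The sketch below uses only the mean values reported above; the ± values are dropped because the abstract does not state whether they are standard deviations, standard errors, or confidence-interval half-widths, so this is simple arithmetic on means, not a statistical test. The names `human_accuracy` and `degradation_from_intact` are hypothetical, introduced only for illustration.

```python
# Mean human identification performance per condition, as reported in the abstract.
# The +/- values are omitted (their meaning, SD vs. SEM vs. CI, is not stated).
human_accuracy = {
    "intact": 0.78,
    "faceless": 0.55,
    "headless": 0.52,
    "bodiless": 0.73,
}


def degradation_from_intact(acc):
    """Hypothetical helper: drop in mean performance relative to the intact condition."""
    baseline = acc["intact"]
    # Round to two decimals to match the precision of the reported values.
    return {cond: round(baseline - value, 2)
            for cond, value in acc.items() if cond != "intact"}


drops = degradation_from_intact(human_accuracy)

# Print conditions from largest to smallest drop.
for cond, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cond:>9}: -{drop:.2f}")
```

The headless (0.26) and faceless (0.23) drops are roughly 4.6 to 5.2 times the bodiless drop (0.05), which is the asymmetry the human results describe.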
Meeting abstract presented at VSS 2018