September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract
Features derived from a deep neural network distinguish visual cues used by CCTV experts versus novices
Author Affiliations & Notes
  • Yujia Peng
    Department of Psychology, University of California, Los Angeles
  • Joseph Burling
    Department of Psychology, Ohio State University
  • Greta Todorova
    School of Psychology, University of Glasgow
  • Hongjing Lu
    Department of Psychology, University of California, Los Angeles
    Department of Statistics, University of California
  • Frank Pollick
    School of Psychology, University of Glasgow
  • Footnotes
    Acknowledgements  This research was funded by the Human Dimension and Medical Research Domain of the Dstl Programme Office for the UK Ministry of Defence, and NSF grant BCS-1655300
Journal of Vision September 2021, Vol.21, 1965. doi:https://doi.org/10.1167/jov.21.9.1965

      Yujia Peng, Joseph Burling, Greta Todorova, Hongjing Lu, Frank Pollick; Features derived from a deep neural network distinguish visual cues used by CCTV experts versus novices. Journal of Vision 2021;21(9):1965. https://doi.org/10.1167/jov.21.9.1965.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

When viewing actions, we not only recognize patterns of body movements but also "see" the intentions and social relations of the people involved. However, there are individual differences in the ability to infer intentions from observed actions. Experienced forensic examiners, such as Closed-Circuit Television (CCTV) operators, outperform novices in predicting and identifying hostile intentions in complex scenes. It remains unknown, however, which visual features CCTV operators actively attend to when viewing surveillance footage, and whether the attended features differ between experts and novices. In this study, we analyzed the visual content of image patches centered on the gaze fixations of CCTV operators and of novices as they viewed the same surveillance footage of activity preceding harmful interactions, along with control footage. First, a visual saliency model was used to examine the impact of salient image features (e.g., luminance, color, motion) on guiding gaze fixations in dynamic scenes. We found no group difference between experts and novices in the visual saliency of attended stimuli. We then employed a deep convolutional neural network (DCNN) model and extracted features from the penultimate layer of the network. Using machine learning classifiers, DCNN features of the gaze-centered input distinguished experts from novices, specifically for surveillance videos preceding harmful interactions (e.g., fighting). DCNN features also showed greater inter-subject correlations among CCTV operators than among novices. These results suggest that experts such as CCTV operators attend more efficiently to object-relevant information that is likely associated with social context and the relationships between agents. Such differences in actively attending to high-level visual information enable experts to better predict harmful intentions in human activities.
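The analysis pipeline described in the abstract — fixed-length DCNN features from gaze-centered patches, a classifier trained on expert vs. novice labels, and within-group inter-subject correlations — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the feature vectors here are simulated placeholders (a real analysis would extract them from a pretrained network), and the nearest-centroid classifier and dimensions are assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins for penultimate-layer DCNN features of
# gaze-centered image patches. Each row is one subject's mean
# feature vector; group structure is injected via a shared mean.
# (Dimensions and noise level are arbitrary assumptions.)
n_per_group, n_features = 10, 64
expert_mean = rng.normal(0.0, 1.0, n_features)
novice_mean = rng.normal(0.0, 1.0, n_features)
experts = expert_mean + rng.normal(0.0, 0.5, (n_per_group, n_features))
novices = novice_mean + rng.normal(0.0, 0.5, (n_per_group, n_features))

X = np.vstack([experts, novices])
y = np.array([1] * n_per_group + [0] * n_per_group)  # 1 = expert

def loo_nearest_centroid(X, y):
    """Leave-one-subject-out nearest-centroid classification accuracy."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # hold out subject i
        c1 = X[mask & (y == 1)].mean(axis=0)   # expert centroid
        c0 = X[mask & (y == 0)].mean(axis=0)   # novice centroid
        pred = 1 if np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0) else 0
        correct += pred == y[i]
    return correct / len(y)

def mean_intersubject_corr(group):
    """Mean pairwise Pearson correlation of subjects' feature vectors."""
    r = np.corrcoef(group)
    return r[np.triu_indices_from(r, k=1)].mean()

acc = loo_nearest_centroid(X, y)
print(f"classification accuracy: {acc:.2f}")
print(f"expert inter-subject corr: {mean_intersubject_corr(experts):.2f}")
print(f"novice inter-subject corr: {mean_intersubject_corr(novices):.2f}")
```

With simulated group structure the classifier separates the groups and both groups show high within-group correlation; in the actual study, the reported finding is that this separation held specifically for footage preceding harmful interactions, with higher inter-subject correlation among operators.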
