Journal of Vision, September 2024, Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Reading minds in the eyes with GPT4-vision
Author Affiliations & Notes
  • Scott Murray
    University of Washington
  • Geoffrey Boynton
    University of Washington
  • Footnotes
    Acknowledgements: NIH R01MH131595
Journal of Vision September 2024, Vol. 24, 1074. https://doi.org/10.1167/jov.24.10.1074
Citation: Scott Murray, Geoffrey Boynton; Reading minds in the eyes with GPT4-vision. Journal of Vision 2024;24(10):1074. https://doi.org/10.1167/jov.24.10.1074.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

This study investigates the capabilities of GPT-4, an advanced language model with integrated vision capabilities, in interpreting complex mental states using the Reading the Mind in the Eyes Test (RMET). The RMET involves identifying subtle emotional and mental states from photographs of the region immediately around the human eyes. It comprises 36 photographs, each presented with four descriptors of the person’s mental state. As in human studies, we prompted GPT-4 (model: gpt-4-vision-preview) to “Choose which word best describes what the person in the picture is thinking or feeling. You may feel that more than one word is applicable, but please choose just one word, the word which you consider to be most suitable. Your 4 choices are: …” We conducted five iterations of the RMET. GPT-4 answered an average of 25.4 of 36 items correctly (SD = 0.89), aligning closely with the typical general-population human performance range (~25–26 items correct). Notably, inverting the images led to a 30% decrease in performance, less than the 50% decrease seen in humans, indicating some reliance on global, holistic processing, though weaker than in human observers. Block-scrambling the images into a 2 x 5 grid, which preserves eye-sized local features but renders the images nearly unrecognizable to human observers, had almost no impact on GPT-4's performance (24 items correct). This surprising finding suggests that GPT-4's analysis of visual information may prioritize local features (eye gaze, eyebrow characteristics, etc.) over more global aspects of the image. These results provide insight into an AI's visual processing mechanisms, indicating an interplay of feature-specific and holistic image analysis. Overall, the findings show that GPT-4 demonstrates considerable competence in recognizing a range of mental states, indicating its potential in applications requiring sophisticated emotional and cognitive understanding.
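
The abstract describes the prompting protocol but not the authors' code. As a minimal sketch, one RMET trial against the gpt-4-vision-preview endpoint via the OpenAI Python SDK (openai>=1.0) might look as follows; the image file name (rmet_item_01.jpg) and the four answer options are illustrative placeholders, not the study's actual materials.

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path):
    # Base64-encode a local image so it can be sent inline as a data URL.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Hypothetical answer options for one item; each real RMET item has its own four words.
choices = ["playful", "comforting", "irritated", "bored"]
prompt = (
    "Choose which word best describes what the person in the picture is "
    "thinking or feeling. You may feel that more than one word is applicable, "
    "but please choose just one word, the word which you consider to be most "
    "suitable. Your 4 choices are: " + ", ".join(choices)
)

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": "data:image/jpeg;base64," + encode_image("rmet_item_01.jpg")}},
        ],
    }],
    max_tokens=10,  # a one-word answer is expected
)
print(response.choices[0].message.content)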
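
The two stimulus manipulations can be sketched just as briefly with Pillow. The sketch below assumes inversion means a 180-degree rotation and reads the "2 x 5 grid" as 2 rows by 5 columns; the file names and fixed random seed are illustrative, not taken from the study.

import random
from PIL import Image

def invert(img):
    # Rotate the image 180 degrees, the standard face-inversion manipulation.
    return img.rotate(180)

def block_scramble(img, rows=2, cols=5, seed=0):
    # Cut the image into a rows x cols grid of tiles and reassemble them in a
    # shuffled order, preserving local (eye-sized) features while destroying
    # the global configuration.
    w, h = img.size
    tw, th = w // cols, h // rows
    tiles = [img.crop((c * tw, r * th, (c + 1) * tw, (r + 1) * th))
             for r in range(rows) for c in range(cols)]
    random.Random(seed).shuffle(tiles)
    out = Image.new(img.mode, (tw * cols, th * rows))
    for i, tile in enumerate(tiles):
        r, c = divmod(i, cols)
        out.paste(tile, (c * tw, r * th))
    return out

original = Image.open("rmet_item_01.jpg")
invert(original).save("rmet_item_01_inverted.jpg")
block_scramble(original).save("rmet_item_01_scrambled.jpg")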
