Journal of Vision
September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract | September 2024
Lightness Illusions Through AI Eyes: Assessing ConvNet and ViT Concordance with Human Perception
Author Affiliations
  • Jaykishan Patel
    York University
    Center for Vision Research
  • Alban Flachot
    York University
    Center for Vision Research
  • Javier Vazquez-Corral
    Universitat Autònoma de Barcelona (UAB)
    Computer Vision Center, UAB
  • Konstantinos George Derpanis
    York University
    Lassonde School of Engineering
  • Richard Murray
    York University
    Center for Vision Research
Journal of Vision September 2024, Vol.24, 1245. doi:https://doi.org/10.1167/jov.24.10.1245
© ARVO (1962-2015); The Authors (2016-present)
Abstract

Inferring surface reflectance from luminance images has proven to be a challenge for models of human vision, as many combinations of illumination, reflectance, and 3D shape can create the same luminance image. Traditional models struggle with this deep ambiguity. Recently, convolutional neural networks (CNNs) and vision transformers (ViTs) have been successful computer vision approaches to inferring surface colour. These architectures have the potential to serve as foundational models of lightness and colour perception if they process image information similarly to humans. We trained CNN and ViT backbones, including ResNet18, VGG19, DPT, and custom designs, to infer surface reflectance from luminance images, using a custom dataset of luminance and reflectance images generated in Blender. We then used these models to infer surface reflectance from several well-known images that generate strong lightness illusions, including the argyle, Koffka-Adelson, snake, simultaneous contrast, White's, and checkerboard assimilation illusions, as well as their control images. These illusions are often thought to result from the visual system's attempt to infer surface reflectance from ambiguous images using the statistics of natural images, and we hypothesized that networks trained on simple scenes rendered with shading and shadows would be susceptible to similar illusions. We found that all networks predicted illusions in most test images, and predicted stronger illusions in the test images than in the control conditions. The exceptions were the argyle illusion and the assimilation illusions, which the models typically failed to predict. Model saliency analysis showed that the networks' outputs depended strongly on pixel information in the shadowed regions of the image.
These results support the hypothesis that some lightness phenomena arise from the visual system's use of natural scene statistics to infer reflectance from ambiguous images, and show the potential of CNNs and other deep learning architectures as starting points for models of human lightness and colour perception.
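The abstract reports comparing predicted illusion strength across test and control images. As an illustrative sketch only (not the authors' code), one way to quantify a predicted illusion is the difference in a model's inferred reflectance between two equal-luminance target patches; all names and the toy data below are hypothetical:

```python
def illusion_strength(pred_reflectance, patch_a, patch_b):
    """Mean inferred-reflectance difference between two target patches.

    pred_reflectance: 2D list of floats (a model's inferred reflectance map).
    patch_a, patch_b: lists of (row, col) pixel coordinates for each patch.
    For equal-luminance targets, a nonzero difference indicates that the
    model predicts a lightness illusion.
    """
    def mean_over(patch):
        return sum(pred_reflectance[r][c] for r, c in patch) / len(patch)
    return mean_over(patch_a) - mean_over(patch_b)

# Toy reflectance map: the model infers the shadowed patch (rows 0-1) to be
# lighter than the equal-luminance plain-view patch (rows 3-4), as in a
# simultaneous-contrast-style illusion.
pred = [[0.6, 0.6],
        [0.6, 0.6],
        [0.5, 0.5],
        [0.4, 0.4],
        [0.4, 0.4]]
shadow_patch = [(0, 0), (0, 1), (1, 0), (1, 1)]
plain_patch = [(3, 0), (3, 1), (4, 0), (4, 1)]
print(illusion_strength(pred, shadow_patch, plain_patch))  # positive: illusion predicted
```

A larger strength in a test image than in its matched control image would count as the network "predicting" the illusion, in the spirit of the comparison described above.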
