Vision Sciences Society Annual Meeting Abstract | December 2022 | Open Access
What we don’t see in image databases
Author Affiliations
  • Michelle R. Greene
    Bates College
  • Jennifer A. Hart
    Bates College
  • Amina Mohamed
    Bates College
Journal of Vision December 2022, Vol. 22(14), 3204. https://doi.org/10.1167/jov.22.14.3204
Abstract

The rise of large-scale image databases has accelerated productivity in both the human and machine vision communities. Most extant databases were created in three phases: (1) obtaining a comprehensive list of categories to sample; (2) scraping images from the web; and (3) verifying category labels through crowdsourcing. Subtle biases can arise at each phase: offensive labels can be reified as categories; images reflect what is typical of the internet rather than what is typical of daily experience; and verification depends on the knowledge and cultural competence of the annotators who provide “ground truth” labels. Here, we describe two studies that examine bias in extant visual databases and in the deep neural networks trained on them. Sixty-six observers took part in an experience-sampling experiment via text message. Each received 10 messages per day at random intervals for 30 days and, when possible, replied with a picture of their surroundings (N = 6,280 images). Category predictions were obtained from deep convolutional neural networks (dCNNs) pretrained on the Places database. The dCNNs showed poor classification performance on these images. A second study investigated cultural biases. We scraped images of private homes from Airbnb listings in 219 countries. The pretrained dCNNs were less accurate and less confident in recognizing images from the Global South, and we observed significant correlations between dCNN confidence and both GDP per capita (r = 0.30) and literacy rate (r = 0.29). These studies show a dissociation between lived visual content and web-based content, and they suggest caution when using the internet as a proxy for visual experience.
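The abstract itself contains no code, but the two analyses it describes (classifying images with a Places-pretrained network, then correlating per-country model confidence with economic indicators) can be sketched briefly. The Python below is a minimal illustration, not the authors' pipeline: it assumes a ResNet-50 checkpoint fine-tuned on Places365 and standard ImageNet-style preprocessing, and the weights path, country codes, and per-country confidence and GDP values are hypothetical placeholders.

# Sketch (not the authors' code): score images with a Places-pretrained CNN
# and correlate the model's top-1 confidence with a country-level indicator.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from scipy.stats import pearsonr

# Assumption: a ResNet-50 checkpoint fine-tuned on the 365 Places scene
# categories; torchvision itself ships only ImageNet weights.
model = models.resnet50(num_classes=365)
model.load_state_dict(torch.load("resnet50_places365.pth",  # hypothetical path
                                 map_location="cpu"))
model.eval()

# Standard ImageNet-style preprocessing; the abstract does not specify the
# exact pipeline used.
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def top1_confidence(path):
    """Softmax probability of the model's most likely scene category."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(img), dim=1)
    return probs.max().item()

# Hypothetical per-country summaries: mean top-1 confidence over a country's
# Airbnb images, paired with GDP per capita (placeholder values only).
confidence = {"AA": 0.61, "BB": 0.42, "CC": 0.55, "DD": 0.47}
gdp = {"AA": 43000.0, "BB": 5200.0, "CC": 21000.0, "DD": 9800.0}

countries = sorted(confidence)
r, p = pearsonr([confidence[c] for c in countries],
                [gdp[c] for c in countries])
print(f"Pearson r = {r:.2f}, p = {p:.3g}")

In the study itself, confidence would presumably be aggregated over many images per country before correlating with GDP per capita and literacy rate; the placeholder dictionaries above stand in for those aggregates.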
