September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2019
Hole-in-the-wall: Perception of 3D shape and affordances from static images in humans and machines
Author Affiliations & Notes
  • Thomas S Wallis
    Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen
    Bernstein Center for Computational Neuroscience, Tübingen
  • Marlene Weller
    Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen
  • Christina M Funke
    Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen
    IMPRS-IS Graduate School
  • Matthias Bethge
    Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen
    Bernstein Center for Computational Neuroscience, Tübingen
    Tuebingen AI Center
    Institute for Theoretical Physics, Eberhard Karls Universität Tübingen
Journal of Vision September 2019, Vol.19, 160b. doi:https://doi.org/10.1167/19.10.160b
© ARVO (1962-2015); The Authors (2016-present)
Abstract

One popular toy for toddlers involves sorting blocks of different shapes into their matching holes. While toddlers need trial and error to sort the blocks correctly, adults can rapidly see the appropriate solution through visual inspection alone. This feat requires an understanding of 3D shape and mental rotation. We study this task in a simplified vision-only setting by generating “shapes” of varying complexity using square matrices filled with connected binary regions, and “holes” by taking the negative region. “Fits” and “doesn’t fit” conditions are created while ensuring that shapes do not match exactly and that the total filled area is the same in both conditions. These matrices are rendered into black-and-white images (“bw”) and into more realistic rendered scenes. Human observers performed a single-interval fits / doesn’t fit task at two complexity levels for both bw and rendered scenes. Performance was high for both bw (average d′ high complexity = 2.6, low complexity = 3.1) and rendered scenes (d′ high = 2.7, low = 3.1), showing that humans can indeed perform this task well. To assess whether current machine vision systems can learn this task, we fine-tuned the weights of a convolutional neural network (CNN; ResNet-50) on 250k bw images at four complexity levels. The network achieved a test accuracy of 94% (same complexity) and 87% (generalisation to higher complexity). For the same images seen by humans, the network performed better than humans at both complexity levels (d′ high = 3.5, low = 4.4), but there was no correlation between human response time and network logit. This suggests that the network solves the task in a non-humanlike way. While this CNN can learn to exceed human performance at this particular task, we expect the model to fail further tests of generalisation because it does not understand the physical properties of the hole-in-the-wall task.
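
The sensitivity values reported above are d′ scores from a single-interval fits / doesn’t fit task. As a minimal sketch of how such a score is commonly computed, d′ can be taken as the difference between the z-transformed hit rate and false-alarm rate. The abstract does not state how d′ was calculated, so the function name `d_prime`, the log-linear correction, and the trial counts below are all assumptions made for illustration, not the authors' analysis.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' for a single-interval yes/no task.

    A "hit" is answering "fits" on a fits trial; a "false alarm" is
    answering "fits" on a doesn't-fit trial.
    """
    # Log-linear correction (an assumption, not taken from the abstract):
    # adding 0.5 to each count keeps the rates away from exactly 0 or 1,
    # where the z-transform would be infinite.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    # z-transform: inverse CDF of the standard normal distribution.
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical trial counts, for illustration only.
example = d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90)
print(f"d' = {example:.2f}")
```

The same function could in principle be applied to the network's binary decisions on the images shown to humans, which would put both on a common scale; whether the reported network d′ values were obtained this way is not stated.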

Acknowledgement: Funded by the German Federal Ministry of Education and Research (BMBF) through the Bernstein Computational Neuroscience Program Tuebingen (FKZ: 01GQ1002), the German Excellence Initiative through the Centre for Integrative Neuroscience Tuebingen (EXC307), and the German Research Foundation (DFG; priority program 1527, BE 3848/2-1, and SFB 1233, Robust Vision: Inference Principles and Neural Mechanisms, TP03).