Vision Sciences Society Annual Meeting Abstract | August 2023 | Volume 23, Issue 9 | Open Access
Dynamic graph convolutional networks do not recognize global 3D shapes
Author Affiliations & Notes
  • Shuhao Fu
    University of California, Los Angeles
  • Zhiqi Zhang
    University of California, Los Angeles
  • Philip Kellman
    University of California, Los Angeles
  • Hongjing Lu
    University of California, Los Angeles
  • Footnotes
    Acknowledgements: This project was funded by NSF BCS 2142269 awarded to HL.
Journal of Vision August 2023, Vol. 23, 5017. doi: https://doi.org/10.1167/jov.23.9.5017
Citation: Shuhao Fu, Zhiqi Zhang, Philip Kellman, Hongjing Lu; Dynamic graph convolutional networks do not recognize global 3D shapes. Journal of Vision 2023;23(9):5017. https://doi.org/10.1167/jov.23.9.5017.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

Deep learning models can be trained to recognize 3D objects from a point cloud, i.e., a discrete set of points randomly sampled from the surfaces of 3D objects. The Dynamic Graph Convolutional Neural Network (DGCNN) takes as input the 3D coordinates of 1024 points and reaches human-level recognition performance. DGCNN is trained to project the 3D coordinates of each point into a high-dimensional space (256 dimensions) of geometric features, and then makes recognition decisions based on these features. However, it remains unclear what geometric features DGCNN extracts to support object recognition, and whether DGCNN's 3D shape representations resemble those used by humans. We used an activation maximization method to identify the preferred input point-cloud pattern that maximally activates each neuron in DGCNN. We found that lower-level layers learn local geometric features in small regions (e.g., corners with different curvatures), while higher-level layers pick up more complex patterns in larger regions (e.g., surfaces with different curvatures, parallel segments, elongated segments). We next examined the robustness of humans and DGCNN in 3D object recognition. Human participants were asked to classify ten common objects shown as point clouds rotated in depth. Point-cloud displays included either all 1024 points (100%) or a down-sampled cloud with fewer points (20%, 30%, etc.). For most objects (e.g., airplanes, chairs), human performance remained robust even with only 20% of the points. In contrast, DGCNN's recognition performance degraded markedly when fewer than 60% of the points were included, dropping to chance level with 20% of the points. These results imply that humans rely primarily on global shape in 3D object recognition, whereas DGCNN relies on local geometric features. Thus, DGCNN (like a standard CNN) learns local geometric properties rather than the global shapes of objects, and is therefore vulnerable to adversarial attacks that make only minor alterations to local geometry.
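As a concrete illustration of the activation-maximization method described in the abstract, the following PyTorch sketch performs gradient ascent on the coordinates of an input point cloud so as to maximize a chosen unit's activation. The PointFeatures module is a hypothetical stand-in for one DGCNN feature stage (the real DGCNN applies EdgeConv over k-nearest-neighbor graphs rebuilt at every layer); the unit index and optimization settings are likewise illustrative, not the authors' actual values.

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for one DGCNN feature stage: a shared per-point
    # MLP mapping 3D coordinates to a 256-dimensional feature space. The
    # real DGCNN uses EdgeConv over k-NN graphs recomputed at each layer.
    class PointFeatures(nn.Module):
        def __init__(self, feat_dim=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(3, 64), nn.ReLU(),
                nn.Linear(64, feat_dim),
            )

        def forward(self, pts):              # pts: (N, 3) point coordinates
            return self.mlp(pts)             # (N, feat_dim) per-point features

    model = PointFeatures().eval()
    for p in model.parameters():             # freeze weights; only the input
        p.requires_grad_(False)              # point cloud is optimized

    unit = 17                                # illustrative unit to visualize
    pts = torch.randn(1024, 3, requires_grad=True)
    opt = torch.optim.Adam([pts], lr=0.01)

    for step in range(500):
        opt.zero_grad()
        act = model(pts)[:, unit].mean()     # mean activation of the unit
        (-act).backward()                    # ascend by minimizing the negative
        opt.step()
        with torch.no_grad():
            pts.clamp_(-1.0, 1.0)            # keep points in a bounded cube,
                                             # mimicking input normalization

    # After optimization, `pts` shows the point pattern this unit prefers,
    # e.g. a corner fragment (lower layers) or a curved surface (higher layers).

The down-sampling manipulation used to probe robustness can be sketched in the same spirit; the helper below (an assumed name, not from the paper) randomly retains a given fraction of the points before the cloud is classified.

    def downsample(pts, frac):
        # Randomly keep a fraction of the points, e.g. frac=0.2 keeps
        # roughly 205 of 1024 points.
        n = max(1, int(round(frac * pts.shape[0])))
        idx = torch.randperm(pts.shape[0])[:n]
        return pts[idx]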
