September 2018
Volume 18, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2018
A causal model of recursive scene parsing in human perception
Author Affiliations
  • Ning Tang
    Department of Psychology and Behavior Science, Zhejiang University
  • Haokui Xu
    Department of Psychology and Behavior Science, Zhejiang University
  • Jifan Zhou
    Department of Psychology and Behavior Science, Zhejiang University
  • Rende Shui
    Department of Psychology and Behavior Science, Zhejiang University
  • Mowei Shen
    Department of Psychology and Behavior Science, Zhejiang University
  • Tao Gao
    Departments of Statistics and Communication Studies, UCLA
Journal of Vision September 2018, Vol.18, 750. doi:https://doi.org/10.1167/18.10.750
Citation: Ning Tang, Haokui Xu, Jifan Zhou, Rende Shui, Mowei Shen, Tao Gao; A causal model of recursive scene parsing in human perception. Journal of Vision 2018;18(10):750. https://doi.org/10.1167/18.10.750.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

A visual scene can be recursively parsed into parts and sub-parts. We present a causal model of scene parsing that can synthesize and identify parse trees, predict perceptual complexity, and pass a (limited) Turing test. The model describes the causal process of generating a scene as splitting it recursively with vertical and horizontal cuts, modulated by two parameters: (a) the Splitting Factor (SF), where a large SF favors splitting a part into more sub-parts, generating a wide, shallow tree; and (b) Part Similarity (PS), where a large PS favors splitting a part into evenly sized sub-parts. Given a scene, the model infers the best parse tree by evaluating the number of parts and their similarities at each partition. The model motivates three human experiments that go beyond RT/accuracy measurements. (1) "Just cut it": participants freely cut a blank scene into 6 rectangles by making recursive cuts. From these human-generated images, the model's free parameters were eliminated by estimating human priors over SF and PS. (2) "Complexity comparison": some scenes are immediately perceived as more complex than others. Scene complexity can be quantified as information content, which is determined by the probability of generating that scene (a higher-probability scene carries less information). Participants ranked the complexity of 20 scenes via paired comparisons, and the model ranked the same images by computing their information content. Human and model rankings were strongly correlated (r² = 0.85). (3) Turing test: each scene can be interpreted by multiple parse trees. Participants viewed a scene together with one parse tree and reported whether the tree was generated by a human or by a machine. Two baseline models were introduced: (a) the causal model with non-informative SF and PS priors, and (b) a model that samples a tree uniformly from the set of valid parse trees. Only the causal model with human priors passed the Turing test. These results demonstrate how human scene parsing can be formalized with a causal model.
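The generative process described in the abstract (recursive vertical and horizontal cuts governed by SF and PS) can be sketched in code. This is a minimal illustration, not the authors' implementation: the specific distributions used here (a stopping/branching rule driven by SF, and a symmetric Dirichlet over sub-part proportions with concentration PS, sampled via normalized Gamma draws) are assumptions chosen to match the qualitative roles the abstract assigns to the two parameters, and the information-content measure is the standard negative log probability.

```python
import math
import random

def sample_tree(width, height, sf, ps, depth=0, max_depth=3):
    """Sample a parse tree for a (width x height) region.

    sf (Splitting Factor): larger values favor splitting into more
    sub-parts, yielding wide, shallow trees.
    ps (Part Similarity): larger values favor evenly sized sub-parts;
    used here as the concentration of a symmetric Dirichlet over
    sub-part proportions. Both distributions are illustrative
    assumptions, not the paper's exact model.
    """
    # Stop splitting at max depth, or with probability decreasing in sf.
    if depth >= max_depth or random.random() > sf:
        return {"size": (width, height), "children": []}
    # Larger sf weights more sub-parts (2..4 in this sketch).
    k = random.choices([2, 3, 4], weights=[1.0, sf, sf * sf])[0]
    orientation = random.choice(["vertical", "horizontal"])
    # Symmetric Dirichlet via normalized Gamma draws: high ps -> even parts.
    raw = [random.gammavariate(ps, 1.0) for _ in range(k)]
    total = sum(raw)
    proportions = [r / total for r in raw]
    children = []
    for p in proportions:
        if orientation == "vertical":
            child_w, child_h = width * p, height
        else:
            child_w, child_h = width, height * p
        children.append(sample_tree(child_w, child_h, sf, ps, depth + 1, max_depth))
    return {"size": (width, height), "cut": orientation, "children": children}

def count_leaves(tree):
    """Number of terminal parts (undivided rectangles) in a parse tree."""
    if not tree["children"]:
        return 1
    return sum(count_leaves(c) for c in tree["children"])

def information_content(probability):
    """Information content in bits: higher probability carries less information."""
    return -math.log2(probability)
```

A scene's complexity ranking, as in Experiment 2, would then follow from computing the probability of generating each scene under the model and converting it to information content.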

Meeting abstract presented at VSS 2018
