August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract
Toward A Computational Model of Directional Visual Relations
Author Affiliations & Notes
  • Pachaya Sailamul
    Brown University
  • Thomas Serre
    Brown University
  • Footnotes
    Acknowledgements  ONR (N00014-19-1-2029) and NSF CRCNS US-France Research grant (IIS-1912280)
Journal of Vision August 2023, Vol.23, 5369.
      Pachaya Sailamul, Thomas Serre; Toward A Computational Model of Directional Visual Relations. Journal of Vision 2023;23(9):5369.


      © ARVO (1962-2015); The Authors (2016-present)


Deep convolutional networks (DCNs) have been shown to match human visual abilities on a variety of tasks, including object classification and segmentation. Nevertheless, DCNs still struggle to match our capacity for abstract visual reasoning. Visual relations can broadly be divided into categorical and directional relations. Previous work has investigated the ability of DCNs to solve categorical visual reasoning (CVR) problems. For instance, it has been found that DCNs solve spatial-reasoning tasks, such as determining whether objects are arranged vertically or horizontally, far more efficiently than same-different tasks, such as determining whether two objects are the same or different. Here, we explore another class of visual reasoning problems known as directional visual relations (DVR), in which the order of the objects in a relation matters: a visual scene of “a baby on a blanket” differs from a scene of “a blanket on a baby.” We hypothesized that attention and working memory are needed to solve these tasks, and that because DCNs lack these functions, their ability to learn such tasks would be limited. First, we studied how DCNs learn to solve DVR tasks in which the model must judge whether a target object lies to the left vs. right, or to the bottom vs. top, of a reference object. We found that DCNs struggle to learn directional visual relations when stimulus variability makes rote memorization difficult. Extending a DCN architecture to incorporate attention and memory yields a model that solves the task on par with human judgments. Altogether, our findings suggest that feedforward processing alone is insufficient to solve DVR and that attention and working memory are crucial for modeling how the brain solves DVR.
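The left/right and top/bottom judgments described in the abstract can be made concrete with a minimal stimulus-generation sketch. This is a hypothetical illustration only: the function names, canvas size, and coordinate convention below are our own assumptions and do not come from the authors' actual experimental code.

```python
import random

def make_dvr_trial(relation, canvas=128, rng=None):
    """Sample target and reference coordinates consistent with a directional
    relation ('left', 'right', 'above', 'below') on a square canvas.

    Hypothetical sketch: image-style coordinates are assumed, so smaller
    y means higher on the canvas.
    """
    rng = rng or random.Random()
    # Place the reference object anywhere on the canvas.
    rx, ry = rng.uniform(0, canvas), rng.uniform(0, canvas)
    # Place the target on the side of the reference demanded by the relation.
    if relation == "left":
        tx, ty = rng.uniform(0, rx), rng.uniform(0, canvas)
    elif relation == "right":
        tx, ty = rng.uniform(rx, canvas), rng.uniform(0, canvas)
    elif relation == "above":
        tx, ty = rng.uniform(0, canvas), rng.uniform(0, ry)
    elif relation == "below":
        tx, ty = rng.uniform(0, canvas), rng.uniform(ry, canvas)
    else:
        raise ValueError(f"unknown relation: {relation!r}")
    return {"target": (tx, ty), "reference": (rx, ry), "label": relation}

def judge_horizontal(trial):
    """Readout for the horizontal task: is the target left or right
    of the reference? Note the answer flips if the two objects swap
    roles, which is what makes the relation directional."""
    tx, _ = trial["target"]
    rx, _ = trial["reference"]
    return "left" if tx < rx else "right"
```

Because the labels depend on which object is the target and which is the reference, a model that only detects the pair of objects, without binding each one to its role, cannot solve the task; this is the order-sensitivity the abstract attributes to attention and working memory.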

