Abstract
Humans are remarkably adept at seeing in ways that go well beyond pattern classification. We represent bounded objects and their shapes from visual input, and we also extract meaningful relations among object parts and among objects. It remains unclear what representations are deployed to achieve these feats of relation processing in vision. Can human perception of relations best be emulated by training deep learning models on massive numbers of problems, or should learning instead focus on acquiring structural representations, coupled with the ability to compute similarities based on such representations? To address this question, we will present two modeling projects: one on abstract relations in shape perception, and one on visual analogy based on part-whole relations. In both projects, we compare human performance to predictions derived from various deep learning models and from models based on structural representations. We argue that structural representations at an abstract level play an essential role in facilitating relation perception in vision.