Abstract
Decades of research on how people represent visual categories have yielded a variety of formal theories, but validating them with naturalistic visual stimuli such as objects and scenes remains a challenge. The key problems are that human visual category representations cannot be directly observed and that designing informative experiments using rich visual stimuli such as photographs requires a reasonable feature representation of those images. Deep neural networks have recently been successful in a range of computer vision tasks and provide such a representation of image features. Here we outline a method for estimating the structure of human visual categories that draws on ideas from both cognitive science and machine learning, blending human-based algorithms with state-of-the-art deep representation learners. We provide qualitative and quantitative results as a proof of concept of the method's feasibility. Samples drawn from the estimated human category distributions rival the quality of current state-of-the-art generative models and outperform alternative methods for estimating the structure of human categories.
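The abstract does not name the algorithm, but "human-based algorithms" of this kind are commonly in the family of Markov chain Monte Carlo with People (Sanborn & Griffiths), run over the latent space of a deep generative model: a person repeatedly chooses between the current stimulus and a proposed one, and the choices drive a Markov chain whose samples estimate the category distribution. The sketch below illustrates that idea under stated assumptions only; the latent dimensionality, the synthetic category density, and the simulated participant (a Luce-choice oracle standing in for a human 2AFC judgment over decoded images) are all hypothetical, not details from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8                            # hypothetical latent-code size
CATEGORY_MEAN = np.full(LATENT_DIM, 0.5)  # synthetic "category" for the demo


def category_density(z):
    """Unnormalized density of the simulated human category (Gaussian)."""
    return np.exp(-0.5 * np.sum((z - CATEGORY_MEAN) ** 2))


def simulated_human_choice(z_current, z_proposal):
    """Choose one of two stimuli with probability proportional to its
    category density (Luce choice, i.e. Barker acceptance). With a
    symmetric proposal this makes the chain a valid MCMC sampler for
    the category distribution. In the real method, a person makes this
    choice after viewing the two latent codes decoded into images."""
    p_cur = category_density(z_current)
    p_prop = category_density(z_proposal)
    return z_proposal if rng.random() < p_prop / (p_cur + p_prop) else z_current


def mcmc_with_people(n_steps=5000, proposal_sd=0.3):
    """Run the chain and return all visited latent codes."""
    z = rng.normal(size=LATENT_DIM)  # start from a random latent code
    samples = []
    for _ in range(n_steps):
        proposal = z + rng.normal(scale=proposal_sd, size=LATENT_DIM)
        z = simulated_human_choice(z, proposal)
        samples.append(z)
    return np.array(samples)


samples = mcmc_with_people()
# After burn-in, the sample mean should approach the category mean (~0.5).
print("mean of sampled latents:", samples[1000:].mean(axis=0).round(2))
```

Decoding the accepted latent codes through the generative model would then yield images that visualize the estimated category structure; the Barker rule is used here rather than Metropolis-Hastings because a binary human choice naturally implements it.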
Meeting abstract presented at VSS 2018