Abstract
A central goal in vision science is to understand which features our visual system uses to process visual scenes. Reverse correlation methods are often used to uncover the internal representations supporting recognition: across many trials, the statistical relationship between participants' decisions and visual perturbations applied to different image locations is analyzed. While these and other methods have successfully identified visual features diagnostic for the recognition of faces and synthetic object stimuli, they require thousands of trials per subject and typically fail when applied to object classes with greater variability in their appearance. To address these limitations, we introduce Clicktionary, a web-based game for assessing the importance of visual features for object recognition. Pairs of participants play together to identify objects: one player reveals image regions diagnostic of the object's category while the other tries to recognize the object as quickly as possible. Aggregating game-play data across players yields importance maps for individual object images, in which each pixel is scored by its contribution to correct object recognition. These importance maps reveal object features distinct from those emphasized by saliency algorithms and from those used by current state-of-the-art deep convolutional networks (DCNs), which are beginning to approach human-level accuracy on certain visual recognition tasks. At the same time, we find that the accuracy of DCNs can be further improved by "cueing" these networks to attend to visually relevant image regions during learning. Clicktionary is a novel method that, for the first time, enables the visualization of feature importance in object classes with high intra-class variability, opening new research avenues for studying the bases of mid- and high-level vision.
Meeting abstract presented at VSS 2017
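The aggregation step described above — scoring each pixel by how often its reveal led to correct recognition — can be illustrated with a minimal sketch. All names, the data layout (boolean reveal masks paired with correctness flags), and the exact scoring rule here are illustrative assumptions, not the Clicktionary implementation itself:

```python
import numpy as np

def importance_map(trials, shape):
    """Aggregate reveal data across game plays into a per-pixel importance map.

    `trials` is a list of (mask, correct) pairs, where `mask` is a boolean
    array marking pixels revealed before the guesser answered, and `correct`
    indicates whether the object was recognized. (Hypothetical data format;
    the actual Clicktionary scoring may differ.)
    """
    revealed = np.zeros(shape, dtype=float)  # how often each pixel was shown
    helpful = np.zeros(shape, dtype=float)   # ... and preceded a correct answer
    for mask, correct in trials:
        revealed += mask
        if correct:
            helpful += mask
    # Score each pixel by the fraction of its reveals that yielded recognition;
    # pixels never revealed score 0.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(revealed > 0, helpful / revealed, 0.0)
```

For example, a pixel revealed in two plays, only one of which ended in correct recognition, would receive a score of 0.5, while unrevealed pixels score 0.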