Abstract
Human perception of images goes far beyond objects, shapes, textures, and contours. Viewing a scene often elicits recollection of other scenes whose global properties or relations resemble those of the one currently observed. This relies on a rich representation of image space in the brain, encompassing scene structure and semantics, as well as a mechanism that uses the representation of an observed scene to recall similar ones from the profusion stored in memory. The recent explosion in the performance and applicability of deep-learning models across all fields of computer vision, including image retrieval and comparison, may tempt one to conclude that the representational power of such methods approaches that of humans. We explore this by testing how deep neural networks fare on the challenge of judging similarity between pairs of images from a new dataset, dubbed "Totally-Looks-Like". The dataset is based on a popular-media website that hosts pairs of images deemed by users to resemble each other, even though the pairs often share little common appearance when judged by low-level visual features. The pairs include (but are not limited to) images of objects, scenes, patterns, animals, and faces, across various modalities (sketches, cartoons, natural images). The website also provides user ratings indicating the level of agreement with each proposed resemblance. The dataset is highly diverse and implicitly captures many aspects of human perception of image similarity. We evaluate several state-of-the-art models on this dataset, comparing their performance with human similarity judgements. The comparison not only forms a benchmark for similar evaluations, but also reveals specific weaknesses in even the strongest current systems, pointing the way for future research.
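The abstract does not specify the evaluation protocol. A minimal sketch of one plausible baseline, assuming PyTorch/torchvision, a pretrained ResNet-50 feature extractor, and cosine-similarity retrieval over the paired "left"/"right" images (all names and choices here are illustrative, not the authors' method), might look like:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Pretrained CNN with the classification head removed, used as a feature extractor.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    """Embed a list of image file paths into unit-length feature vectors."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return F.normalize(model(batch), dim=1)

def top1_retrieval_accuracy(left_paths, right_paths):
    """For each left image, check whether its paired right image is the
    nearest neighbour among all right images under cosine similarity."""
    left, right = embed(left_paths), embed(right_paths)
    sims = left @ right.T                  # pairwise cosine similarities
    predicted = sims.argmax(dim=1)         # most similar right image per left image
    targets = torch.arange(len(left_paths))
    return (predicted == targets).float().mean().item()
```

Under this kind of protocol, a model's retrieval accuracy on the image pairs can be contrasted with human agreement ratings to quantify how far learned features fall short of human similarity judgements.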
Meeting abstract presented at VSS 2018