Abstract
The performance of computer algorithms that detect frontal face images has increased dramatically. However, numerous edge cases remain in which computer performance suffers relative to human ability; humans, who detect faces from birth, perform at what can be considered an asymptotic level. We previously investigated the divergence between human and computer performance on a task that required detecting faces blended with phase-scrambled noise. Human performance greatly exceeded algorithmic performance on that task, but the stimuli were somewhat unrealistic and thus a suboptimal benchmark for computer algorithms. Unlike phase-scrambled noise, occlusion is a common feature of natural scenes. In the present work we therefore investigated the ability of humans and computers to detect heavily occluded faces. Even in the condition that produced the lowest human accuracy (faces composited over generated Portilla-Simoncelli textures with large, black occluding bars), subjects achieved accuracy well above chance (57.6% mean accuracy on a 3-AFC task, where chance is 33.3%; N=409) and greatly exceeded the performance of the algorithms tested: successful detections by the algorithms were near zero at visibility levels where human performance approached ceiling. We investigated the nature of this difference by using human performance data to gauge and scaffold both the performance of a range of computer algorithms and the generalizability of the trained classifiers. In typical machine learning paradigms the training set is labeled only with a binary class identifier; in our approach we additionally integrate item-level human accuracy, response time, and computed difficulty (see the sketch below). This strategy of applying rich human performance data to the training and evaluation of algorithms points to promising techniques for increasing the performance and biological plausibility of face detection, face processing, and other computer vision algorithms.
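One way such item-level human data can enter training is through per-sample weights. The sketch below is a minimal illustration of that idea, assuming synthetic data throughout; the weighting function `item_weight` and all variable names are hypothetical, since the abstract does not specify how accuracy, response time, and difficulty are combined.

```python
# A minimal sketch: folding item-level human performance data into
# classifier training via per-sample weights. All data here are synthetic
# and the combination rule is an assumption for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stimulus features and binary face / non-face labels.
X = rng.normal(size=(500, 32))
y = rng.integers(0, 2, size=500)

# Hypothetical per-item human measurements: fraction of subjects who
# detected the item, mean response time (s), and a difficulty score in [0, 1].
human_acc = rng.uniform(0.3, 1.0, size=500)
mean_rt = rng.uniform(0.4, 2.5, size=500)
difficulty = rng.uniform(0.0, 1.0, size=500)

def item_weight(acc, rt, diff):
    """Assumed combination: emphasize items humans found easy and fast,
    down-weighting hard or slow items. Any monotone combination of the
    three signals could be substituted here."""
    rt_norm = (rt - rt.min()) / (rt.max() - rt.min() + 1e-9)
    return acc * (1.0 - 0.5 * rt_norm) * (1.0 - 0.5 * diff)

weights = item_weight(human_acc, mean_rt, difficulty)

# Standard training would use only the binary label y; here the human
# performance data enters through sample_weight.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y, sample_weight=weights)
```

The same signals could equally be used the other way, e.g. up-weighting difficult items as a curriculum, or held out to stratify evaluation by human-rated difficulty; the sketch commits to one direction only for concreteness.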
Meeting abstract presented at VSS 2013